A novel algorithm for pattern matching with back references

2015 
Modern network security applications, such as network-based intrusion detection systems (NIDS) and firewalls, routinely employ deep packet inspection to identify malicious traffic. In deep packet inspection, the contents of network traffic are matched against patterns of malicious traffic to identify attack-carrying packets. The pattern matching algorithms employed for deep packet inspection must be fast, as the algorithms are often implemented on middle-boxes residing on high-speed gigabits per second links. The majority of patterns employed in network security applications are regular languages. However, regular language-based patterns have limited expressive power and are not capable of describing some complex features in network payload. Back reference is an important feature provided by many pattern matching tools, e.g., PCRE, the regular expression libraries of Java, Perl, and Python. Back references are used to identify repeated patterns in input strings. Patterns containing back-references are non-regular languages. Very little work has been done to improve the time-efficiency of back reference-based pattern matching. The de facto algorithm to implement back reference is recursive backtracking, but it is vulnerable to algorithmic complexity attacks. In this paper, we present a novel approach to implement back references. The basic idea of our approach is to transform a back reference problem to a conditional submatch problem, and represent it with a Non-deterministic Finite Automata (NFA)-like machine subject to some constraints. Our experimental results show that our approach resists known algorithmic complexity attacks, and is faster than PCRE by up to three orders of magnitude for certain types of patterns.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    1
    Citations
    NaN
    KQI
    []