Disambiguation in regular expression matching via position automata with augmented transitions

Satoshi Okui,Taro Suzuki

Disambiguation in regular expression matching via position automata with augmented transitions

2010

Satoshi Okui
Taro Suzuki

This paper offers a new efficient regular expression matching algorithm which follows the POSIX-type leftmost-longest rule. The algorithm basically emulates the subset construction without backtracking, so that its computational cost even in the worst case does not explode exponentially; the time complexity of the algorithm is O(mn(n+c)), where m is the length of a given input string, n the number of occurrences of the most frequently used letter in a given regular expression and c the number of subexpressions to be used for capturing substrings. A formalization of the leftmost-longest semantics by using parse trees is also discussed.

Keywords:

Backtracking
Powerset construction
ReDoS
Substring
Blossom algorithm
Parsing
Regular expression
Time complexity
Theoretical computer science
Discrete mathematics
Mathematics
Algorithm

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations