Learning a Subclass of Deterministic Regular Expression with Counting

2019 
In this paper, we propose a subclass of single-occurrence regular expressions with counting (cSOREs) and give a learning algorithm of cSOREs. First, we learn a SORE. Then, we construct a countable finite automaton (CFA) by traversing the syntax tree of the obtained SORE. Next, the CFA runs on the given finite sample to obtain the minimum and maximum number of repetitions of the subexpressions under the iteration operators. Finally we obtain a cSORE by traversing the syntax tree and introducing the counting operators. Our algorithm not only can learn a cSORE, which is expressive enough to cover more XML data, but also has better generalization ability for smaller sample.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    1
    Citations
    NaN
    KQI
    []