Automatic Classification of Matching Rules in Pattern Matching.

2020 
Word embedding algorithms are commonly used in recommendation systems, relation mining, text similarity matching, and other fields. In this paper, we apply Word2vec, a word embedding algorithm, to the automatic classification of matching rules. In pattern matching, a matching rule is given in order to find all substrings of a string that are identical to the rule. In practical application scenarios, matching rules can often be divided into several categories, where rules in the same category have the same meaning but different expressions. Homogeneous matching rules with a consistent structure and strong regularity can be aggregated with regular expressions. In practice, however, such homogeneous rules are rare, and most matching rules are random and disordered. For these rules, manually designing regular expressions to aggregate them is time-consuming and laborious. In natural language processing, many neural-network-based algorithms can embed words as vectors in a low-dimensional space. However, these algorithms take into account the relationship between semantic information and context, so they need a large amount of data. If only the matching rules in pattern matching are considered, there is often not enough data to reflect contextual relationships, which leads to inaccurate results. In this paper, we design a method for the automatic classification of matching rules based on Word2vec.
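To make the distinction between literal matching rules and regex aggregation concrete, the following is a minimal Python sketch. It is not the paper's method: the rule strings, the sample text, and the helper name `find_matches` are hypothetical and chosen only for illustration. It shows three homogeneous rules that differ only in a number being matched one by one, and then being aggregated into a single regular expression.

```python
import re

def find_matches(text, rule):
    """Return start offsets of non-overlapping substrings of text identical to rule."""
    # re.escape treats the rule as a literal string, not as a regex pattern.
    return [m.start() for m in re.finditer(re.escape(rule), text)]

# Hypothetical homogeneous rules: same structure, differing only in the number.
homogeneous_rules = ["error code 101", "error code 102", "error code 103"]

# One hand-written regular expression that aggregates all three rules.
aggregated = re.compile(r"error code \d{3}")

# Hypothetical input string.
text = "boot ok; error code 102 raised; retry; error code 101 raised"

# Literal matching: one search per rule.
literal_hits = {rule: find_matches(text, rule) for rule in homogeneous_rules}

# Aggregated matching: one search covers the whole category.
aggregated_hits = [m.group(0) for m in aggregated.finditer(text)]

print(literal_hits)
print(aggregated_hits)
```

Writing such an expression by hand is only practical when the rules share a regular structure; for rules in the same category that have no common surface form, there is no obvious pattern to write, which is the case the abstract describes as time-consuming and laborious.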