Knowledge Extraction: Automatic Classification of Matching Rules

Yunyi Tang, Le Wang, Xiaolong Chen, Zhaoquan Gu, Zhihong Tian

2021

With the rapid development of information technologies, ever larger amounts of data are produced in cyberspace. Traditional web search methods cannot satisfy users’ demands in a timely and accurate manner, so developing big search techniques for cyberspace is an urgent task. MDATA (Multi-dimensional Data Association and Intelligent Analysis) is a knowledge representation model with temporal and spatial characteristics. By expressing these characteristics effectively, it supports efficient updating of dynamic knowledge. Pattern matching is often used to extract the knowledge needed to construct the MDATA from massive data. It relies on matching rules to acquire the required substrings from a string. In practical application scenarios, matching rules can often be divided into several categories. Matching rules in the same category share the same meaning but are expressed differently. Regular expressions can aggregate matching rules that have a consistent structure and strong regularity. However, in practical scenarios such as cyber security knowledge, such homogeneous matching rules are rare, and most rules are random and disordered. For random matching rules, manually designing regular expressions to aggregate them is time-consuming and laborious. To address this problem, we apply a word embedding algorithm to classify matching rules automatically. Word embedding is a kind of representation learning algorithm that is commonly adopted in recommendation systems, relation mining, text similarity matching, and similar tasks. It converts words into vectors in a low-dimensional space using neural network models. However, word embedding algorithms take into account the relationship between semantic information and context, which requires a large amount of data. When only the matching rules used in pattern matching are considered, the data is insufficient to reflect these context relationships, so accurate results cannot be derived. In this chapter, we design an automatic classification method that needs only a small amount of data to meet practical requirements.
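As a rough illustration of the contrast drawn above between homogeneous and random matching rules, the following minimal Python sketch may help. It is not the chapter's method. The rule strings, the CVE-style pattern, and the names `homogeneous_rules`, `heterogeneous_rules`, `cve_pattern`, and `alternation` are all assumptions chosen for the example.

```python
import re

# Hypothetical matching rules that share one structure.
homogeneous_rules = ["CVE-2019-0708", "CVE-2020-1472", "CVE-2021-44228"]

# A single regular expression aggregates every rule of that shape.
cve_pattern = re.compile(r"CVE-\d{4}-\d{4,}")

text = "The host is affected by CVE-2020-1472 and CVE-2021-44228."
matches = cve_pattern.findall(text)

# Hypothetical rules with the same meaning but no shared structure.
# No compact pattern covers them, so each one has to be listed by hand.
heterogeneous_rules = [
    "remote code execution",
    "RCE",
    "arbitrary command execution",
]

# Longest alternatives first, so a longer rule is preferred over a shorter one
# when both could match at the same position.
alternation = re.compile(
    "|".join(
        re.escape(rule)
        for rule in sorted(heterogeneous_rules, key=len, reverse=True)
    ),
    re.IGNORECASE,
)

report = "An RCE flaw allows arbitrary command execution"
category_hits = alternation.findall(report)
```

In the first case the pattern generalizes to rules that were never listed. In the second case every new phrasing must be added manually, which is the effort an automatic classification of matching rules is meant to reduce.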
