A Novel Text Extraction Technique Based on Pattern Matching and Automatic Backtracking

Xiangyu He, Zhiyong Hong, Wenhua Yu, Zhiqiang Zeng, Kaiyao Wang

2021

Most Web documents are represented as DOM trees, so extracting key information from a Web page usually requires traversing the DOM tree. Whether a piece of text is a key entity to be extracted depends not only on the content itself but also on its surrounding context in the page. In this paper, we introduce a method for extracting key Web information with a non-deterministic algorithm: it combines automatic backtracking with pattern matching to describe the key text content concisely, which significantly simplifies the extraction process.
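
The abstract does not specify the paper's pattern language, data structures, or matching algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy DOM class (`Node`) and a hypothetical two-element pattern vocabulary: `NodePat` matches one sibling node by tag and/or a regex on its text, and `AnyRun` matches any run of siblings. Python generators act as choice points, so when a later pattern element fails, the matcher automatically backtracks and tries a longer `AnyRun`. The target is described by its context (a price that follows a "Price:" label) rather than by its content alone.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Toy stand-in for a DOM element (assumption): a tag, its own text, and children."""
    tag: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


# Hypothetical mini pattern language, for illustration only (not the paper's notation).
@dataclass
class NodePat:
    """Matches exactly one sibling node; optionally records its text under `capture`."""
    tag: Optional[str] = None
    text: Optional[str] = None      # regex that must fully match the node's stripped text
    capture: Optional[str] = None


@dataclass
class AnyRun:
    """Matches zero or more arbitrary sibling nodes."""


def node_ok(p: NodePat, node: Node) -> bool:
    if p.tag is not None and node.tag != p.tag:
        return False
    return p.text is None or re.fullmatch(p.text, node.text.strip()) is not None


def match_seq(pats: list, nodes: list[Node], i: int, env: dict) -> Iterator[dict]:
    """Yield every binding under which `pats` matches siblings starting at nodes[i].

    Each `yield from` is a choice point: when a later element fails, the
    generator resumes here and tries the next alternative (backtracking).
    """
    if not pats:
        yield env
        return
    p, rest = pats[0], pats[1:]
    if isinstance(p, AnyRun):
        # Try the shortest skip first, then backtrack into longer ones.
        for j in range(i, len(nodes) + 1):
            yield from match_seq(rest, nodes, j, env)
    elif i < len(nodes) and node_ok(p, nodes[i]):
        new_env = dict(env)
        if p.capture:
            new_env[p.capture] = nodes[i].text.strip()
        yield from match_seq(rest, nodes, i + 1, new_env)


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal of the toy DOM tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract(root: Node, pats: list) -> Iterator[dict]:
    """Try the pattern at every start position of every node's child list."""
    for parent in walk(root):
        for start in range(len(parent.children)):
            yield from match_seq(pats, parent.children, start, {})


page = Node("body", children=[
    Node("div", children=[Node("span", "Shipping:"), Node("span", "$5.00")]),
    Node("div", children=[
        Node("span", "Price:"), Node("span", "(incl. tax)"),
        Node("span", "n/a"), Node("span", "$12.50"),
    ]),
])

PRICE = r"\$\d+(?:\.\d{2})?"
# "A price somewhere after a 'Price:' label among the same siblings."
pattern = [NodePat(text="Price:"), AnyRun(), NodePat(tag="span", text=PRICE, capture="price")]
print(list(extract(page, pattern)))  # → [{'price': '$12.50'}]
```

The `$5.00` under `Shipping:` is never captured because the pattern is anchored on its label, and the two non-price spans after `Price:` are skipped only after the matcher backtracks out of shorter `AnyRun` choices. Applying the same idea to real pages would require building `Node` trees from parsed HTML, which this sketch leaves out.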

Keywords:

Backtracking
Pattern matching
Pattern recognition
Artificial intelligence
Computer science