Shallow syntactic analysis of Chinese texts

2017 
The paper considers the problem of automatic processing of natural-language Chinese texts. One of the pressing tasks in this area is the automatic acquisition of facts from text documents in response to a query, because existing automatic translators are useless for this task. The goal of the work is to extract facts directly from the text in the original language, without translating it. The suggested approach consists of syntactic analysis of sentences, followed by matching the parts of speech found against a formalized query of the form subject-object-predicate. A distinctive feature of the proposed syntactic analysis algorithm is that it has no phase for segmenting the sequence of characters (hieroglyphs) that make up a sentence into words. The bottleneck in this task is the dictionary, because a phrase may be impossible to interpret correctly when one of its words is absent from the dictionary. To eliminate this problem, we propose to identify the sentence model by its function words, while the limited dictionary can be compensated for by automatically building a subject-area thesaurus and a dictionary of common words through statistical processing of a document corpus. The suggested approach was tested on a small topic area with a limited dictionary, where it demonstrated its robustness. The temporal characteristics of the developed algorithm were analyzed as well. Because the proposed algorithm uses naive inference, its parsing speed on real tasks could be unacceptably low, and this should become a subject of further research.
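
The idea of identifying a sentence model by function words and then matching it against a subject-object-predicate query, without first segmenting the sentence into words, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two function words (把 and 被), the three sentence models, the names `Query`, `sentence_model` and `matches`, and the use of plain substring search in place of the paper's inference procedure.

```python
from typing import NamedTuple


class Query(NamedTuple):
    """Formalized query; field names are hypothetical."""
    subject: str
    obj: str
    predicate: str


def in_order(text: str, *terms: str) -> bool:
    """Return True if the terms occur in text left to right without overlap."""
    pos = 0
    for term in terms:
        found = text.find(term, pos)
        if found < 0:
            return False
        pos = found + len(term)
    return True


def sentence_model(sentence: str) -> str:
    """Pick a sentence model from the function word present (toy rule set)."""
    for marker, model in (("把", "ba"), ("被", "bei")):
        if marker in sentence:
            return model
    return "svo"


def matches(sentence: str, q: Query) -> bool:
    """Check the query against the raw character sequence, with no word segmentation."""
    model = sentence_model(sentence)
    if model == "ba":
        # Assumed pattern: subject 把 object predicate
        left, _, right = sentence.partition("把")
        return q.subject in left and in_order(right, q.obj, q.predicate)
    if model == "bei":
        # Assumed pattern: object 被 subject predicate
        left, _, right = sentence.partition("被")
        return q.obj in left and in_order(right, q.subject, q.predicate)
    # Assumed default pattern: subject predicate object
    return in_order(sentence, q.subject, q.predicate, q.obj)


query = Query(subject="猫", obj="老鼠", predicate="吃")
for s in ("猫把老鼠吃了", "老鼠被猫吃了", "猫吃老鼠", "老鼠吃猫"):
    print(s, matches(s, query))
```

In this sketch the function word alone decides which slots the query terms must fill, so the same fact is recognized in all three word orders, while the reversed sentence 老鼠吃猫 is rejected. Only the query terms need to be known; the rest of the sentence is never looked up in a dictionary.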