Unsupervised text pattern learning using minimum description length

2010 
Knowledge of the text patterns in a domain-specific corpus is valuable in many natural language processing (NLP) applications, such as information extraction and question answering. In this paper, we propose a simple but effective probabilistic language model for modeling the indecomposability of text patterns. Under the minimum description length (MDL) principle, we implement an efficient unsupervised learning algorithm, and an experiment on an English critical writing corpus shows promising coverage of patterns when compared with human summaries.