A Construction Engineering Domain New Word Detection Method with the Combination of BiLSTM-CRF and Information Entropy.

2019 
New word detection is of great significance for improving the performance of Chinese natural language processing tasks. To address the inconsistent boundaries of coarse-grained long words and the difficulty of detecting compound words, a new word detection method combining BiLSTM-CRF and information entropy (IE) is proposed. First, the BiLSTM model extracts candidate new words. Then, information entropy is used to splice candidate new words and redefine word boundaries. The BiLSTM model effectively utilizes context information, while the CRF considers the relationship between adjacent labels, realizing sentence-level sequence labeling, which helps identify compound words and long words that are otherwise difficult to recognize. Experimental results show that our model achieves better performance on construction engineering datasets.
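
The entropy-based splicing step can be illustrated with a minimal sketch. The abstract does not give the exact formula or decision rule, so everything here is an assumption: the function names `neighbor_entropy` and `should_splice`, the threshold value, and the rule itself (merge two adjacent candidates when the characters at their shared boundary are highly predictable) are hypothetical and stand in for whatever the paper actually uses.

```python
import math
from collections import Counter


def neighbor_entropy(corpus, word, side="right"):
    """Shannon entropy (in nats) of the characters adjacent to `word`.

    `side` selects the character immediately after ("right") or
    immediately before ("left") each occurrence of `word` in `corpus`.
    """
    counts = Counter()
    start = corpus.find(word)
    while start != -1:
        idx = start + len(word) if side == "right" else start - 1
        if 0 <= idx < len(corpus):
            counts[corpus[idx]] += 1
        start = corpus.find(word, start + 1)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def should_splice(corpus, left, right, threshold=0.5):
    """Hypothetical rule (an assumption, not the paper's): merge `left` and
    `right` when they occur adjacently and both sides of the shared boundary
    have low neighbor entropy, i.e. the boundary is unlikely to be a true
    word boundary."""
    return (
        (left + right) in corpus
        and neighbor_entropy(corpus, left, "right") < threshold
        and neighbor_entropy(corpus, right, "left") < threshold
    )


# Toy corpus, invented purely for illustration.
corpus = "钢筋混凝土结构。钢筋混凝土梁。钢筋混凝土柱。"
print(should_splice(corpus, "钢筋", "混凝土"))
print(should_splice(corpus, "混凝土", "结构"))
```

In this toy corpus "钢筋" is always followed by "混凝土", so the boundary between them has zero entropy and the two candidates would be spliced, whereas "混凝土" is followed by several different characters and would not be merged with "结构".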