Named Entity Recognition with Homophones-Noisy Data

2019 
General named entity recognition systems exclusively focus on higher accuracy regardless of dirty data. However, raw source data face serious challenges specially that are originated from automated speech recognition systems’ results. In this paper, we propose Pinyin (Pinyin is the official romanization system for Standard Chinese, each Chinese character has its own pinyin sequence which is composed of Latin alphabet) Hierarchical Attention Encoder-Decoder network and Character Alternate Network to overcome Chinese homophones’ problems which frequently frustrate researchers in consecutive Natural Language Understanding (NLU). Our models present a none word segmentation structure to effectively avoid secondary data corruption and adequately extract words’ internal features. Besides, corrupted sequences can be revised by character-level network. Evaluation demonstrates that our proposed method achieves 93.73% F1 scores which are higher than 90.97% F1 scores using baseline models in homophone-noisy dataset. Additional experiments are conducted to show equivalent results in the universal dataset.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    28
    References
    0
    Citations
    NaN
    KQI
    []