Text Representation Model for Multiple Language Forms in Spoken Chinese Expression

2022 
Mixing multiple language forms in spoken Chinese is common but problematic. It makes intent understanding harder and hinders communication. Existing studies on intent recognition mainly focus on a single language form or on parallel multilingual text, and pay little attention to spoken texts that contain multiple language forms. Because it is hard to capture the semantics of an expression with multiple language forms, the problem merits study. To address it, a text representation model is proposed for spoken Chinese expressions mixed with English and Chinese Pinyin. A feature matrix is built to mine the composition information of English and Pinyin. The model can efficiently distinguish English from Chinese Pinyin even though both kinds of fragment are written with the same letters. It can also effectively handle hidden text information, since that problem is transformed into the task of translating English and Pinyin into Chinese. To verify the model's performance, texts processed by the model are used as the input of a classifier. Extensive experiments on a large online logistics manual customer service corpus show that the text representation model is correct and effective. It not only removes the obstacles caused by mixing multiple language forms but also yields better results for intent understanding.