Public opinion classification and text alignment based on Chinese and Tibetan corpus

2017 
To support research on the security of Chinese and Tibetan networks, a method for classifying public opinion in Chinese and Tibetan texts is proposed. First, web pages are collected. Second, the pages are preprocessed to extract the useful information. Third, a table of Chinese and Tibetan public opinion keywords is built. Finally, a text similarity calculation classifies each text against the table of public opinion keywords. A Chinese–Tibetan text-level alignment approach is also proposed; it uses a Chinese–Tibetan translation dictionary to match word frequency and position. In addition, a sentence-level alignment algorithm is studied. Alignment performance depends on the Chinese–Tibetan translation dictionary. A system for public opinion text classification and Chinese–Tibetan text alignment is developed. After a Chinese text has been classified, the alignment software finds the most similar Tibetan text and presents it to the user. This research can help identify Chinese and Tibetan public opinion texts and is relevant to information retrieval, text clustering, and Chinese–Tibetan machine translation.
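The classification and text-level alignment steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, so cosine similarity over word counts is assumed here, and the position-matching part of the alignment is omitted. The function names (`cosine`, `classify`, `align`), the keyword table, the dictionary entries, and the romanized placeholder tokens are all hypothetical, and input texts are assumed to be already segmented into words.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two word-count mappings (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(tokens, keyword_table):
    """Return the category whose keyword list is most similar to the text,
    or None if the text shares no word with any category."""
    counts = Counter(tokens)
    scores = {label: cosine(counts, Counter(words))
              for label, words in keyword_table.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def align(zh_tokens, bo_docs, zh_to_bo):
    """Project Chinese tokens through a translation dictionary and return
    (doc_id, score) for the Tibetan document with the closest word
    frequencies. Word positions are ignored in this sketch."""
    projected = Counter(zh_to_bo[t] for t in zh_tokens if t in zh_to_bo)
    best_id, best_score = None, 0.0
    for doc_id, tokens in bo_docs.items():
        score = cosine(projected, Counter(tokens))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


# Toy data: English and romanized placeholders stand in for real
# Chinese and Tibetan words.
keyword_table = {"disaster": ["earthquake", "flood"],
                 "economy": ["market", "price"]}
zh_to_bo = {"earthquake": "sa_yom", "flood": "chu_log"}
bo_docs = {"doc1": ["sa_yom", "chu_log", "other"],
           "doc2": ["tshong", "other"]}

text = ["earthquake", "rescue", "flood"]
label = classify(text, keyword_table)
match_id, match_score = align(text, bo_docs, zh_to_bo)
print(label, match_id, round(match_score, 3))
```

Because words missing from the dictionary are dropped before comparison, the alignment score in this sketch degrades as dictionary coverage falls, which matches the abstract's remark that alignment performance depends on the translation dictionary.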