Semantic and Morphological Information Guided Chinese Text Classification

2020 
Recently proposed models such as BERT perform well on many text processing tasks. Through deeper layers and large text corpora, they learn context-sensitive features that provide good semantics for word sense disambiguation. For Chinese text classification, however, the majority of datasets are crawled from social networking sites and are semantically complex and variable. How much data is needed to pre-train these models so that they grasp semantic features and understand context remains an open question. In this paper, we propose a novel shallow-layer language model that uses sememe information to guide the model toward semantic information without a large amount of pre-training data. We then use the Chinese character representations generated by this model for text classification. Furthermore, to make Chinese as easy to initialize as English, we apply convolutional neural networks over Chinese strokes to obtain a character-structure initialization for our model.
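As a rough illustration of the stroke-level idea, the sketch below runs a one-dimensional convolution with max-over-time pooling over a character's stroke sequence to produce a fixed-size vector. It is not the paper's architecture: the five-class stroke inventory, the dimensions, the random weights, and the names `make_params` and `char_vector` are all assumptions made for this example.

```python
import random

# Hypothetical inventory of five coarse stroke classes (an assumption for illustration).
STROKES = ["horizontal", "vertical", "left_falling", "right_falling", "turning"]
STROKE_ID = {name: i for i, name in enumerate(STROKES)}


def make_params(embed_dim=4, num_filters=6, width=3, seed=0):
    """Create random stroke embeddings and convolution filters (untrained)."""
    rng = random.Random(seed)
    embed = [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)] for _ in STROKES]
    filters = [
        [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)] for _ in range(width)]
        for _ in range(num_filters)
    ]
    return embed, filters


def char_vector(stroke_names, embed, filters):
    """Map a stroke sequence to one vector: embed, convolve, ReLU, max-pool."""
    width = len(filters[0])
    embed_dim = len(embed[0])
    seq = [embed[STROKE_ID[s]] for s in stroke_names]
    # Zero-pad short sequences so that at least one convolution window exists.
    while len(seq) < width:
        seq.append([0.0] * embed_dim)
    pooled = []
    for f in filters:
        best = 0.0  # ReLU output is never below zero
        for start in range(len(seq) - width + 1):
            act = sum(
                f[k][d] * seq[start + k][d]
                for k in range(width)
                for d in range(embed_dim)
            )
            best = max(best, act)
        pooled.append(best)
    return pooled


embed, filters = make_params()
# The character 大 is written as horizontal, left-falling, right-falling.
vec = char_vector(["horizontal", "left_falling", "right_falling"], embed, filters)
```

In this sketch the resulting `vec` has one entry per filter and could serve as a character's initial representation before any training; a real system would learn the embeddings and filters jointly with the language model.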