Polyseme-Aware Vector Representation for Text Classification

2020 
Representation models for text classification have recently shown impressive performance. However, these models neglect polysemous words. When a polysemous word appears in a text, an imprecise embedding for that word yields a low-quality text representation that distorts the original meaning of the text. To address this problem, we present the polyseme-aware vector representation model (PAVRM), a model architecture that generates more precise vector representations for words and texts. The PAVRM identifies polysemous words in a corpus with a context clustering algorithm. We also propose two methods for constructing polysemous word representations, PAVRM-Context and PAVRM-Center. Experiments on three standard text classification tasks and one custom text classification task show that the PAVRM can be integrated into existing models to produce higher-quality word and text representations and achieve better classification performance.
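The abstract does not specify the context clustering algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each occurrence of a word has already been mapped to a context vector, groups those vectors with a simple greedy cosine-similarity clustering, and treats a word as polysemous when more than one cluster emerges. The function names, the 0.5 threshold, and the toy 2-dimensional vectors are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_contexts(context_vectors, threshold=0.5):
    """Greedy one-pass clustering (an assumed stand-in for the paper's algorithm).

    Each context vector joins the cluster whose centroid is most similar,
    provided the similarity reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for vec in context_vectors:
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, mean(members))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([vec])
        else:
            best.append(vec)
    return clusters


def sense_centroids(context_vectors, threshold=0.5):
    """One centroid per discovered context cluster (one candidate vector per sense)."""
    return [mean(c) for c in cluster_contexts(context_vectors, threshold)]


# Toy context vectors for occurrences of the word "bank" (hypothetical data).
bank_contexts = [
    [0.9, 0.1],  # financial context
    [0.8, 0.2],  # financial context
    [0.1, 0.9],  # river context
    [0.2, 0.8],  # river context
]

centroids = sense_centroids(bank_contexts)
is_polysemous = len(centroids) > 1
```

In this sketch a word with a single cluster would keep one vector, while a word with several clusters would receive one vector per cluster. How PAVRM-Context and PAVRM-Center actually build their representations is not described in the abstract and is not modeled here.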