Using Label Noise Filtering and Ensemble Method for Sentiment Analysis on Thai Social Data

2019 
Sentiment analysis is an essential task for social listening, especially in service and product analysis. Prior work on sentiment analysis, especially for the Thai language, mostly focuses on improving model architecture without considering error propagation from word tokenizers or noisy text from social media. In this paper, three contributions are proposed for implementing a social analysis model. First, text pre-processing is used to mitigate noise in the input texts. Second, robustness to word segmentation errors is enhanced by using an ensemble process with two tokenizers. Lastly, a training process inspired by the co-training method is proposed to filter label noise from the data. In the experiments, the model achieves a 2.56% improvement in average macro F1 score over the baseline models on social media data.
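The second and third contributions can be illustrated with a minimal sketch. The abstract does not specify how the two tokenizer-specific models are combined or how noisy labels are detected, so both rules below are assumptions chosen for illustration: probabilities are averaged across the two models, and an example is treated as label noise when both models agree with each other but disagree with the given label. The function names `ensemble_predict` and `filter_label_noise` are hypothetical.

```python
from typing import Dict, List, Sequence

Probs = Dict[str, float]  # class label -> predicted probability


def ensemble_predict(probs_a: Probs, probs_b: Probs) -> str:
    """Combine two models, each trained on text from a different tokenizer.

    Assumption: the ensemble averages class probabilities; the paper's
    actual combination rule is not stated in the abstract.
    """
    labels = sorted(set(probs_a) | set(probs_b))
    avg = {c: (probs_a.get(c, 0.0) + probs_b.get(c, 0.0)) / 2 for c in labels}
    return max(labels, key=lambda c: avg[c])


def filter_label_noise(
    labels: Sequence[str],
    preds_a: Sequence[str],
    preds_b: Sequence[str],
) -> List[int]:
    """Return indices of training examples to keep.

    Assumption: an example is suspected label noise when both models
    predict the same class and that class differs from the given label.
    """
    keep = []
    for i, (y, a, b) in enumerate(zip(labels, preds_a, preds_b)):
        suspected_noise = (a == b) and (a != y)
        if not suspected_noise:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Model A favours "pos", model B favours "neg" more strongly.
    print(ensemble_predict({"pos": 0.6, "neg": 0.4}, {"pos": 0.3, "neg": 0.7}))
    # The second example is dropped: both models say "pos", label says "neg".
    print(filter_label_noise(["pos", "neg", "pos"],
                             ["pos", "pos", "neg"],
                             ["pos", "pos", "pos"]))
```

In a co-training style loop, the kept indices would be used to retrain both models, and the filtering step could be repeated; how many rounds the authors use is not given here.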