Effectiveness of preprocessing techniques over social media texts for the improvement of machine learning based classifiers

2019 
The language used on social networks is usually more informal than that of traditional sources. Studies that feed such content to machine-learning-based classifiers typically begin with a cleaning and standardization step, whose goal is to improve classification accuracy. In this paper, several cleaning tasks are defined and applied to a dataset of comments extracted from the social network Facebook. The aim is to verify whether the corrections made by these tasks produce a significant improvement in the accuracy reached by the classifiers. The results indicate that, on this type of dataset, preprocessing tasks that perform reasonably well at correcting errors do not necessarily produce a noteworthy improvement in classification accuracy.
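To make the idea of a cleaning task concrete, the following is a minimal sketch of what such a step might look like for informal comments. The abstract does not list the specific tasks the paper uses, so the three shown here (lowercasing, URL removal, and shortening elongated character runs) are illustrative assumptions, and the function name `clean_comment` is hypothetical.

```python
import re

# Illustrative cleaning tasks only; these are assumptions,
# not the task set defined in the paper.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # any character repeated 3+ times
SPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Apply a few simple normalization steps to one comment."""
    text = text.lower()                    # case folding
    text = URL_RE.sub(" ", text)           # drop links
    text = REPEAT_RE.sub(r"\1\1", text)    # "sooooo" -> "soo", "!!!" -> "!!"
    text = SPACE_RE.sub(" ", text).strip() # normalize whitespace
    return text


if __name__ == "__main__":
    print(clean_comment("Sooooo GOOD!!!   see http://x.co/abc"))
```

To test the question posed in the abstract, one would train the same classifier on the raw and on the cleaned comments and compare the resulting accuracy on a held-out set.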