Effectiveness of preprocessing techniques over social media texts for the improvement of machine learning based classifiers

Leonardo Esnaola, Juan Pablo Tessore, Hugo Dionisio Ramón, Claudia Cecilia Russo

2019

The language used on social networks is usually more informal than that found in traditional sources. Studies that feed such content to machine learning based classification algorithms typically perform, as a first step, a cleaning and standardization process intended to improve classification accuracy. In this paper, several cleaning tasks are defined and executed over a dataset of comments extracted from the social network Facebook. The goal is to verify whether the corrections made by these tasks produce a significant improvement in the accuracy reached by the classification algorithms. The results indicate that, on this type of dataset, preprocessing tasks that perform reasonably well at correcting errors do not necessarily produce a noteworthy improvement in the classification accuracy reached by the algorithms.
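The abstract does not enumerate the specific cleaning tasks. As a purely illustrative sketch, assuming a few typical normalization steps for informal comments (case folding, link removal, shortening elongated words, whitespace cleanup), a cleaner could look like the following. The function name `clean_comment` and every step in it are hypothetical and are not the authors' pipeline.

```python
import re

# Hypothetical normalization steps; not the task list used in the paper.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")     # a letter repeated 3+ times
SPACE_RE = re.compile(r"\s+")                   # any run of whitespace


def clean_comment(text: str) -> str:
    """Apply a few illustrative normalization steps to an informal comment."""
    text = text.lower()                     # case folding
    text = URL_RE.sub(" ", text)            # replace links with a space
    text = REPEAT_RE.sub(r"\1\1", text)     # "gooood" -> "good"; digits untouched
    text = SPACE_RE.sub(" ", text).strip()  # collapse and trim whitespace
    return text


if __name__ == "__main__":
    print(clean_comment("Sooooo GOOOOD!!! see http://x.co   now"))
```

Elongated letters are shortened to two rather than one so that legitimately doubled letters survive. In line with the paper's conclusion, the useful check is to train the same classifier with and without such a step and compare accuracy, rather than to assume the correction helps.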
