Presumptive Detection of Cyberbullying on Twitter through Natural Language Processing and Machine Learning in the Spanish Language

2019 
The constant development of information and communication technologies (ICTs) has changed interpersonal interaction, allowing real experiences to be transferred to a virtualized medium such as the Internet. Although this breaks the space-time barriers of traditional communication and strengthens social relationships, problems related to adverse behaviors may arise. Bullying, defined as an act that threatens a person’s holistic well-being, becomes cyberbullying when it is carried out over the Internet, and it can cause anxiety, depression and even suicide attempts. For this reason, it is essential to detect this type of behavior in time. This research deploys a Spanish cyberbullying prevention system (SPC), which relies on Natural Language Processing (NLP) methods and several machine learning techniques (Naive Bayes, Support Vector Machine and Logistic Regression), using Twitter as the source from which the knowledge bases, or corpora, are extracted. Several precision metrics and variable corpus sizes are used for the training. The learning results reach a maximum accuracy of 93%, verified through three case studies.
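To make the classification step concrete, the following is a minimal sketch of one of the three techniques named above, a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. It is not the authors' SPC system: the class name `NaiveBayes`, the `tokenize` helper, the label names, and the four-sentence Spanish toy corpus are all assumptions invented for illustration, and a real system would train on a much larger labeled tweet corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical helper: lowercase the text and keep runs of letters,
    # including Spanish accented characters and "ñ".
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy corpus (assumption): not taken from the paper's data.
train_texts = [
    "eres un idiota y nadie te quiere",
    "cállate tonto, das asco",
    "qué buen día para pasear",
    "gracias por tu ayuda, amigo",
]
train_labels = ["bullying", "bullying", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("nadie te quiere, tonto"))
print(model.predict("gracias amigo, qué buen día"))
```

With this toy corpus the first message shares all of its words with the `bullying` examples and the second with the `neutral` ones, so the two predictions differ accordingly. Support Vector Machine and Logistic Regression classifiers would consume the same kind of token counts but learn a weighted decision boundary instead of per-class word probabilities.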