Anuj@IEEE BigData 2019: A Novel Code-Switching Behavior Analysis in Social Media Discussions

2019 
With the internet and social media breaking down barriers to communication, more and more people across the globe use social media platforms such as Facebook, Twitter, and Instagram. Multilingualism is a common mode of communication on these platforms: users discuss shared topics on a common forum, and a single speaker or a group of speakers may use several languages in the same conversation. This makes the context more complex to understand and makes many Natural Language Processing (NLP) tasks even harder to perform. This behavior of mixing multiple languages within a single discussion topic, with participants from multiple language communities, is referred to as code-switching.

At the IEEE BigData 2019 conference, a Shared Task (Understanding Multilingual Communities through Analysis of Code-switching Behaviors in Social Media Discussions) was conducted as a track of the Big Data Cup. The first task is to detect the language of each post in a discussion forum where multiple languages are used. The second is to compute a relevance score for each post, measuring how closely its content is connected to, or appropriate for, the discussion.

This paper proposes a novel approach to detect the language of each word in a post using NLP techniques that combine linguistics, the Python package langdetect, and various other approaches. It also explains how Machine Learning is applied to determine the relevance of a post, along with the other metrics required for prediction. Code-mixing detection is an important first step for any NLP application on social media, since the language of a post must be determined before any further NLP task can be performed on it.
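The abstract does not spell out how word-level language detection is implemented. As a minimal sketch of the idea, the Python code below tags each word of a code-switched post with a language label and then derives a post-level label and a code-mixing flag. It is a stand-in, not the authors' method: it does not call langdetect (whose documentation notes that results on very short or ambiguous text can differ between runs unless `DetectorFactory.seed` is fixed). Instead it assumes two hypothetical ingredients: tiny toy lexicons for Latin-script words, and a deliberately coarse mapping from writing system to a single language (in reality Cyrillic is not only Russian and Arabic script is not only Arabic). All names (`TOY_LEXICON`, `SCRIPT_TO_LANG`, `tag_word`, `tag_post`, `post_language`, `is_code_mixed`) are illustrative.

```python
import unicodedata
from collections import Counter

# Hypothetical toy lexicons for Latin-script words; a real system would rely on
# a trained detector or far larger word lists.
TOY_LEXICON = {
    "en": {"the", "is", "and", "this", "very", "good", "i", "love", "movie"},
    "es": {"el", "la", "es", "y", "muy", "bueno", "malo", "pero", "que", "película"},
}

# Hypothetical, deliberately coarse mapping from Unicode script to one language.
SCRIPT_TO_LANG = {"CYRILLIC": "ru", "DEVANAGARI": "hi", "ARABIC": "ar", "HANGUL": "ko"}

IGNORED = {"unk", "other"}  # labels that do not count as a language


def dominant_script(word: str) -> str:
    """Most common script among the word's letters, read from Unicode character names."""
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split(" ")[0] for ch in word if ch.isalpha()
    )
    return scripts.most_common(1)[0][0] if scripts else "NONE"


def tag_word(word: str) -> str:
    """Label one word with a language code, 'unk' (unresolved Latin word) or 'other'."""
    script = dominant_script(word)
    if script in SCRIPT_TO_LANG:
        return SCRIPT_TO_LANG[script]
    if script == "LATIN":
        matches = [lang for lang, vocab in TOY_LEXICON.items() if word.lower() in vocab]
        return matches[0] if len(matches) == 1 else "unk"
    return "other"  # digits, emoji, or scripts this sketch does not map


def tag_post(post: str) -> list[tuple[str, str]]:
    """Split a post on whitespace, drop punctuation characters, and tag each word."""
    tagged = []
    for token in post.split():
        word = "".join(ch for ch in token if not unicodedata.category(ch).startswith("P"))
        if word:
            tagged.append((word, tag_word(word)))
    return tagged


def post_language(tags: list[tuple[str, str]]) -> str:
    """Majority language over the tagged words, ignoring 'unk' and 'other'."""
    counts = Counter(lang for _, lang in tags if lang not in IGNORED)
    return counts.most_common(1)[0][0] if counts else "unk"


def is_code_mixed(tags: list[tuple[str, str]]) -> bool:
    """True if the post contains words from more than one detected language."""
    return len({lang for _, lang in tags if lang not in IGNORED}) > 1


if __name__ == "__main__":
    tags = tag_post("I love this movie pero el final es muy malo!")
    print(tags)
    print(post_language(tags), is_code_mixed(tags))
```

For example, on "I love this movie pero el final es muy malo!" the sketch labels the first four words `en`, labels `pero`, `el`, `es`, `muy`, and `malo` as `es`, and leaves `final` as `unk` because it appears in neither toy lexicon; the post is therefore flagged as code-mixed with `es` as its majority language. Tagging at the word level, rather than once per post, is what allows a post to be recognized as mixed instead of being forced into a single language.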
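The abstract states that Machine Learning is used to estimate how relevant a post is to its discussion, but it does not name the model or its features. Purely as an illustrative baseline, and not the paper's model, one simple unsupervised way to score relevance is the TF-IDF cosine similarity between a post and the discussion topic text. The choice of the topic text as the reference, the tokenizer, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `relevance_scores`) are assumptions made for this sketch.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokens with surrounding ASCII punctuation removed."""
    tokens = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors using a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevance_scores(topic: str, posts: list[str]) -> list[float]:
    """Score each post by its TF-IDF cosine similarity to the discussion topic.

    IDF statistics are computed over this one discussion (topic plus posts).
    """
    docs = [tokenize(topic)] + [tokenize(p) for p in posts]
    vectors = tfidf_vectors(docs)
    return [cosine(vectors[0], v) for v in vectors[1:]]


if __name__ == "__main__":
    topic = "Best budget laptop for programming"
    posts = [
        "I use a cheap laptop for programming and it works fine",
        "What did everyone eat for dinner tonight?",
    ]
    for post, score in zip(posts, relevance_scores(topic, posts)):
        print(f"{score:.3f}  {post}")
```

In this example the first post shares the topic's rarer terms ("laptop", "programming") and scores higher than the second, which overlaps only on the common word "for". A supervised model of the kind the abstract describes could use a similarity score like this as one input feature among others rather than as the final relevance score.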