Polarity Detection in a Cross-Lingual Sentiment Analysis using spaCy

2020 
This paper compares sentiment analyses performed on a dataset of French tweets and on its machine-translated English version, produced with Google Translate. In recent years, a fairly new Python library called spaCy has gained significant traction for sentiment analysis in languages other than English, owing to its multilingual support. However, there have been no major publications evaluating its use for this purpose. In this research, the corpus is first preprocessed to remove irrelevant details, and TF-IDF features are then extracted from three N-gram types (unigrams, bigrams and trigrams). Models are trained and tested on these features using three machine learning algorithms: Logistic Regression, Naive Bayes and Stochastic Gradient Descent. The resulting comparative study is intended to help future researchers in three areas: the feasibility of performing sentiment analysis in languages other than English, the reliability of machine translation tools in cross-lingual sentiment analysis, and the evaluation of the Python library spaCy for multilingual sentiment analysis.
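The N-gram TF-IDF step described above can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant or tooling was used, so the following is an assumption: it uses only the Python standard library, length-normalized term frequency, and a smoothed inverse document frequency. The function names `ngrams` and `tfidf` and the toy tokenized tweets are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Compute TF-IDF weights for the n-grams of each tokenized document.

    Assumed weighting (not confirmed by the paper):
      tf  = count of the n-gram / number of n-grams in the document
      idf = log((1 + N) / (1 + df)) + 1, with N documents in total
    """
    grams_per_doc = [ngrams(doc, n) for doc in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        # Documents shorter than n have no n-grams and yield an empty dict.
        weights.append({
            g: (c / total) * (math.log((1 + n_docs) / (1 + df[g])) + 1)
            for g, c in counts.items()
        })
    return weights


# Toy, made-up tokenized tweets used only for illustration.
tweets = [
    ["i", "love", "this", "film"],
    ["i", "hate", "this", "film"],
    ["love", "love", "love"],
]

unigram_w = tfidf(tweets, 1)
bigram_w = tfidf(tweets, 2)
trigram_w = tfidf(tweets, 3)
```

Each returned dictionary maps an n-gram to its weight within one tweet. In a setup like the one the abstract describes, such weights would be turned into feature vectors and passed to the three classifiers; n-grams that occur in fewer tweets receive a higher weight than equally frequent n-grams that occur in many.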