Classification of Textual Sentiment Using Ensemble Technique

Md. Mashiur Rahaman Mamun, Omar Sharif, Mohammed Moshiul Hoque

2022

In recent years, the widespread use of the Internet has changed how people share their feelings and sentiments, which now appear on blogs, social media, e-commerce sites, and other online platforms. Most of these feelings are expressed in textual form (such as statuses, tweets, comments, and reviews). These textual expressions are unstructured, and their messy form makes them laborious and time-consuming to organize, manipulate, or store efficiently. Textual sentiment analysis refers to the automatic process of assigning an expression or text to an appropriate polarity (positive, negative, or neutral). Although Bengali ranks as the seventh most popular language globally and the second most popular Indic language, few language processing tools have been developed for it to date. This paper proposes an ensemble-based technique to classify Bengali textual sentiment into two categories: positive and negative. Because no Bengali sentiment corpus was available, this work also developed a dataset (called the 'Bengali Sentiment Analysis Dataset', or BSaD) containing 8122 text expressions. This work investigates eight popular baseline classifiers [Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), K-nearest Neighbor (KNN), Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Stochastic Gradient Descent, and AdaBoost] with Term frequency-Inverse document frequency (TF-IDF) and Bag-of-words (BoW) features for textual sentiment analysis on three datasets. It also investigates four ensemble methods (LR + RF, RF + SVM, LR + SVM, and LR + RF + SVM) built by combining the three best-performing base classifiers (LR, RF, and SVM). Experimental results show that the ensemble approach LR + RF + SVM with TF-IDF (uni-gram + bi-gram + tri-gram) features outperformed the other classifier models, achieving the highest accuracy of 82% on the developed dataset.
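
The two ingredients named in the abstract, combined uni-gram, bi-gram, and tri-gram features and an ensemble of three base classifiers, can be illustrated with a minimal Python sketch. The abstract does not say how the base models' outputs are combined, so the hard majority vote used here is an assumption, and the function names (`word_ngrams`, `majority_vote`, `ensemble_predict`) and the whitespace tokenization are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max.

    Tokenization is plain whitespace splitting (an assumption; the
    paper's preprocessing is not described in the abstract).
    """
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def majority_vote(labels):
    """Hard voting: return the label chosen by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """Combine predictions sample by sample.

    per_model_predictions holds one list of labels per base model
    (e.g. LR, RF, SVM), all of the same length.
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical predictions from three base models on two texts.
lr_preds = ["positive", "negative"]
rf_preds = ["positive", "positive"]
svm_preds = ["negative", "negative"]
combined = ensemble_predict([lr_preds, rf_preds, svm_preds])
```

With two classes and an odd number of base models, a hard vote can never tie, which is one reason a three-model combination is convenient; a weighted or probability-averaging scheme would be an equally plausible reading of the abstract.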
