Comparative Evaluation of Several Classification Algorithms on News Posts Using Reddit Social Network Dataset

2021 
This research presents a comparative evaluation of the results of different classification algorithms applied to the Reddit News Social Network Dataset. A program, described in detail in the paper, was written in C# .NET to run the classification with different parameters. These input parameters include the amount of data to read, the training and testing sizes, the class to predict, the vector representation, and the classification algorithm: Logistic Regression (LR), Naive Bayes (NB), Decision Tree (DT), or N-Nearest Neighbor (NN). The classification was performed multiple times with various parameters, and the training and testing data were selected at random in order to obtain a more reliable judgment of the results. The performance metrics precision, recall, accuracy, F-measure, sensitivity, and specificity were calculated each time a classification algorithm was executed. Results showed that the precision of the classification algorithms ranged from just over 50% to 100%. Regarding average precision, DT had the highest value in most cases (the highest four times, once jointly with LR, though for a different data size). Regarding the time required for a classification algorithm to produce results: for a data size of 200, all algorithms were fast (less than 5 s) except LR (2–3 min). For a data size of 500, LR was the slowest at 32 min, NN was the second slowest at 2 min, and the remaining algorithms, NB and DT, stayed fast (less than 5 s). For a data size of 1000, one day was not enough for the LR classifier to finish executing and produce any results, and the remaining algorithms also failed to complete their calculations within one day.
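The abstract lists precision, recall, accuracy, F-measure, sensitivity, and specificity as the metrics computed after every run. The original C# program is not reproduced here, so the class and method names below are illustrative assumptions; the sketch only shows how these metrics can be derived from binary confusion-matrix counts, which matches their standard definitions.

```csharp
using System;

// Hypothetical sketch: names and structure are assumptions, not the paper's code.
public static class ClassificationMetrics
{
    // Computes the metrics reported in the paper from binary confusion-matrix
    // counts: true positives (tp), false positives (fp), true negatives (tn),
    // and false negatives (fn).
    public static void Report(int tp, int fp, int tn, int fn)
    {
        double precision   = (double)tp / (tp + fp);
        double recall      = (double)tp / (tp + fn);   // recall is the same as sensitivity
        double specificity = (double)tn / (tn + fp);
        double accuracy    = (double)(tp + tn) / (tp + fp + tn + fn);
        double fMeasure    = 2 * precision * recall / (precision + recall);

        Console.WriteLine($"Precision:   {precision:P2}");
        Console.WriteLine($"Recall:      {recall:P2}");
        Console.WriteLine($"Specificity: {specificity:P2}");
        Console.WriteLine($"Accuracy:    {accuracy:P2}");
        Console.WriteLine($"F-measure:   {fMeasure:P2}");
    }
}
```

For example, calling `ClassificationMetrics.Report(40, 10, 45, 5)` would print a precision of 80.00% and a recall of 88.89%; in the paper these values are recomputed on each randomly selected test split.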