A MapReduce Improved ID3 Decision Tree for Classifying Twitter Data

2021 
In this contribution, we introduce a new classification approach for opinion mining. We use the fastText feature extractor to capture the relevant information in the given tweets efficiently. We then apply Information Gain as a feature selector to reduce the high dimensionality of the feature space. Finally, we use the selected features to perform the classification task with our improved ID3 decision tree classifier, which computes a weighted information gain instead of the plain information gain used in traditional ID3. Specifically, the weighted information gain of the current conditioned feature is obtained in two steps: first, we compute the weighted correlation function of this feature; second, we multiply the resulting weighted correlation value by the feature's information gain. The approach is implemented in a distributed environment on the Hadoop framework, using its MapReduce programming model and its distributed file system, HDFS. Its primary goal is to improve the well-known ID3 classifier in terms of accuracy, execution time, and ability to handle massive datasets. We have performed several experiments to evaluate the effectiveness of the proposed classifier against other approaches selected from the literature.
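The feature-selection step and the two-step split criterion can be illustrated with a minimal Python sketch. It assumes the fastText features have already been discretized (ID3 needs categorical attributes), simulates the MapReduce counting pass in a single process rather than using the Hadoop API, and replaces the weighted correlation function, which the abstract does not define, with a hypothetical caller-supplied weight per feature. The names `map_phase`, `reduce_phase`, `select_top_k`, and `best_split`, as well as the toy data, are illustrative and are not the authors' code.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

# A record is (feature name -> discretized value, class label); assumed input format.
Record = Tuple[Dict[str, Hashable], Hashable]
Key = Tuple[str, Hashable, Hashable]  # (feature, value, label)
CLASS_KEY = "__class__"  # reserved pseudo-feature used to count class labels


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[Key, int]]:
    """Mapper: per record, emit one class-count pair plus ((feature, value, label), 1) per feature."""
    for features, label in records:
        yield (CLASS_KEY, None, label), 1
        for name, value in features.items():
            yield (name, value, label), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reducer: sum the emitted 1s per key (Hadoop would group values by key first)."""
    counts: Dict[Key, int] = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return dict(counts)


def entropy(label_counts: Iterable[int]) -> float:
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    nonzero = [c for c in label_counts if c > 0]
    total = sum(nonzero)
    return -sum(c / total * math.log2(c / total) for c in nonzero)


def information_gain(counts: Dict[Key, int], feature: str) -> float:
    """Gain(A) = H(S) - sum_v |S_v| / |S| * H(S_v), computed from reducer counts."""
    class_counts = [c for (name, _, _), c in counts.items() if name == CLASS_KEY]
    total = sum(class_counts)
    by_value: Dict[Hashable, List[int]] = defaultdict(list)
    for (name, value, _), c in counts.items():
        if name == feature:
            by_value[value].append(c)  # label counts within the subset S_v
    remainder = sum(sum(lc) / total * entropy(lc) for lc in by_value.values())
    return entropy(class_counts) - remainder


def select_top_k(counts: Dict[Key, int], features: List[str], k: int) -> List[str]:
    """Information Gain feature selection: keep the k highest-gain features."""
    return sorted(features, key=lambda f: information_gain(counts, f), reverse=True)[:k]


def best_split(counts: Dict[Key, int], features: List[str], weights: Dict[str, float]) -> str:
    """Weighted criterion: argmax over features A of w(A) * Gain(A).

    `weights` is a hypothetical stand-in for the weighted correlation function,
    which the abstract does not define.
    """
    return max(features, key=lambda f: weights[f] * information_gain(counts, f))


# Toy, already-binarized tweets (hypothetical data, not from the paper).
data: List[Record] = [
    ({"w_good": 1, "w_bad": 0}, "pos"),
    ({"w_good": 1, "w_bad": 0}, "pos"),
    ({"w_good": 0, "w_bad": 1}, "neg"),
    ({"w_good": 0, "w_bad": 0}, "neg"),
]
counts = reduce_phase(map_phase(data))
print(information_gain(counts, "w_good"))  # → 1.0
print(select_top_k(counts, ["w_bad", "w_good"], 1))  # → ['w_good']
print(best_split(counts, ["w_good", "w_bad"], {"w_good": 1.0, "w_bad": 1.0}))  # → w_good
print(best_split(counts, ["w_good", "w_bad"], {"w_good": 0.2, "w_bad": 1.0}))  # → w_bad
```

With equal weights the criterion reduces to plain ID3 and splits on `w_good`; down-weighting `w_good` makes `w_bad` win, illustrating how a per-feature weight can change which attribute the tree splits on. On an actual Hadoop cluster, the mapper and reducer would run over HDFS input splits, and the summed counts would be collected for each tree node being expanded.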