Importance of Data Re-Sampling and Dimensionality Reduction in Predicting Students' Success

2021 
In this paper, we present the importance of data pre-processing in predicting students' success. We implemented Principal Component Analysis (PCA) for dimensionality reduction to achieve better model performance. Data re-sampling techniques were also used to handle the class imbalance problem, which is one of the significant obstacles to effective classification in Educational Data Mining because of the nature of data from educational settings. We performed a comparative analysis of the impact of Random Under-Sampling (RUS), Random Over-Sampling (ROS), and the Synthetic Minority Over-Sampling Technique (SMOTE) on the imbalanced dataset used in this study. Applying SMOTE with PCA offered better performance than applying RUS or ROS with PCA. Support Vector Machine achieved the best accuracy, 0.94, after the application of SMOTE and PCA. Applying PCA to the imbalanced data also improved the accuracy of the models used in this study. We used other performance metrics to evaluate our models: Kappa, Area Under Curve, and the Precision-Recall curve. Our findings show that predictive models can predict student success when PCA and data re-sampling techniques are applied properly.
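The abstract compares three re-sampling strategies without giving implementation details. The following is a minimal sketch of how RUS, ROS, and a SMOTE-style interpolation differ, written with the Python standard library only. It is not the authors' code: the function names, the neighbour count `k=3`, and the toy two-feature dataset are all assumptions made for illustration, and the PCA and classification steps are omitted.

```python
import math
import random
from collections import Counter


def split_by_class(X, y):
    """Group feature rows by their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(X, y, rng):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    groups = split_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, rng):
    """ROS: duplicate randomly chosen rows until each class matches the largest."""
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote(X, y, rng, k=3):
    """SMOTE-style over-sampling: create new rows by interpolating between a
    minority row and one of its k nearest same-class neighbours.
    Assumes every class has at least two rows (k=3 is an illustrative choice)."""
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = list(X), list(y)  # originals first, synthetic rows appended
    for label, rows in groups.items():
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
            partner = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> partner
            X_out.append([b + gap * (p - b) for b, p in zip(base, partner)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 20 majority rows (label 0) and 5 minority rows (label 1).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(5)]
y = [0] * 20 + [1] * 5

rus_counts = Counter(random_under_sample(X, y, rng)[1])
ros_counts = Counter(random_over_sample(X, y, rng)[1])
X_sm, y_sm = smote(X, y, rng)
smote_counts = Counter(y_sm)

print("RUS:", dict(rus_counts))
print("ROS:", dict(ros_counts))
print("SMOTE:", dict(smote_counts))
```

All three functions return balanced classes, but they get there differently: RUS discards majority rows, ROS repeats existing minority rows, and the SMOTE-style function adds new points that lie between existing minority rows. In practice a maintained library implementation would normally be preferred over a hand-written one.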