Preprocessing kNN algorithm classification using K-means and distance matrix with students’ academic performance dataset

2020 
Outliers in a dataset can lower the accuracy of a classification process. They can be removed during the preprocessing stage of a classification algorithm, and clustering can serve as the outlier detection method. This study applies K-Means and a distance matrix to detect outliers and remove them from datasets that already have class labels. The research used a dataset of student study results with 6847 instances, 18 attributes, and 3 class labels. Preprocessing applies K-Means to obtain a centroid for each class, and the distance matrix is used to evaluate the distance from each instance to the centroids. Instances that the distance evaluation places in a different class from their label are treated as outliers and removed from the dataset. This preprocessing improves the classification accuracy of the K-NN algorithm. Data without preprocessing reaches 72.28% accuracy, data preprocessed with K-Means and Euclidean distance reaches 98.42% accuracy (an increase of 26.14 percentage points), and data preprocessed with K-Means and Manhattan distance reaches 97.76% accuracy (an increase of 25.48 percentage points).
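The preprocessing step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: one centroid per class (with a single cluster, the K-Means centroid is simply the class mean), an outlier defined as an instance whose nearest centroid belongs to a class other than its own label, and a small made-up dataset. The function names `class_centroids` and `remove_outliers` are hypothetical, not taken from the study.

```python
import math

def class_centroids(X, y):
    """Mean vector per class (assumed: equals K-Means with k=1 per class)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def remove_outliers(X, y, metric=euclidean):
    """Drop instances whose nearest class centroid differs from their label."""
    centroids = class_centroids(X, y)
    labels = sorted(centroids)
    # Distance matrix: one row per instance, one column per class centroid.
    dist = [[metric(row, centroids[c]) for c in labels] for row in X]
    keep = [i for i, d in enumerate(dist) if labels[d.index(min(d))] == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]

# Made-up toy data: two instances carry a label that disagrees with their position.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.1, 4.9],
     [5.0, 5.0], [4.9, 5.2], [5.0, 4.8], [1.1, 1.0]]
y = ["low", "low", "low", "low", "high", "high", "high", "high"]

X_clean, y_clean = remove_outliers(X, y, metric=euclidean)
print(len(X_clean))  # 6: the two mislabeled-looking instances are dropped
```

Passing `metric=manhattan` switches the distance matrix to Manhattan distance, the second variant the abstract compares. The cleaned `X_clean` and `y_clean` would then be handed to a K-NN classifier; that step is left out here because the abstract gives no details about the value of k or the evaluation protocol.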