Code smell detection using feature selection and stacking ensemble: An empirical investigation

2021 
Abstract Context: Code smell detection is the process of identifying code pieces that are poorly designed and implemented. Recently more research has been directed towards machine learning-based approaches for code smells detection. Many classifiers have been explored in the literature, yet, finding an effective model to detect different code smells types has not yet been achieved. Objective: The main objective of this paper is to empirically investigate the capabilities of stacking heterogeneous ensemble model in code smell detection. Methods: Gain feature selection technique was applied to select relevant features in code smell detection. Detection performance of 14 individual classifiers was investigated in the context of two class-level and four method-level code smells. Then, three stacking ensembles were built using all individual classifiers as base classifiers, and three different meta-classifiers (LR, SVM and DT). Results: GP, MLP, DT and SVM(Lin) classifiers were among the best performing classifiers in detecting most of the code smells. On the other hand, SVM(Sig), NB(B), NB(M), and SGD were among the least accurate classifiers for most smell types. The stacking ensemble with LR and SVM meta-classifiers achieved a consistent high detection performance in class-level and method-level code smells compared to all individual models. Conclusion: This paper concludes that the detection performance of the majority of individual classifiers varied from one code smell type to another. However, the detection performance of the stacking ensemble with LR and SVM meta-classifiers was consistently superior over all individual classifiers in detecting different code smell types.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    64
    References
    0
    Citations
    NaN
    KQI
    []