Non-parametric Distance—A New Class Separability Measure

2021 
Feature selection, one of the most important preprocessing steps in machine learning, is the process of automatically or manually selecting the features that contribute most to the prediction variable or output of interest. Working with such a subset of features has several important benefits: it reduces the computational complexity of learning algorithms, saves time, improves accuracy, and the selected features can themselves be insightful for people working in the problem domain. Among the different approaches to feature selection — filter, wrapper and hybrid — filter-based separability measures can serve as feature-ranking tools in binary classification problems, the most popular being the Bhattacharyya distance and the Jeffries–Matusita (JM) distance. However, these measures are parametric: computing them requires knowledge of the distribution from which the samples are drawn. In practice, we often encounter problems where little is known about the distribution of the observations. In this paper, we present a new non-parametric approach to feature selection called the 'Non-Parametric Distance Measure'. Experiments with the new measure are performed over nine datasets, and the results are compared with other ranking-based feature selection methods on the same datasets. The experiments show that the new box-plot-based method can provide greater accuracy and efficiency than conventional ranking-based measures for feature selection such as Chi-Square, Symmetric Uncertainty and Information Gain.
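The abstract does not give the measure's formula, but the general idea of a box-plot-based separability score can be sketched. The following is a minimal, hypothetical illustration — not the paper's actual measure — assuming a score that compares the class-conditional box plots of each feature (distance between class medians, normalised by the combined interquartile ranges) and ranks features by that score:

```python
import numpy as np

def boxplot_separability(x, y):
    """Hypothetical box-plot-based separability score for one feature in a
    binary classification problem (illustration only; not the paper's exact
    measure): absolute distance between the two class medians, normalised
    by the sum of the two class IQRs (the heights of the two boxes)."""
    a, b = x[y == 0], x[y == 1]
    q1a, med_a, q3a = np.percentile(a, [25, 50, 75])
    q1b, med_b, q3b = np.percentile(b, [25, 50, 75])
    spread = (q3a - q1a) + (q3b - q1b)
    if spread == 0:
        # Degenerate boxes: perfectly separated medians or identical values.
        return np.inf if med_a != med_b else 0.0
    return abs(med_a - med_b) / spread

def rank_features(X, y):
    """Rank feature columns of X by decreasing separability score."""
    scores = [boxplot_separability(X[:, j], y) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: scores[j], reverse=True)
```

Because the score is built only from order statistics (medians and quartiles), it makes no assumption about the underlying distribution — the property that motivates the non-parametric measure described in the abstract.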