Semi-supervised Feature Selection by Mutual Information Based on Kernel Density Estimation

2018 
Feature selection, which improves computational efficiency by selecting relevant features and removing redundant ones, plays an important role in data mining and machine learning. In practice, collecting completely labeled data is difficult and time-consuming, so semi-supervised feature selection methods have become a necessity. However, most existing feature selection methods based on mutual information are only suitable for completely labeled data. In this paper, we first use kernel density estimation to learn soft labels for unlabeled instances. For data with small class separation, we propose the concept of kernel purity, which indicates the contribution of each labeled instance to each class and thereby reduces the negative influence of some labeled instances when predicting the soft labels of unlabeled instances. Additionally, we extend the definitions of kernel density estimation entropy and mutual information to handle partially labeled continuous data effectively. Experimental results on several datasets demonstrate the effectiveness of the proposed method.
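The first step described above, learning soft labels of unlabeled instances from per-class kernel density estimates, can be sketched as follows. This is a minimal illustration, not the paper's exact estimator: the function name `soft_labels`, the Gaussian kernel, and the fixed bandwidth `h` are assumptions made for the example.

```python
import numpy as np

def soft_labels(X_l, y_l, X_u, h=0.5):
    """Soft labels for unlabeled points via per-class Gaussian KDE.

    Illustrative sketch: each unlabeled point's soft label is the
    normalized density it receives under each class's kernel estimate.
    """
    classes = np.unique(y_l)
    dens = np.empty((len(X_u), len(classes)))
    for j, c in enumerate(classes):
        Xc = X_l[y_l == c]                       # labeled points of class c
        # squared distances, shape (n_unlabeled, n_class_c)
        d2 = ((X_u[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        # average Gaussian kernel value = KDE estimate (up to a constant)
        dens[:, j] = np.exp(-d2 / (2 * h**2)).mean(axis=1)
    dens += 1e-12                                # guard against zero density
    return dens / dens.sum(axis=1, keepdims=True)  # rows sum to 1
```

For example, with two well-separated labeled clusters, an unlabeled point near one cluster receives a soft label close to 1 for that cluster's class and close to 0 for the other.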