Towards the Optimal Feature Selection in High-Dimensional Bayesian Network Classifiers

2004 
Incorporating subset selection into a classification method often carries a number of advantages, especially when operating in high-dimensional feature domains. In this paper, we focus on Bayesian network (BN) classifiers and formalize feature selection from the perspective of improving classification accuracy. To explore the effect of high dimensionality, we apply growing-dimension asymptotics, in which the number of training examples is relatively small compared to the number of feature nodes. To ascertain which set of features is indeed relevant for a classification task, we introduce a distance-based scoring measure reflecting how well the set separates the different classes. This score is then employed for feature selection using the weighted form of the BN classifier. The idea is to view the weights as inclusion-exclusion factors that eliminate the sets of features whose separation scores do not exceed a given threshold. We establish the asymptotically optimal threshold and demonstrate that the proposed selection technique improves classification accuracy under different a priori assumptions about the separation strength.
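The abstract's inclusion-exclusion idea can be illustrated with a minimal sketch: compute a distance-based separation score for each feature and keep only those whose score exceeds a threshold, i.e., assign 0/1 weights. The standardized mean-difference score, the threshold value tau, and all names below are illustrative assumptions, not the paper's exact formulation or its asymptotically optimal threshold.

```python
# Minimal sketch (assumed formulation): per-feature distance-based separation
# scores and 0/1 inclusion-exclusion weights. `tau` is a placeholder, not the
# asymptotically optimal threshold derived in the paper.
import numpy as np

def separation_scores(X, y):
    """Absolute distance between class-conditional means, scaled by the pooled std."""
    X0, X1 = X[y == 0], X[y == 1]
    pooled_std = np.sqrt(0.5 * (X0.var(axis=0) + X1.var(axis=0))) + 1e-12
    return np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / pooled_std

def select_features(X, y, tau):
    """Inclusion-exclusion weights: keep only features whose score exceeds tau."""
    return separation_scores(X, y) > tau

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 500                      # few training examples, many feature nodes
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, p))
    X[:, :10] += y[:, None] * 1.5       # only the first 10 features separate the classes
    weights = select_features(X, y, tau=1.0)
    print("selected features:", np.flatnonzero(weights))
```

The selected features would then be the only ones retained in the (weighted) BN classifier, mirroring the weights-as-inclusion-exclusion-factors view described above.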