Improving Feature Subset Selection Using a Genetic Algorithm for Microarray Gene Expression Data

2006 
Microarray data usually contains a huge number of genes (features) and a comparatively small number of samples, which makes accurate classification or prediction of diseases challenging. Feature selection techniques can help distinguish important features from irrelevant (unimportant) ones by applying certain selection criteria. However, different feature selection algorithms, based on various theoretical arguments, often produce different results when applied to the same data set. This makes it difficult to select an optimal or near-optimal feature subset for a data set. In this paper, we propose using a genetic algorithm to improve feature subset selection by combining valuable outcomes from multiple feature selection methods. The goal of our genetic algorithm is to achieve a balance between classification accuracy and the size of the selected feature subsets. The advantages of this approach include the ability to accommodate different feature selection criteria and to find small subsets of features that perform well for the particular inductive learning algorithm used to build the classifier. The experimental results demonstrate that our approach can find subsets of features with higher classification accuracy and/or smaller size than those found by each individual feature selection algorithm.
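The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the chromosome encoding, the fitness function, the genetic operators, or the learner, so everything below is an assumption made for illustration. In particular, the fitness (accuracy minus a weighted size penalty, with a hypothetical `size_weight`), the nearest-centroid classifier with leave-one-out evaluation, the tournament selection, the synthetic data, and the three candidate pools standing in for the outputs of different feature selection methods are all invented here.

```python
import random

def make_toy_data(n_samples=20, n_features=12, seed=0):
    """Synthetic stand-in for expression data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 4.0 * label
        row[1] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features."""
    if not subset:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(subset))
            for k, f in enumerate(subset):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared distance from sample i to the centroid of `label`.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(subset))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)

def fitness(mask, X, y, size_weight=0.2):
    """Assumed trade-off: accuracy minus a penalty proportional to subset size."""
    subset = [f for f, bit in enumerate(mask) if bit]
    return loo_accuracy(X, y, subset) - size_weight * len(subset) / len(mask)

def run_ga(X, y, candidate_pools, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Search only over features proposed by at least one selection method.
    pool = sorted(set().union(*candidate_pools))
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y)
        return cache[key]

    def random_mask():
        mask = [0] * n
        for f in pool:
            if rng.random() < 0.5:
                mask[f] = 1
        return mask

    pop = [random_mask() for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best individual forward
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            for f in pool:                           # bit-flip mutation within the pool
                if rng.random() < p_mut:
                    child[f] ^= 1
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=score)
    return [f for f, bit in enumerate(best) if bit], score(best)

X, y = make_toy_data()
pools = [{0, 2, 5}, {0, 1, 7}, {1, 5, 9}]  # hypothetical outputs of three selectors
selected, best_fitness = run_ga(X, y, pools)
print(selected, round(best_fitness, 3))
```

Restricting the search to the union of the candidate pools is one way to combine the outcomes of several selection methods; raising `size_weight` in this sketch pushes the search toward smaller subsets at the cost of accuracy.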