Stability of feature selection algorithms for classification in high-throughput genomics datasets

2013 
A major goal of applying machine learning techniques to high-throughput genomics data (e.g. DNA microarrays or RNA-Seq) is the identification of “gene signatures”. These signatures can be used to discriminate between healthy and disease states (e.g. normal vs. cancerous tissue), or among different biological mechanisms, at the gene expression level. Consequently, the literature is replete with studies in which numerous feature selection techniques are applied in an effort to reduce the noise and dimensionality of such datasets. However, little attention is given to the stability of these signatures when the original dataset is perturbed by adding, removing or simply resampling observations. In this article, we assess signature stability on a set of well-characterized public cancer microarray datasets, using five popular feature selection algorithms from the field of high-throughput genomics data analysis.
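To make the notion of signature stability under resampling concrete, the following is a minimal sketch and not the procedure used in the article. It assumes a two-class expression matrix, a simple t-like score as a stand-in for a feature selection algorithm, stratified bootstrap resampling as the perturbation, and the mean pairwise Jaccard index as the stability measure. The abstract names neither the five algorithms nor the stability metric, so all of these choices, and every function name below, are illustrative assumptions.

```python
import itertools
import numpy as np

def top_k_features(X, y, k):
    """Hypothetical selector: rank features by an absolute t-like score
    between class 0 and class 1 and return the top-k column indices."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return set(np.argsort(score)[-k:].tolist())

def jaccard(s, t):
    """Jaccard index of two non-empty feature sets (1.0 means identical)."""
    return len(s & t) / len(s | t)

def selection_stability(X, y, k=10, n_resamples=20, seed=0):
    """Mean pairwise Jaccard index of signatures selected on bootstrap resamples.

    Assumes y is a NumPy array of 0/1 labels with at least two samples per class.
    Resampling is done within each class so both classes stay represented.
    """
    rng = np.random.default_rng(seed)
    signatures = []
    for _ in range(n_resamples):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int((y == c).sum()), replace=True)
            for c in (0, 1)
        ])
        signatures.append(top_k_features(X[idx], y[idx], k))
    pairs = itertools.combinations(signatures, 2)
    return float(np.mean([jaccard(s, t) for s, t in pairs]))
```

A value near 1 indicates that nearly the same genes are selected on every resample, while a value near 0 indicates that the signature changes almost completely when the observations are perturbed.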