Dimensionality Reduction in Gene Expression Data Sets

2019 
Dimensionality reduction is used in microarray data analysis to improve prediction quality, reduce computing time, and build more robust models. It matters all the more because learning algorithms must cope with a very large number of attributes (genes) relative to the number of samples. In this paper, we conducted a detailed comparison of two reduction methods, attribute selection and principal component analysis (PCA), on gene expression data sets. Both methods were applied in the pre-processing stage and then evaluated experimentally. We also introduced a combination of consistency-based subset evaluation (CSE) and minimum redundancy maximum relevance (mRMR), referred to as CSE-mRMR, to improve classification efficiency. The results indicated a significant increase in classifier hit rates with both methods compared to using all attributes. Under cross-validation, attribute selection consistently outperformed PCA across classifiers and data sets, and CSE-mRMR showed good classification performance on the data sets. Taken together, the literature and the current results suggest that attribute selection may be relevant to the analysis and future prediction of gene expression data sets.
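To make the mRMR idea concrete, the following is a minimal sketch of greedy minimum redundancy maximum relevance selection. It is not the CSE-mRMR procedure evaluated in the paper: the consistency-based step is omitted, and absolute Pearson correlation is assumed here as a stand-in for both the relevance and the redundancy measure. The function names (`pearson`, `mrmr_select`) and the toy expression values are hypothetical and serve only as illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(columns, labels, k):
    """Greedily pick k attribute indices by relevance minus mean redundancy.

    Assumption: relevance is |corr(attribute, label)| and redundancy is the
    mean |corr| with the attributes already selected.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Start from the single most relevant attribute.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data (made up): 8 samples, 2 classes, 4 "genes" stored column-wise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # gene 0: informative
    [0, 0, 0, 2, 2, 2, 2, 2],  # gene 1: scaled copy of gene 0 (redundant)
    [0, 0, 0, 0, 0, 0, 1, 1],  # gene 2: informative, less correlated with gene 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # gene 3: unrelated to the class
]

selected = mrmr_select(genes, labels, k=2)
print(selected)
```

In this toy setting the redundancy penalty steers the second pick away from the scaled copy and toward an attribute that carries complementary information, which is the behavior mRMR is designed to produce.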