Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer

2006 
Abstract
Motivation: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies.
Method: We present a novel approach for combining microarray data across institutions and platforms. We introduce a new algorithm, robust greedy feature selection (RGFS), to select predictive genes.
Results: We combined two prostate cancer microarray data sets, confirmed the appropriateness of the approach with the Kolmogorov-Smirnov goodness-of-fit test, and built several predictive models. The best logistic regression model with stepwise forward selection used 7 genes and had a misclassification rate of 31%. Models that combined LDA with different feature selection algorithms had misclassification rates between 19% and 33%, and the sets of genes in the models varied substantially during cross-validation. When we combined RGFS with LDA, the best model used two genes and had a misclassification rate of 15%.
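The abstract does not spell out how the Kolmogorov-Smirnov check was applied, so the following is only a minimal sketch under stated assumptions: each study's values for a single gene are standardized separately (an assumed normalization, not necessarily the authors'), and a two-sample KS statistic then asks whether the two studies' distributions are compatible enough to pool. The function names `standardize`, `ks_two_sample`, and `ks_critical` are hypothetical, and the expression values are simulated, not taken from the paper.

```python
import math
import random


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ks_two_sample(a, b):
    """Two-sample KS statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for D at significance level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Simulated expression of one gene in two studies with different
# platform-specific offset and scale (illustrative values only).
rng = random.Random(0)
study_a = [rng.gauss(8.0, 1.0) for _ in range(60)]
study_b = [rng.gauss(200.0, 25.0) for _ in range(40)]

d_raw = ks_two_sample(study_a, study_b)
d_std = ks_two_sample(standardize(study_a), standardize(study_b))
threshold = ks_critical(len(study_a), len(study_b))

print(f"D before standardization: {d_raw:.3f}")
print(f"D after standardization:  {d_std:.3f}")
print(f"5% critical value:        {threshold:.3f}")
```

A value of D below the critical value means the test does not reject the hypothesis that both studies share one distribution for that gene, which is the kind of evidence one would want before pooling samples. In practice a library routine such as SciPy's `ks_2samp` would also return a p-value.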