Understanding microarray data through applying competent program evolution

2007 
Many researchers have used supervised categorization algorithms, such as genetic programming (GP) and support vector machines (SVMs), to analyze gene expression microarray data. Overall, results in this area using SVMs have been stronger than those for GP. However, GP is sometimes preferable to SVMs because of the relative transparency of the models it produces: studying the GP models themselves can indicate exactly how the classification is being performed, which can lead to biological insights. We ask first whether an alternative program evolution technique, MOSES (meta-optimizing semantic evolutionary search) [2], can improve on GP’s results in this domain (in terms of both accuracy and model simplicity), and second whether MOSES can provide “important gene” lists with substantial biological relevance. We report results for two datasets: (1) distinguishing between types of lymphoma based on gene expression data [4]; and (2) distinguishing young from old human brains [3]. Three issues are relevant to any classification approach to microarray analysis: (1) dealing with a huge number of problem variables; (2) dealing with noisy continuous data; and (3) avoiding overfitting to the data. We dealt with (1) by selecting the 50 most-differentiating features for use in all experiments; with (2) by treating gene expression levels as Boolean features determined by median-thresholding (to eliminate concerns regarding noise and scaling); and with (3) by using TP + TN − s/2 as our fitness function (i.e., high parsimony pressure), where s is the number of nodes in the classifier, TP is the number of true positives, and TN is the number of true negatives. See [2] for details and justification, along with algorithm parameter settings (which were fixed across a variety of experiments). Results are presented in the
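The preprocessing and fitness function described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `median_threshold` and `fitness` are hypothetical, and two details are assumptions not stated in the abstract, namely that each gene is thresholded at its own median across samples and that a level strictly above the median maps to True.

```python
from statistics import median


def median_threshold(samples):
    """Convert continuous expression levels to Boolean features.

    samples: list of rows, one row per sample, one value per gene.
    Assumption: each gene is compared against its own median across
    all samples, and "strictly above the median" maps to True.
    """
    n_genes = len(samples[0])
    medians = [median(row[g] for row in samples) for g in range(n_genes)]
    return [[row[g] > medians[g] for g in range(n_genes)] for row in samples]


def fitness(predictions, labels, num_nodes):
    """Score a classifier as TP + TN - s/2, where s is its node count."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
    return tp + tn - num_nodes / 2
```

Under this scoring, every two nodes added to a classifier must buy at least one additional correct classification to break even, which is what makes the parsimony pressure high.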