Gene-pair representation and incorporation of GO-based semantic similarity into classification of gene expression data

2012 
This work introduces a novel data representation for learning from gene expression data that emphasizes gene-gene interactions. In this representation, the data consist of differences between the expression values of gene pairs rather than the expression values themselves. Besides better sensitivity to gene interactions, an important benefit of this representation is the opportunity to incorporate external knowledge in the form of the semantic similarity of each pair, which is also studied. In this context, two common learning algorithms, plain k-NN classification and Random Forest, are compared with two techniques based on distance function learning, namely learning from equivalence constraints and the intrinsic Random Forest similarity, on a set of genetic benchmark datasets. The most discriminative gene pairs are selected and the new representation is evaluated on the benchmark data. The novel representation is shown to increase classification accuracy for genetic datasets. Exploiting the gene-pair representation and the Gene Ontology (GO), the semantic similarity of gene pairs is calculated and used to pre-select pairs with a high similarity value. The GO-based feature selection approach is compared with common feature selection and is shown to often increase classification accuracy.
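The gene-pair representation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `gene_pair_features` and `preselect_pairs`, the sign convention (gene i minus gene j), and the `similarity` dictionary are assumptions made for illustration. In particular, the similarity scores are taken as given here, not computed from the Gene Ontology.

```python
import numpy as np


def gene_pair_features(X):
    """Turn an expression matrix into pairwise-difference features.

    X has shape (n_samples, n_genes). The result has one column per
    unordered gene pair (i, j) with i < j, holding X[:, i] - X[:, j].
    The sign convention is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    i_idx, j_idx = np.triu_indices(X.shape[1], k=1)
    pairs = list(zip(i_idx.tolist(), j_idx.tolist()))
    return X[:, i_idx] - X[:, j_idx], pairs


def preselect_pairs(pairs, similarity, threshold):
    """Return column indices of pairs whose similarity reaches the threshold.

    `similarity` is a hypothetical dict mapping (i, j) to a precomputed
    semantic similarity score; pairs without a score are treated as 0.
    """
    return [k for k, p in enumerate(pairs)
            if similarity.get(p, 0.0) >= threshold]


# Toy data: 2 samples, 3 genes (values are made up for illustration).
X = np.array([[1.0, 2.0, 4.0],
              [0.0, 5.0, 3.0]])
D, pairs = gene_pair_features(X)

# Hypothetical similarity scores for two of the three pairs.
sim = {(0, 1): 0.9, (1, 2): 0.2}
keep = preselect_pairs(pairs, sim, threshold=0.5)
D_selected = D[:, keep]
```

The number of pair features grows quadratically with the number of genes, which is one reason a pre-selection step such as the similarity threshold sketched here is useful before training a classifier on `D_selected`.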