Fast and Accurate Gene Prediction by Decision Tree Classification

2010 
Abstract: Gene prediction is one of the most challenging tasks in genome analysis, and many tools have been developed for it and are still evolving. In this paper, we present a novel gene prediction method that is both fast and accurate, making use of protein homology and decision tree classification. Specifically, we apply the principled entropy and decision tree concepts to assist in the gene prediction process. Our goal is to resolve exact gene structures by finding "coding" regions (exons) and "non-coding" regions (introns). Unlike traditional classification tasks, however, we do not have explicit class labels for such structures in the genes. We use a protein sequence (the product of a gene) as a query to find genes that are homologous to the query protein, and we deduce class labels based on homology. Our experiments on the genomes of two nematodes, C. elegans and C. briggsae, show that in addition to achieving prediction accuracy comparable with that of state-of-the-art methods, our method is several orders of magnitude faster, especially for genes that encode longer proteins.
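As a rough illustration of the entropy concept that decision tree classification relies on, the sketch below computes Shannon entropy and the information gain of a candidate split over "exon"/"intron" labels. It is a generic textbook example, not the authors' method: the function names, the toy labels, and the splits are all assumptions made for illustration, and the abstract does not say how labels or features are actually derived from homology.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(labels, left, right):
    """Entropy reduction from splitting `labels` into `left` and `right`."""
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical toy labels; in the paper, labels are deduced from homology.
labels = ["exon", "exon", "intron", "intron"]

# A split that separates the two classes completely.
perfect_gain = information_gain(labels, labels[:2], labels[2:])

# A split that leaves both halves as mixed as the original set.
useless_gain = information_gain(
    labels, ["exon", "intron"], ["exon", "intron"]
)

print(perfect_gain, useless_gain)
```

A decision tree learner prefers the split with the higher gain, so in this toy case it would choose the first split, which separates the two classes, over the second, which does not.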