Using Machine Learning and Gene Nonhomology Features to Predict Gene Ontology

2019 
Advances in genomic sequencing and annotation have made identifying genes straightforward, but predicting the functions of these newly identified genes remains challenging. Function is often predicted from homology, because genes descended from a common ancestral sequence are likely to share functions. However, functional annotation errors can then propagate from one species to another. Here we test approaches based on machine learning classification algorithms to predict gene function, specifically 1,562 GO terms, from non-homology gene features. Performance varied across GO terms, but of the eight supervised classification algorithms evaluated, random forest-based prediction consistently provided the most accurate gene function prediction. Non-homology-based functional annotation provides strengths complementary to those of homology-based annotation, with higher average performance among Biological Process GO terms, where homology-based functional annotation performs worst, and weaker performance among Molecular Function GO terms, where the accuracy of homology-based functional annotation is highest. Further improvements in prediction accuracy may be possible by using annotation provenance to generate higher-confidence training datasets and by incorporating more non-homology feature types. Machine learning non-homology-based functional annotation may ultimately prove useful both as a method to assign predicted function to orphan genes, which lack functionally characterized homologs, and as a way to identify and correct functional annotation errors propagated through homology-based functional annotations.
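One way to picture the prediction task described above is as a separate yes/no classification problem for each GO term. The sketch below is illustrative only and is not taken from the paper. It assumes the per-term binary framing, uses a hypothetical `min_positives` cutoff, uses placeholder gene and term names, and picks F1 as the per-term score. The abstract does not state which metric or filtering was actually used.

```python
from collections import defaultdict


def per_term_labels(annotations, genes, min_positives=2):
    """Build one binary label vector per GO term (hypothetical helper).

    annotations: dict mapping gene name -> set of GO term identifiers.
    genes: ordered list of gene names; label vectors follow this order.
    min_positives: assumed cutoff; terms with fewer annotated genes are dropped.
    """
    counts = defaultdict(int)
    for gene in genes:
        for term in annotations.get(gene, ()):
            counts[term] += 1

    labels = {}
    for term, n_pos in counts.items():
        if n_pos >= min_positives:
            labels[term] = [
                1 if term in annotations.get(gene, ()) else 0 for gene in genes
            ]
    return labels


def f1_score(y_true, y_pred):
    """F1 for one GO term; the choice of metric is an assumption."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder data, not real genes or GO identifiers.
genes = ["geneA", "geneB", "geneC", "geneD"]
annotations = {
    "geneA": {"TERM_X", "TERM_Y"},
    "geneB": {"TERM_X"},
    "geneC": {"TERM_Y"},
    "geneD": {"TERM_Z"},
}
labels = per_term_labels(annotations, genes)
```

Under this framing, each retained term's label vector could be paired with a matrix of non-homology features and passed to any supervised classifier, and per-term scores could then be averaged within each GO namespace to compare Biological Process and Molecular Function terms.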