Uncovering the dark matter of the metagenome one read at a time

2019 
Contemporary metagenomic annotation methods have proven insufficient in our attempts to better understand the complex environments around us. We call the yet-to-be-annotated part of a metagenome its ‘dark matter’. The Gene Ontology (GO) is a hierarchical vocabulary used to describe gene product function, and a large collection of curated genes with GO annotations already exists. DeepGO utilises deep learning to build models from these curated genes and gene products to predict GO categories for novel proteins. One of the major problems in metagenomic studies today is assembling environmental DNA sequences into their original genomes. This is difficult, and chimeric metagenome-assembled genomes are common. To avoid this problem, along with the computational and time expense of assembly, we have modified DeepGO to perform protein function prediction directly from sequence reads with limited protein coding sequence prediction. Three independent models were trained: one on the first 50 amino acids of each protein, one on the last 50 amino acids, and one using a phasing window of 50 amino acids applied across the entire protein sequence. These models were chosen to learn from the different parts of a protein sequence that we are likely to capture from short, unassembled sequence reads alone. We compared the three models using a mock metagenomic community consisting of 6 model bacterial genomes. We evaluated the functions predicted from both the unassembled sequence reads and the protein coding sequences predicted from the assembled metagenome.
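The three training inputs described above can be illustrated with a minimal Python sketch. It only shows how 50-residue fragments might be cut from a protein sequence; it is not the authors' implementation and does not touch DeepGO itself. The function names, the `step` parameter, and the choice to read "phasing window" as a fixed-length window moved along the sequence are all assumptions made for illustration.

```python
from typing import List

WINDOW = 50  # fragment length in amino acids, as stated in the abstract


def first_window(protein: str, size: int = WINDOW) -> str:
    """Return the first `size` residues (N-terminal fragment)."""
    return protein[:size]


def last_window(protein: str, size: int = WINDOW) -> str:
    """Return the last `size` residues (C-terminal fragment)."""
    return protein[-size:]


def moving_windows(protein: str, size: int = WINDOW, step: int = WINDOW) -> List[str]:
    """Return fixed-length fragments taken along the whole sequence.

    `step` is a hypothetical parameter: the abstract does not say how far
    the window advances each time. Proteins no longer than `size` are
    returned whole, and a trailing partial window is dropped.
    """
    if len(protein) <= size:
        return [protein]
    return [protein[i:i + size] for i in range(0, len(protein) - size + 1, step)]


if __name__ == "__main__":
    # Toy 120-residue sequence built from the 20 standard amino acid letters.
    toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 6
    print(first_window(toy_protein))
    print(last_window(toy_protein))
    print(len(moving_windows(toy_protein, step=10)))
```

Each function corresponds to one of the three models: fragments from the start, fragments from the end, and fragments sampled across the full length, which together approximate the pieces of a protein that a short unassembled read might cover.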