MetaNovo: a probabilistic approach to peptide and polymorphism discovery in complex mass spectrometry datasets

2019 
The characterization of complex mass spectrometry data obtained from metaproteomics or clinical studies presents unique challenges and offers potential insights in fields as diverse as the pathogenesis of human disease, the metabolic interactions of complex microbial ecosystems involved in agriculture, and climate change. However, accurate peptide identification requires representative sequence databases, which typically rely on prior knowledge or matched genome sequencing and are often error-prone. We present MetaNovo, a novel software pipeline that directly estimates the proteins and species present in complex mass spectrometry samples at the level of expressed proteomes, using de novo sequence tag matching and probabilistic optimization of very large sequence databases prior to target-decoy search. We validated our pipeline against the results of the recently published MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, obtaining comparable numbers of peptide and protein identifications as well as novel identifications. Using the entire release of UniProt, we identified a taxonomic distribution similar to that obtained with a matched metagenome database, with improved identification of certain taxa. When we used MetaNovo to analyze a set of single-organism human neuroblastoma cell-line samples (SH-SY5Y) against UniProt, the MS/MS identification rate during target-decoy search was comparable to that obtained with the UniProt human reference proteome, and 22,583 (85.99%) of the total set of identified peptides were shared between the two searches. Taxonomic analysis of the 612 peptides not found in the canonical set of human proteins yielded 158 peptides unique to the phylum Chordata as potential human variant identifications. Of these, 40 had previously been predicted, and 9 previously identified, in a proteogenomic study of the same cell line that used whole-genome sequencing. The MetaNovo software is available from GitHub and can also be run as a standalone Docker container from Docker Hub.
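The abstract notes that the optimized database is passed to a conventional target-decoy search. As a generic illustration of how that final step controls false identifications (not MetaNovo's own code), the following is a minimal sketch of decoy-based false discovery rate (FDR) and q-value estimation. It assumes each peptide-spectrum match (PSM) carries a single score where higher is better plus a decoy flag, and it uses the simple decoys/targets FDR estimator. The names `PSM`, `q_values` and `accepted_targets` are hypothetical, and the 1% threshold is a common convention rather than a value taken from this work.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    score: float     # search-engine score; higher means a better match (assumption)
    is_decoy: bool   # True if the matched peptide came from the decoy database


def q_values(psms):
    """Return (psm, q_value) pairs sorted by descending score.

    FDR at each rank is estimated as (#decoys so far) / (#targets so far);
    the q-value is the minimum FDR at that rank or any lower-scoring rank.
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    fdrs = []
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Walk from the lowest score upward, keeping a running minimum FDR.
    q = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        q[i] = running
    return list(zip(ranked, q))


def accepted_targets(psms, threshold=0.01):
    """Keep target (non-decoy) PSMs whose q-value is at or below the threshold."""
    return [p for p, q in q_values(psms) if not p.is_decoy and q <= threshold]


if __name__ == "__main__":
    example = [PSM(10, False), PSM(9, False), PSM(8, False),
               PSM(7, True), PSM(6, False), PSM(5, True)]
    print([p.score for p in accepted_targets(example, 0.01)])
```

In this toy example the three highest-scoring targets are accepted at 1% FDR, while the target scored 6 is only accepted at a looser threshold because a decoy outranks it. Real pipelines usually compute q-values per search engine or after rescoring, and may use other estimators such as 2·decoys/(targets + decoys).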