Lost and found: re-searching and re-scoring proteomics data aids the discovery of bacterial proteins and improves proteome coverage

2019 
Prokaryotic genome annotation depends heavily on automated gene annotation pipelines, which are prone to propagating errors and underestimating genome complexity. We describe an optimized proteogenomic workflow that uses ribo-seq and proteomic data of Salmonella Typhimurium to identify unannotated proteins or alternative protein forms arising from alternative translation initiation (i.e. N-terminal proteoforms). The data analysis includes searching for co-fragmenting peptides and post-processing with extended peptide-to-spectrum match quality features, including comparison to predicted fragment ion intensities. Applying this strategy yields greater proteome depth as well as greater confidence in unannotated peptide hits. We demonstrate the general applicability of our pipeline by re-analyzing public Deinococcus radiodurans datasets. Taken together, systematic re-analysis of available prokaryotic (proteome) datasets holds great promise to assist in experimentally based genome annotation.
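To illustrate what a "comparison to predicted fragment ion intensities" feature might look like, the following is a minimal sketch only. The abstract does not state which similarity measure the workflow uses; the normalized spectral angle below is an assumption chosen for illustration, and the function name and inputs are hypothetical. It assumes the observed and predicted intensities have already been matched to the same fragment ions, in the same order.

```python
import math


def spectral_angle_similarity(observed, predicted):
    """Illustrative rescoring feature (an assumption, not the paper's stated metric).

    Returns the normalized spectral angle between two intensity vectors
    aligned on the same fragment ions: 1.0 for identical relative
    intensities, 0.0 for orthogonal vectors.
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must cover the same fragment ions")

    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_obs = math.sqrt(sum(o * o for o in observed))
    norm_pred = math.sqrt(sum(p * p for p in predicted))

    # No usable signal in one of the vectors: report the lowest similarity.
    if norm_obs == 0.0 or norm_pred == 0.0:
        return 0.0

    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_obs * norm_pred)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical intensities for four matched fragment ions.
observed = [0.9, 0.1, 0.5, 0.0]
predicted = [1.0, 0.2, 0.4, 0.0]
similarity = spectral_angle_similarity(observed, predicted)
```

A value like this could be added as one more column alongside the search engine score when peptide-to-spectrum matches are re-scored. How the authors actually combine such features is not described in the abstract.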