A Workflow of MS/MS Data Analysis to Maximize the Peptide Identification.

2014 
INTRODUCTION. Peptide identification is a key step in shotgun proteomics. There are two approaches for the analysis of MS/MS spectra: database search and de novo sequencing. Database search is preferred for peptides present in a protein sequence database, including their modified forms. De novo sequencing is the only option for novel or homologous peptides. Whereas database search results can be validated with the target-decoy approach, de novo sequencing lacks an established validation approach. Here we describe a workflow integrating database search and de novo sequencing, in which database peptides are used to validate de novo peptides. The workflow maximizes peptide identification.
METHODS.
1. Let T1 be the set of MS/MS spectra. Perform de novo sequencing and database search on T1.
2. Let T2 be the set of spectra identified by database search at a 1% false discovery rate (FDR) at the peptide-spectrum match level. For each spectrum in T2, the de novo peptide is validated against the database peptide at the amino acid residue level. The local confidence score distributions are plotted separately for de novo residues that agree and disagree with the database residues.
3. For the de novo peptides in T3 = T1 – T2, the score distributions of correct and incorrect residues are estimated from the validated distributions in Step 2.
PRELIMINARY RESULTS. Three data sets from complex protein samples, acquired on LTQ-Orbitrap and TripleTOF 5600 instruments, were tested. PEAKS was used for both de novo sequencing and database search. Average local confidence was used to filter de novo sequences. The de novo peptides were selected with a local confidence score threshold corresponding to 85% correctness in the score distributions. The resulting peptide identifications were compared with those obtained from a consensus database search (PEAKS + MASCOT + X!Tandem). The results showed that 8% extra peptides were identified with this workflow.
CONCLUSION. The workflow maximizes peptide identification by combining de novo sequencing and database search.
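The three METHODS steps can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call PEAKS. All function names, the input layout, and the toy peptides and scores are hypothetical. The sketch assumes that each de novo peptide comes with one local confidence score per residue, that de novo and database peptides are compared position by position (pairs of unequal length are skipped), and that the threshold found on T2 is reused directly on T3, which simplifies the distribution estimate described in Step 3.

```python
from typing import Dict, List, Optional, Tuple

# One (local confidence score, agrees with database residue?) pair per residue.
Labelled = List[Tuple[float, bool]]


def residue_agreement(de_novo: str, database: str, scores: List[float]) -> Labelled:
    """Step 2: label each de novo residue by agreement with the database peptide.

    Assumption: a simple positional comparison; pairs whose lengths differ
    are skipped rather than aligned.
    """
    if len(de_novo) != len(database) or len(de_novo) != len(scores):
        return []
    return [(s, a == b) for a, b, s in zip(de_novo, database, scores)]


def pick_threshold(labelled: Labelled, target: float = 0.85) -> Optional[float]:
    """Lowest cutoff at which the residues scoring >= cutoff reach the target
    fraction of agreement with the database residues."""
    for cutoff in sorted({s for s, _ in labelled}):
        kept = [ok for s, ok in labelled if s >= cutoff]
        if kept and sum(kept) / len(kept) >= target:
            return cutoff
    return None


def select_de_novo(t3: Dict[str, Tuple[str, List[float]]], cutoff: float) -> Dict[str, str]:
    """Step 3 (simplified): keep T3 peptides whose average local confidence
    meets the cutoff derived from T2."""
    return {
        spectrum_id: peptide
        for spectrum_id, (peptide, scores) in t3.items()
        if scores and sum(scores) / len(scores) >= cutoff
    }


# Toy data (hypothetical): T2 spectra have both a de novo and a database peptide.
t2 = {
    "spec1": ("PEPTADE", "PEPTIDE", [90, 95, 92, 88, 40, 85, 91]),
    "spec2": ("AKLMR", "AGLMR", [80, 30, 75, 82, 90]),
}
# T3 spectra have only a de novo peptide.
t3 = {
    "spec3": ("LVNEK", [90, 85, 80, 95, 88]),
    "spec4": ("GGSAT", [20, 35, 30, 45, 25]),
}

labelled: Labelled = []
for de_novo, database, scores in t2.values():
    labelled.extend(residue_agreement(de_novo, database, scores))

cutoff = pick_threshold(labelled, target=0.85)
selected = select_de_novo(t3, cutoff) if cutoff is not None else {}
print(cutoff, selected)
```

In this toy data the two mismatched residues carry the lowest scores, so a low cutoff already reaches 85% agreement. On real data the agreeing and disagreeing score distributions overlap, and the cutoff would be read from the plotted distributions.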