Site-specific chemical cross-linking in combination with mass spectrometry (MS) analysis has emerged as a powerful proteomic approach for studying the three-dimensional structure of protein complexes and for mapping protein-protein interactions (PPIs). Building on the success of MS analysis of in vitro cross-linked proteins, which has been widely used to investigate specific interactions of bait proteins and their targets in various organisms, we report a workflow for in vivo chemical cross-linking and MS analysis in a multicellular eukaryote. This approach optimizes the in vivo protein cross-linking conditions in Arabidopsis thaliana, establishes a MudPIT procedure for the enrichment of cross-linked peptides, and develops an integrated software program, exhaustive cross-linked peptides identification tool (ECL), to identify the MS spectra of in planta chemically cross-linked peptides. In total, two pairs of in vivo cross-linked peptides of high confidence have been identified from two independent biological replicates. This work marks the beginning of an alternative proteomic approach to the study of in vivo protein tertiary structure and PPIs in multicellular eukaryotes.
Abstract MHC-associated peptides (MAPs) bearing post-translational modifications (PTMs) have raised intriguing questions regarding their attractiveness for targeted therapies. Here, we developed a novel computational glyco-immunopeptidomics workflow that integrates the ultrafast glycopeptide search of MSFragger with a glycopeptide-focused false discovery rate (FDR) control. We performed a harmonized analysis of 8 large-scale publicly available studies and found that glycosylated MAPs are predominantly presented by the MHC class II. We created HLA-Glyco, a resource containing over 3,400 human leukocyte antigen (HLA) class II N-glycopeptides from 1,049 distinct protein glycosylation sites. Our comprehensive resource reveals high levels of truncated glycans, conserved HLA-binding cores, and differences in glycosylation positional specificity between classical HLA class II allele groups. To support the nascent field of glyco-immunopeptidomics, we include the optimized workflow in the FragPipe suite and provide HLA-Glyco as a free web resource.
Proteinaceous aggregates containing α-synuclein, called Lewy bodies, in the substantia nigra are a hallmark of Parkinson's disease. The molecular mechanisms of Lewy body formation and associated neuronal loss remain largely unknown. To gain insights into proteins and pathways associated with Lewy body pathology, we performed quantitative profiling of the proteome. We analyzed substantia nigra tissue from 51 subjects arranged into three groups: cases with Lewy body pathology, Lewy body-negative controls with matching neuronal loss, and controls with no neuronal loss. Using a label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) approach, we characterized the proteome both in terms of protein abundances and peptide modifications. Statistical testing for differential abundance of the most abundant 2963 proteins, followed by pathway enrichment and Bayesian learning of the causal network structure, was performed to identify likely drivers of Lewy body formation and dopaminergic neuronal loss. The identified pathways include (1) Arp2/3 complex-mediated actin nucleation; (2) synaptic function; (3) poly(A) RNA binding; (4) basement membrane and endothelium; and (5) hydrogen peroxide metabolic process. According to the data, the endothelial/basement membrane pathway is tightly connected with both pathologies and likely to be one of the drivers of neuronal loss. The poly(A) RNA-binding proteins, including the ones relevant to other neurodegenerative disorders (e.g., TDP-43 and FUS), have a strong inverse correlation with Lewy bodies and may reflect an alternative mechanism of nigral neurodegeneration.
Abstract Background: Chemical cross-linking combined with mass spectrometry (CX-MS) is a high-throughput approach to studying protein-protein interactions. The number of peptide-peptide combinations grows quadratically with the number of proteins, resulting in high computational complexity. Widely used methods including xQuest (Rinner et al., Nat Methods 5(4):315–8, 2008; Walzthoeni et al., Nat Methods 9(9):901–3, 2012), pLink (Yang et al., Nat Methods 9(9):904–6, 2012), ProteinProspector (Chu et al., Mol Cell Proteomics 9:25–31, 2010; Trnka et al., 13(2):420–34, 2014) and Kojak (Hoopmann et al., J Proteome Res 14(5):2190–198, 2015) avoid searching all peptide-peptide combinations by pre-selecting peptides with heuristic approaches. However, pre-selection may discard true cross-linked peptides before they are scored. The most intuitive approach is to search all possible candidates. A tool that can exhaustively search a whole database without any heuristic pre-selection procedure is therefore desirable. Results: We have developed a cross-linked peptide identification tool named ECL. It can exhaustively search a whole database in a reasonable period of time without any heuristic pre-selection procedure. Tests showed that searching a database containing 5200 proteins took 7 h. ECL identified more non-redundant cross-linked peptides than xQuest, pLink, and ProteinProspector. Experiments showed that about 30% of these additionally identified peptides were not pre-selected by Kojak. We used protein crystal structures from the Protein Data Bank to check the intra-protein cross-linked peptides. Most of the distances between cross-linking sites were smaller than 30 Å. Conclusions: To the best of our knowledge, ECL is the first tool that can exhaustively search all candidates in cross-linked peptide identification. The experiments showed that ECL could identify more peptides than xQuest, pLink, and ProteinProspector. Further analysis indicated that some of the additional identifications were attributable to the exhaustive search.
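The quadratic growth described in the Background, and the idea of considering every peptide pair rather than a pre-selected subset, can be illustrated with a minimal sketch. This is not ECL's implementation: the function names, the tolerance, and the linker mass (138.06808 Da, the value for a DSS/BS3-type linker, assumed here purely for illustration) are not taken from the abstract.

```python
from bisect import bisect_left, bisect_right

LINKER_MASS = 138.06808  # assumed DSS/BS3-type linker mass in Da; illustrative only


def count_pairs(n_peptides: int) -> int:
    """Number of unordered peptide-peptide pairs, self-pairs included."""
    return n_peptides * (n_peptides + 1) // 2


def exhaustive_candidates(peptide_masses, precursor_mass, tol=0.01):
    """Return every index pair (i, j), i <= j, whose summed mass plus the
    linker mass matches the precursor mass within `tol` Da.

    No fragment-based pre-selection is applied: every peptide is tried as
    the first partner, and a binary search finds all compatible partners.
    """
    order = sorted(range(len(peptide_masses)), key=lambda i: peptide_masses[i])
    masses = [peptide_masses[i] for i in order]
    hits = []
    for a, m1 in enumerate(masses):
        target = precursor_mass - LINKER_MASS - m1
        # Restrict partners to positions >= a so each pair is reported once.
        lo = max(bisect_left(masses, target - tol), a)
        hi = bisect_right(masses, target + tol)
        for b in range(lo, hi):
            i, j = order[a], order[b]
            hits.append((min(i, j), max(i, j)))
    return sorted(hits)
```

For example, `count_pairs(1000)` gives 500500 pairs, which shows why the candidate space becomes large quickly. In a real search each returned pair would still have to be scored against the fragment spectrum; that step is omitted here.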
Abstract Rapidly improving methods for glycoproteomics have enabled increasingly large-scale analyses of complex glycopeptide samples, but annotating the resulting mass spectrometry data with high confidence remains a major bottleneck. We recently introduced a fast and sensitive glycoproteomics search method in our MSFragger search engine, which reports glycopeptides as a combination of a peptide sequence and the mass of the attached glycan. In samples with complex glycosylation patterns, converting this mass to a specific glycan composition is not straightforward, however, as many glycans have similar or identical masses. Here, we have developed a new method for determining the glycan composition of N-linked glycopeptides fragmented by collision or hybrid activation that uses multiple sources of information from the spectrum, including observed glycan B- (oxonium) and Y-type ions and mass and precursor monoisotopic selection errors to discriminate between possible glycan candidates. Combined with false discovery rate estimation for the glycan assignment, we show this method is capable of specifically and sensitively identifying glycans in complex glycopeptide analyses and effectively controls the rate of false glycan assignments. The new method has been incorporated into the PTM-Shepherd modification analysis tool to work directly with the MSFragger glyco search in the FragPipe graphical user interface, providing a complete computational pipeline for annotation of N-glycopeptide spectra with FDR control of both peptide and glycan components that is both sensitive and robust against false identifications.
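The ambiguity between glycan mass and glycan composition can be made concrete with a small sketch. It is not the PTM-Shepherd algorithm: it only enumerates compositions whose mass is compatible with an observed glycan mass, optionally allowing a precursor monoisotopic selection error. The monosaccharide residue masses are standard monoisotopic values; the function names, composition limits, and tolerance are assumptions chosen for illustration.

```python
from itertools import product

# Monoisotopic residue masses (Da) of common N-glycan building blocks.
RESIDUE_MASS = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}
ISOTOPE_SHIFT = 1.003355  # approximate 13C - 12C mass difference (Da)


def composition_mass(comp):
    """Mass of a glycan composition given as {residue name: count}."""
    return sum(RESIDUE_MASS[name] * count for name, count in comp.items())


def candidate_compositions(observed_mass, max_counts, tol=0.02, isotope_errors=(0,)):
    """List (composition, isotope_error, mass_error) for every composition
    within `tol` Da of the observed mass after correcting for an assumed
    isotope selection error of k peaks."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        comp = dict(zip(names, counts))
        mass = composition_mass(comp)
        for k in isotope_errors:
            delta = (observed_mass - k * ISOTOPE_SHIFT) - mass
            if abs(delta) <= tol:
                hits.append((comp, k, delta))
    return hits
```

A well-known case is that two fucoses (292.1158 Da) and one NeuAc (291.0954 Da) differ by about 1.0204 Da, which is close to one isotope spacing. When isotope errors of ±1 are allowed, both compositions become candidates for the same observed mass, so mass alone cannot separate them. This is where fragment-ion evidence such as the oxonium and Y-type ions mentioned above is needed.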
Abstract Identification of post-translationally or chemically modified peptides in mass spectrometry-based proteomics experiments is a crucial yet challenging task. We have recently introduced a fragment ion indexing method and the MSFragger search engine to empower an open search strategy for comprehensive analysis of modified peptides. However, this strategy does not consider fragment ions shifted by unknown modifications, preventing modification localization and limiting the sensitivity of the search. Here we present a localization-aware open search method, in which both modification-containing (shifted) and regular fragment ions are indexed and used in scoring. We also implement a fast mass calibration and optimization method, allowing optimization of the mass tolerances and other key search parameters. We demonstrate that MSFragger with mass calibration and localization-aware open search identifies modified peptides with significantly higher sensitivity and accuracy. Comparing MSFragger to other modification-focused tools (pFind3, MetaMorpheus, and TagGraph) shows that MSFragger remains an excellent option for fast, comprehensive, and sensitive searches for modified peptides in shotgun proteomics data.
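The difference between matching only regular fragment ions and also matching shifted ones can be sketched as follows. This is a simplified illustration, not MSFragger's indexing or scoring: it uses only singly charged b-ions, counts matched peaks as the score, and all function names and the tolerance are assumptions.

```python
PROTON = 1.007276
# Monoisotopic amino acid residue masses (Da).
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def b_ions(peptide, delta=0.0, site=None):
    """Singly charged b-ion m/z values. If `site` (0-based) is given,
    every b-ion that contains that residue carries the mass shift `delta`."""
    ions, total = [], PROTON
    for i, aa in enumerate(peptide[:-1]):
        total += RESIDUE[aa]
        shifted = site is not None and i >= site
        ions.append(total + (delta if shifted else 0.0))
    return ions


def count_matches(theoretical, peaks, tol=0.02):
    """Number of theoretical ions with an observed peak within `tol` Da."""
    return sum(any(abs(t - p) <= tol for p in peaks) for t in theoretical)


def localize(peptide, delta, peaks, tol=0.02):
    """Try the mass shift at every residue; return (best_site, matched_ions)."""
    scores = [
        count_matches(b_ions(peptide, delta, site), peaks, tol)
        for site in range(len(peptide))
    ]
    best = max(range(len(peptide)), key=scores.__getitem__)
    return best, scores[best]
```

With a hypothetical spectrum generated from PEPTIDE carrying a +79.96633 Da shift on the threonine, only the three unshifted b-ions match when regular ions alone are considered, whereas placing the shift at each position in turn recovers all six b-ions at the correct site. That gain in matched fragments is the intuition behind both the improved sensitivity and the localization.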
Abstract In computational proteomics, identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost increases exponentially with respect to the number of modifiable amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible modification patterns. Existing tools (e.g., MS-Alignment, ProteinProspector, and MODa) avoid enumerating modification patterns in a database search by using an alignment-based approach to localize and characterize modified amino acids. However, due to the large search space and the PTM localization problem, the sensitivity of these tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI first codes peptide sequences into Boolean vectors and converts experimental spectra into real-valued vectors. Then, it finds the top 10 peptide-coded vectors for each spectrum-coded vector. After that, PIPI uses a dynamic programming algorithm to localize and characterize modified amino acids. Simulations and real data experiments have shown that PIPI outperforms existing tools by identifying more peptide-spectrum matches (PSMs) and reporting fewer false positives. It also runs much faster than existing tools when the database is large.
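The retrieval step (Boolean peptide vectors, real-valued spectrum vectors, top 10 candidates) can be sketched in a few lines. The abstract does not say how the vectors are built or compared, so everything specific here is an assumption made for illustration: the features are short sequence tags, the spectrum vector is supplied as a hypothetical dictionary of tag weights, and the similarity is a cosine-like score.

```python
import heapq
from math import sqrt


def code_peptide(sequence, k=3):
    """Boolean vector stored sparsely: the set of length-k tags present
    in the peptide (hypothetical feature choice)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def score(peptide_code, spectrum_code):
    """Cosine-like similarity between a Boolean vector (set of tags) and a
    real-valued vector (dict of tag -> weight)."""
    if not peptide_code or not spectrum_code:
        return 0.0
    dot = sum(w for tag, w in spectrum_code.items() if tag in peptide_code)
    norm = sqrt(len(peptide_code)) * sqrt(sum(w * w for w in spectrum_code.values()))
    return dot / norm if norm else 0.0


def top_candidates(spectrum_code, peptides, n=10, k=3):
    """Return the n peptides whose coded vectors best match the spectrum."""
    return heapq.nlargest(
        n, peptides, key=lambda p: score(code_peptide(p, k), spectrum_code)
    )
```

The point of such a design is that candidate retrieval does not enumerate modification patterns: a peptide can rank highly as long as enough of its unmodified features are supported by the spectrum, and the modification is worked out only afterwards for the few retained candidates.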
Abstract Ion mobility brings an additional dimension of separation to liquid chromatography-mass spectrometry, improving identification of peptides and proteins in complex mixtures. A recently introduced timsTOF mass spectrometer (Bruker) couples trapped ion mobility separation to time-of-flight mass analysis. With the parallel accumulation serial fragmentation (PASEF) method, the timsTOF platform achieves promising results, yet analysis of the data generated on this platform represents a major bottleneck. Currently, MaxQuant and PEAKS are most commonly used to analyze these data. However, due to the high complexity of timsTOF PASEF data, both require substantial time to perform even standard tryptic searches. Advanced searches (e.g. with many variable modifications, semi- or non-enzymatic searches, or open searches for post-translational modification discovery) are practically impossible. We have extended our fast peptide identification tool MSFragger to support timsTOF PASEF data, and developed a label-free quantification tool, IonQuant, for fast and accurate 4-D feature extraction and quantification. Using a HeLa data set published by Meier et al. (2018), we demonstrate that MSFragger identifies significantly (∼30%) more unique peptides than MaxQuant (1.6.10.43), and performs comparably or better than PEAKS X+ (∼10% more peptides). IonQuant outperforms both in terms of number of quantified proteins while maintaining good quantification precision and accuracy. Runtime tests show that MSFragger and IonQuant can fully process a typical two-hour PASEF run in under 70 minutes on a typical desktop (6 CPU cores, 32 GB RAM), significantly faster than other tools. Finally, through semi-enzymatic searching, we significantly increase the number of identified peptides. Within these semi-tryptic identifications, we report evidence of gas-phase fragmentation prior to MS/MS analysis.
Rapidly improving methods for glycoproteomics have enabled increasingly large-scale analyses of complex glycopeptide samples, but annotating the resulting mass spectrometry data with high confidence remains a major bottleneck. We recently introduced a fast and sensitive glycoproteomics search method in our MSFragger search engine, which reports glycopeptides as a combination of a peptide sequence and the mass of the attached glycan. In samples with complex glycosylation patterns, however, converting this mass to a specific glycan composition is not straightforward, as many glycans have similar or identical masses. Here, we have developed a new method for determining the glycan composition of N-linked glycopeptides fragmented by collisional or hybrid activation that uses multiple sources of information from the spectrum, including observed glycan B-type (oxonium) and Y-type ions and mass and precursor monoisotopic selection errors, to discriminate between possible glycan candidates. Combined with false discovery rate estimation for the glycan assignment, we show that this method is capable of specifically and sensitively identifying glycans in complex glycopeptide analyses and effectively controls the rate of false glycan assignments. The new method has been incorporated into the PTM-Shepherd modification analysis tool to work directly with the MSFragger glyco search in the FragPipe graphical user interface, providing a complete computational pipeline for annotation of N-glycopeptide spectra with false discovery rate control of both peptide and glycan components that is both sensitive and robust against false identifications.
Non-clear cell renal cell carcinomas (non-ccRCCs) encompass diverse malignant and benign tumors. Refinement of differential diagnosis biomarkers, markers for early prognosis of aggressive disease, and therapeutic targets to complement immunotherapy are current clinical needs. Multi-omics analyses of 48 non-ccRCCs compared with 103 ccRCCs reveal proteogenomic, phosphorylation, glycosylation, and metabolic aberrations in RCC subtypes. RCCs with high genome instability display overexpression of IGF2BP3 and PYCR1. Integration of single-cell and bulk transcriptome data predicts diverse cell-of-origin and clarifies RCC subtype-specific proteogenomic signatures. Expression of biomarkers MAPRE3, ADGRF5, and GPNMB differentiates renal oncocytoma from chromophobe RCC, and PIGR and SOSTDC1 distinguish papillary RCC from MTSCC. This study expands our knowledge of proteogenomic signatures, biomarkers, and potential therapeutic targets in non-ccRCC.