Wen‐Feng Zeng

Institute of Computing Technology

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Matthias Mann

Novo Nordisk Foundation

Si‐Min He

Sichuan University

Maximilian T. Strauss

Max Planck Institute of Biochemistry

Constantin Ammar

Max Planck Institute of Biochemistry

Sander Willems

Max Planck Institute of Biochemistry

Hao Chi

University of Chinese Academy of Sciences

Isabell Bludau

Max Planck Institute of Biochemistry

Xie‐Xuan Zhou

Max Planck Institute of Biochemistry

Patricia Skowronek

Max Planck Institute of Biochemistry

Marvin Thielert

Max Planck Institute of Biochemistry

Cooperative Institutions

Max Planck Institute of Biochemistry

Chinese Academy of Sciences

Harvard University

Institute of Computing Technology

Mayo Clinic in Arizona

WinnMed

University of Chinese Academy of Sciences

Mayo Clinic in Florida

Fudan University

Mayo Clinic

Research Field

AlphaViz: Visualization and validation of critical proteomics data directly at the raw data level

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Eugenia Voytik Patricia Skowronek Wen‐Feng Zeng Maria C. Tanzer Andreas‐David Brunner

ABSTRACT Although current mass spectrometry (MS)-based proteomics identifies and quantifies thousands of proteins and (modified) peptides, only a minority of them are subjected to in-depth downstream analysis. With the advent of automated processing workflows, biologically or clinically important results within a study are rarely validated by visualization of the underlying raw information. Current tools are often not integrated into the overall analysis nor readily extendable with new approaches. To remedy this, we developed AlphaViz, an open-source Python package to superimpose output from common analysis workflows on the raw data for easy visualization and validation of protein and peptide identifications. AlphaViz takes advantage of recent breakthroughs in the deep learning-assisted prediction of experimental peptide properties to allow manual assessment of the expected versus measured peptide result. We focused on the visualization of the 4-dimensional data cuboid provided by Bruker TimsTOF instruments, where the ion mobility dimension, besides intensity and retention time, can be predicted and used for verification. We illustrate how AlphaViz can quickly validate or invalidate peptide identifications regardless of the score given to them by automated workflows. Furthermore, we provide a ‘predict mode’ that can locate peptides present in the raw data but not reported by the search engine. This is illustrated by the recovery of missing values from experimental replicates. Applied to phosphoproteomics, we show how key signaling nodes can be validated to enhance confidence for downstream interpretation or follow-up experiments. AlphaViz follows standards for open-source software development and features an easy-to-install graphical user interface for end-users and a modular Python package for bioinformaticians. Validation of critical proteomics results should now become a standard feature in MS-based proteomics.

Python

Graphical user interface

10.1101/2022.07.12.499676

Citations (11)

pDeep3: Toward More Accurate Spectrum Prediction with Fast Few-Shot Learning

Analytical Chemistry (2021)

Ching Tarn Wen‐Feng Zeng

Spectrum prediction using deep learning has attracted a lot of attention in recent years. Although existing deep learning methods have dramatically increased the prediction accuracy, there is still considerable space for improvement, which is presently limited by the difference of fragmentation types or instrument settings. In this work, we use the few-shot learning method to fit the data online to make up for the shortcoming. The method is evaluated using ten data sets, where the instruments include Velos, QE, Lumos, and Sciex, with collision energies being differently set. Experimental results show that few-shot learning can achieve higher prediction accuracy with almost negligible computing resources. For example, on the data set from an untrained instrument, Sciex-6600, within about 10 s, the prediction accuracy is increased from 69.7% to 86.4%; on the CID (collision-induced dissociation) data set, the prediction accuracy of the model trained by HCD (higher energy collision dissociation) spectra is increased from 48.0% to 83.9%. It is also shown that the method is not critical to data quality and is sufficiently efficient to fill the accuracy gap. The source code of pDeep3 is available at http://pfind.ict.ac.cn/software/pdeep3.

Fragmentation

Data set

10.1021/acs.analchem.0c05427

Citations (39)

AlphaPept, a modern and open framework for MS-based proteomics

bioRxiv (Cold Spring Harbor Laboratory) (2021)

Maximilian T. Strauss Isabell Bludau Wen‐Feng Zeng Eugenia Voytik Constantin Ammar

ABSTRACT In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making their efficient analysis a principal challenge. There is a plethora of different computational tools that process the raw MS data and derive peptide and protein identification and quantification. During the last decade, there has been dramatic progress in computer science and software engineering, including collaboration tools that have transformed research and industry. To leverage these advances, we developed AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Using Numba for just-in-time machine code compilation on CPU and GPU, we achieve hundred-fold speed improvements while maintaining clear syntax and rapid development speed. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while providing access to the latest advances in machine learning. We provide an easy on-ramp for community validation and contributions through the concept of literate programming, implemented in Jupyter Notebooks of the different modules. A framework for continuous integration, testing, and benchmarking enforces solid software engineering principles. Large datasets can rapidly be processed as shown by the analysis of hundreds of cellular proteomes in minutes per file, many-fold faster than the data acquisition. The AlphaPept framework can be used to build automated processing pipelines using efficient HDF5 based file formats, web-serving functionality and compatibility with downstream analysis tools. Easy access for end-users is provided by one-click installation of the graphical user interface, for advanced users via a modular Python library, and for developers via a fully open GitHub repository.

Python

File format

10.1101/2021.07.23.453379

Citations (34)

AlphaPept: a modern and open framework for MS-based proteomics

Nature Communications (2024)

Maximilian T. Strauss Isabell Bludau Wen‐Feng Zeng Eugenia Voytik Constantin Ammar

Abstract In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, during the last years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and via an open GitHub repository for developers.

Python

Leverage (statistics)

10.1038/s41467-024-46485-4

Citations (30)

MS/MS Spectrum Prediction for Modified Peptides Using pDeep2 Trained by Transfer Learning

Analytical Chemistry (2019)

Wen‐Feng Zeng Xie‐Xuan Zhou Wenjing Zhou Hao Chi Jianfeng Zhan

In the past decade, tandem mass spectrometry (MS/MS)-based bottom-up proteomics has become the method of choice for analyzing post-translational modifications (PTMs) in complex mixtures. The key to the identification of the PTM-containing peptides and localization of the PTM-modified residues is to measure the similarities between the theoretical spectra and the experimental ones. An accurate prediction of the theoretical MS/MS spectra of the modified peptides will improve the similarity measurement. Here, we proposed the deep-learning-based pDeep2 model for PTMs. We used the transfer learning technique to train pDeep2, facilitating the training with a limited scale of benchmark PTM data. Using the public synthetic PTM data sets, including the synthetic phosphopeptides and 21 synthetic PTMs from ProteomeTools, we showed that the model trained by transfer learning was accurate (>80% Pearson correlation coefficients were higher than 0.9), and was significantly better than the models trained without transfer learning. We also showed that accurate prediction of the fragment ion intensities of the PTM neutral loss, for example, the phosphoric acid loss (−98 Da) of the phosphopeptide, will improve the discriminating power to distinguish the true phosphorylated residue from its adjacent candidate sites. pDeep2 is available at https://github.com/pFindStudio/pDeep/tree/master/pDeep2.
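The accuracy figure quoted in this abstract (more than 80% of Pearson correlation coefficients above 0.9) can be illustrated with a minimal sketch. It is not the pDeep2 evaluation code: the helper names `pearson` and `fraction_above` are hypothetical, the intensity vectors are made-up toy data, and it assumes each predicted and measured spectrum has already been reduced to an aligned vector of fragment ion intensities.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A constant vector has no defined correlation; this sketch treats it as 0.
        return 0.0
    return cov / math.sqrt(vx * vy)


def fraction_above(pairs, threshold=0.9):
    """Share of (predicted, measured) pairs whose correlation exceeds the threshold."""
    scores = [pearson(predicted, measured) for predicted, measured in pairs]
    return sum(score > threshold for score in scores) / len(scores)


# Toy (predicted, measured) intensity vectors, invented for illustration only.
pairs = [
    ([1.0, 0.5, 0.2, 0.0], [0.9, 0.6, 0.1, 0.0]),  # similar shapes
    ([0.1, 0.8, 0.3, 0.4], [0.7, 0.1, 0.6, 0.2]),  # dissimilar shapes
]

print(fraction_above(pairs))
```

With these two toy pairs, only the first exceeds the 0.9 threshold, so the reported fraction is one half. A real evaluation would run the same calculation over every spectrum in a benchmark data set.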

Phosphopeptide

Transfer of learning

10.1021/acs.analchem.9b01262

Citations (85)

Precise, fast and comprehensive analysis of intact glycopeptides and modified glycans with pGlyco3

Nature Methods (2021)

Wen‐Feng Zeng Weiqian Cao Mingqi Liu Si‐Min He Pengyuan Yang

Abstract Great advances have been made in mass spectrometric data interpretation for intact glycopeptide analysis. However, accurate identification of intact glycopeptides and modified saccharide units at the site-specific level and with fast speed remains challenging. Here, we present a glycan-first glycopeptide search engine, pGlyco3, to comprehensively analyze intact N- and O-glycopeptides, including glycopeptides with modified saccharide units. A glycan ion-indexing algorithm developed for glycan-first search makes pGlyco3 5–40 times faster than other glycoproteomic search engines without decreasing accuracy or sensitivity. By combining electron-based dissociation spectra, pGlyco3 integrates a dynamic programming-based algorithm termed pGlycoSite for site-specific glycan localization. Our evaluation shows that the site-specific glycan localization probabilities estimated by pGlycoSite are suitable to localize site-specific glycans. With pGlyco3, we confidently identified N-glycopeptides and O-mannose glycopeptides that were extensively modified by ammonia adducts in yeast samples. The freely available pGlyco3 is an accurate and flexible tool that can be used to identify glycopeptides and modified saccharide units.

Electron-transfer dissociation

Glycoproteomics

10.1038/s41592-021-01306-0

Citations (132)

The potential of plasma HLA peptides beyond neoepitopes

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Maria Wahle Marvin Thielert Maximilian Zwiebel Patricia Skowronek Wen‐Feng Zeng

ABSTRACT Distinction of non-self from self is the major task of the immune system. Immunopeptidomics studies the peptide repertoire presented by the human leukocyte antigen (HLA) protein, usually on tissues. However, HLA peptides are also bound to plasma soluble HLA (sHLA), but little is known about their origin and potential for biomarker discovery in this readily available biofluid. Currently, immunopeptidomics is hampered by complex workflows and limited sensitivity, generally requiring several mL of plasma for the detection of hundreds of HLA peptides. Here, we take advantage of recent improvements in the throughput and sensitivity of mass spectrometry (MS)-based proteomics to develop a highly-sensitive, automated and economical workflow for HLA peptide analysis, termed Immunopeptidomics by Biotinylated Antibodies and Streptavidin (IMBAS). IMBAS-MS quantifies more than 5,000 HLA class I peptides from only 200 μL of plasma, in just 30 minutes. Our technology revealed that the plasma immunopeptidome of healthy donors is remarkably stable throughout a year and strongly correlated between individuals with overlapping HLA types. Immunopeptides originating from diverse tissues, including the brain, are proportionately represented. We conclude that sHLAs are a promising avenue for immunology and precision oncology.

10.1101/2023.09.05.556309

Citations (0)

Redesigning error control in cross-linking mass spectrometry enables more robust and sensitive protein-protein interaction studies

Molecular Systems Biology (2024)

Boris Bogdanow Max Ruwolt Julia Ruta Lars Mühlberg Cong Wang

10.1038/s44320-024-00079-w

Citations (0)

Quantitative multiorgan proteomics of fatal COVID‐19 uncovers tissue‐specific effects beyond inflammation

EMBO Molecular Medicine (2023)

Lisa Schweizer Tina Schaller Maximilian Zwiebel Özge Karayel Johannes B. Mueller‐Reif

SARS-CoV-2 may directly and indirectly damage lung tissue and other host organs, but there are few system-wide, untargeted studies of these effects on the human body. Here, we developed a parallelized mass spectrometry (MS) proteomics workflow enabling the rapid, quantitative analysis of hundreds of virus-infected FFPE tissues. The first layer of response to SARS-CoV-2 in all tissues was dominated by circulating inflammatory molecules. Beyond systemic inflammation, we differentiated between systemic and true tissue-specific effects to reflect distinct COVID-19-associated damage patterns. Proteomic changes in the lungs resembled those of diffuse alveolar damage (DAD) in non-COVID-19 patients. Extensive organ-specific changes were also evident in the kidneys, liver, and lymphatic and vascular systems. Secondary inflammatory effects in the brain were related to rearrangements in neurotransmitter receptors and myelin degradation. These MS-proteomics-derived results contribute substantially to our understanding of COVID-19 pathomechanisms and suggest strategies for organ-specific therapeutic interventions.

10.15252/emmm.202317459

Citations (15)

pDeepXL: MS/MS Spectrum Prediction for Cross-Linked Peptide Pairs by Deep Learning

Journal of Proteome Research (2021)

Zhen-Lin Chen Peng-Zhi Mao Wen‐Feng Zeng Hao Chi Si‐Min He

In cross-linking mass spectrometry, the identification of cross-linked peptide pairs heavily relies on the ability of a database search engine to measure the similarities between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, in particular, on proteome scales. Here we introduce pDeepXL, a deep neural network to predict MS/MS spectra of cross-linked peptide pairs. To train pDeepXL, we used the transfer-learning technique because it facilitated the training with limited benchmark data of cross-linked peptide pairs. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson's r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson's r values higher than 0.9). pDeepXL also achieved the accurate prediction on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.

Cross-validation

10.1021/acs.jproteome.0c01004

Citations (9)