Benjamin Pullman

University of Montana

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Nuno Bandeira

University of California, San Diego

Jeremy Carver

University of Montana

Yasset Pérez‐Riverol

Wellcome Trust

Eric W. Deutsch

Institute for Systems Biology

Juan Antonio Vizcaíno

European Bioinformatics Institute

Shin Kawano

Research Organization of Information and Systems

Mingxun Wang

University of California, Riverside

Ralf Gabriels

VIB-UGent Center for Medical Biotechnology

Joshua Klein

Boston University

Wout Bittremieux

University of Antwerp

Cooperative Institutions

AstraZeneca (United Kingdom)

University of California, San Diego

University of Montana

AstraZeneca (Brazil)

AstraZeneca (Sweden)

University of Cambridge

AstraZeneca (United States)

Wellcome Trust

Genomics (United Kingdom)

European Bioinformatics Institute

A Taxonomically-informed Mass Spectrometry Search Tool for Microbial Metabolomics Data

Research Square (Research Square) (2023)

Simone Zuffa, Robin Schmid, Anelize Bauermeister, Paulo Wender Portal Gomes, Andrés Mauricio Caraballo‐Rodríguez

MicrobeMASST, a taxonomically-informed mass spectrometry (MS) search tool, tackles limited microbial metabolite annotation in untargeted metabolomics experiments. Leveraging a curated database of >60,000 microbial monocultures, users can search known and unknown MS/MS spectra and link them to their respective microbial producers via MS/MS fragmentation patterns. Identification of microbial-derived metabolites and relative producers, without a priori knowledge, will vastly enhance the understanding of microorganisms' role in ecology and human health.

Fragmentation

Identification

10.21203/rs.3.rs-3189768/v1

Citations (4)
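MicrobeMASST links query spectra to producers "via MS/MS fragmentation patterns". A widely used way to compare fragmentation patterns is a cosine score over matched peaks. The sketch below is a generic greedy cosine with an m/z tolerance, plus a lookup that returns the taxa whose reference spectra score above a cutoff. It is an illustration only: the scoring function, the 0.02 Da tolerance, the 0.7 cutoff, and the `(taxon, spectrum)` library layout are assumptions, not MicrobeMASST's actual implementation.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity), assumed centroided


def cosine_score(spec_a: List[Peak], spec_b: List[Peak], tol: float = 0.02) -> float:
    """Greedy cosine similarity: pair peaks within `tol` Da, each peak used once."""
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Enumerate every candidate pairing of peaks that fall within the tolerance.
    candidates = []
    for ia, (mz_a, int_a) in enumerate(spec_a):
        for ib, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                candidates.append((int_a * int_b, ia, ib))
    # Keep the largest intensity products first so no peak is matched twice.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    total = 0.0
    for product, ia, ib in candidates:
        if ia not in used_a and ib not in used_b:
            used_a.add(ia)
            used_b.add(ib)
            total += product
    return total / (norm_a * norm_b)


def search_producers(query: List[Peak], library: List[Tuple[str, List[Peak]]],
                     min_score: float = 0.7, tol: float = 0.02) -> List[str]:
    """Return the sorted taxa whose reference spectra match the query (hypothetical layout)."""
    return sorted({taxon for taxon, ref in library
                   if cosine_score(query, ref, tol) >= min_score})
```

The greedy pairing is a common simplification of an optimal one-to-one peak assignment; it keeps the example short at the cost of occasionally under-scoring crowded spectra.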

ProteinExplorer: A Repository-Scale Resource for Exploration of Protein Detection in Public Mass Spectrometry Data Sets

Journal of Proteome Research (2018)

Benjamin Pullman, Julie Wertz, Jeremy Carver, Nuno Bandeira

High-throughput tandem mass spectrometry has enabled the detection and identification of over 75% of all proteins predicted to result in translated gene products in the human genome. In fact, the galloping rate of data acquisition and sharing of mass spectrometry data has led to the current availability of many tens of terabytes of public data in thousands of human data sets. The systematic reanalysis of these public data sets has been used to build a community-scale spectral library of 2.1 million precursors for over 1 million unique sequences from over 19,000 proteins (including spectra of synthetic peptides). However, it has remained challenging to find and inspect spectra of peptides covering functional protein regions or matching novel proteins. ProteinExplorer addresses these challenges with an intuitive interface mapping tens of millions of identifications to functional sites on nearly all human proteins while maintaining provenance for every identification back to the original data set and data file. Additionally, ProteinExplorer facilitates the selection and inspection of HPP-compliant peptides whose spectra can be matched to spectra of synthetic peptides and already includes HPP-compliant evidence for 107 missing (PE2, PE3, and PE4) and 23 dubious (PE5) proteins. Finally, ProteinExplorer allows users to rate spectra and to contribute to a community library of peptides entitled PrEdict (Protein Existence dictionary) mapping to novel proteins but whose preliminary identities have not yet been fully established with community-scale false discovery rates and synthetic peptide spectra. ProteinExplorer can now be accessed at https://massive.ucsd.edu/ProteoSAFe/protein_explorer_splash.jsp.

10.1021/acs.jproteome.8b00496

Citations (19)
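ProteinExplorer maps identifications onto sites of human proteins. The core of such a mapping can be pictured as locating each identified peptide sequence inside a protein sequence, which then gives sequence coverage and the set of peptides spanning a residue of interest. This is a minimal sketch assuming exact string matching on unmodified sequences; it ignores isobaric residues (I/L), modifications, and protein isoforms, and the function names are hypothetical rather than ProteinExplorer's code.

```python
from typing import Dict, List


def map_peptides(protein: str, peptides: List[str]) -> Dict[str, List[int]]:
    """0-based start positions of every (possibly overlapping) exact occurrence."""
    positions: Dict[str, List[int]] = {}
    for pep in peptides:
        starts = []
        start = protein.find(pep)
        while start != -1:
            starts.append(start)
            start = protein.find(pep, start + 1)
        positions[pep] = starts
    return positions


def sequence_coverage(protein: str, peptides: List[str]) -> float:
    """Fraction of residues covered by at least one mapped peptide."""
    covered = [False] * len(protein)
    for pep, starts in map_peptides(protein, peptides).items():
        for s in starts:
            for i in range(s, s + len(pep)):
                covered[i] = True
    return sum(covered) / len(protein) if protein else 0.0


def peptides_covering_site(protein: str, peptides: List[str], site: int) -> List[str]:
    """Peptides whose mapped span includes the 0-based residue `site`."""
    return sorted(pep for pep, starts in map_peptides(protein, peptides).items()
                  if any(s <= site < s + len(pep) for s in starts))
```

For example, with the toy protein `MKTAYIAKQR`, the peptide `TAYIAK` maps to position 2 and covers residue 4.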

Rare variant associations with plasma protein levels in the UK Biobank

Nature (2023)

Ryan S. Dhindsa, Oliver S. Burren, Benjamin B. Sun, Bram P. Prins, Dorota Matelska

Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets [1-4]. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype-protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort [5]. We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene-protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. These include detailing an allelic series in NLRC4, identifying potential biomarkers for a fatty liver disease-associated variant in HSD17B13 and bolstering phenome-wide association studies by integrating protein quantitative trait loci with protein-truncating variants in collapsing analyses. Finally, we uncover distinct proteomic consequences of clonal haematopoiesis (CH), including an association between TET2-CH and increased FLT3 levels. Our results highlight a considerable role for rare variation in plasma protein abundance and the value of proteogenomics in therapeutic discovery.

Proteogenomics

Phenome

Genome-wide Association Study

Exome

Proteome

Genetic Association

10.1038/s41586-023-06547-x

Citations (69)
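The abstract's gene-level collapsing analysis aggregates rare qualifying variants (for instance protein-truncating variants) within a gene, so each individual becomes a carrier or non-carrier, and protein levels are then compared between the groups. The toy sketch below shows that idea with a difference in means and a Welch t statistic. The data layout, the choice of test, and the absence of covariates are assumptions for illustration; the study's actual statistical models are not described here.

```python
import math
from statistics import mean, variance
from typing import Dict, Set, Tuple


def collapse(genotypes: Dict[str, Set[str]], qualifying: Set[str]) -> Dict[str, int]:
    """1 if the individual carries any qualifying variant in the gene, else 0."""
    return {person: int(bool(variants & qualifying))
            for person, variants in genotypes.items()}


def carrier_effect(carrier: Dict[str, int], levels: Dict[str, float]) -> Tuple[float, float]:
    """(carrier mean - non-carrier mean, Welch t statistic) for one protein."""
    carriers = [levels[p] for p, c in carrier.items() if c]
    others = [levels[p] for p, c in carrier.items() if not c]
    if len(carriers) < 2 or len(others) < 2:
        raise ValueError("need at least two individuals per group")
    diff = mean(carriers) - mean(others)
    se = math.sqrt(variance(carriers) / len(carriers) + variance(others) / len(others))
    return diff, diff / se
```

A negative difference for a protein-truncating collapsing model corresponds to the "decreased protein levels" direction that the abstract reports for most such signals.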

Index-based, High-dimensional, Cosine Threshold Querying with Optimality Guarantees

Theory of Computing Systems (2020)

Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, Yannis Papakonstantinou

Cosine similarity

Similarity (geometry)

10.1007/s00224-020-10009-6

Citations (6)
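A cosine threshold query asks for every stored vector whose cosine similarity with a query vector is at least a threshold θ. The sketch below answers it with a plain linear scan, which only pins down the query semantics; it is not the index-based algorithm with optimality guarantees that the paper's title refers to. The dictionary-of-lists storage is an assumption for illustration.

```python
import math
from typing import Dict, List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals the cosine."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return [x / norm for x in v]


def cosine_threshold_query(query: Sequence[float],
                           database: Dict[str, Sequence[float]],
                           theta: float) -> List[str]:
    """Keys of all database vectors with cos(query, vector) >= theta (linear scan)."""
    q = _unit(query)
    hits = []
    for key, vec in database.items():
        u = _unit(vec)
        if sum(a * b for a, b in zip(q, u)) >= theta:
            hits.append(key)
    return sorted(hits)
```

The scan costs one dot product per stored vector; an index-based method aims to avoid touching vectors that cannot reach θ.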

Assembling the Community-Scale Discoverable Human Proteome

Cell Systems (2018)

Mingxun Wang, Jian Wang, Jeremy Carver, Benjamin Pullman, Seong Won

The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public mass spectrometry runs. However, these discoveries are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates proteomics mass spectrometry discoveries into an open reusable format with full provenance information for community scrutiny. Reusing >31 TB of public human data stored in a mass spectrometry interactive virtual environment (MassIVE), the MassIVE-KB contains >2.1 million precursors from 19,610 proteins (48% larger than before; 97% of the total) and doubles proteome coverage to 6 million amino acids (54% of the proteome) with strict library-scale false discovery controls, thereby providing evidence for 430 proteins for which sufficient protein-level evidence was previously missing. Furthermore, MassIVE-KB can inform experimental design, help identify and quantify new data, and provide tools for community construction of specialized spectral libraries.

Proteome

Human proteome project

10.1016/j.cels.2018.08.004

Citations (176)
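The MassIVE-KB abstract mentions "strict library-scale false discovery controls" without giving the procedure. A common building block for false discovery control in proteomics is target-decoy estimation, sketched below for scored matches: the FDR at a score cutoff is estimated as decoys divided by targets above it, and each match receives a q-value. This is a generic illustration under that assumption; it is not MassIVE-KB's actual library-scale procedure, ignores score ties, and uses hypothetical function names.

```python
from typing import List, Tuple


def target_decoy_q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Attach a q-value to each (score, is_decoy) match; higher scores are better."""
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything scoring at or above this match.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at which this match would still be accepted.
    q_values = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ordered, q_values)]


def accepted_targets(psms: List[Tuple[float, bool]], max_q: float = 0.01) -> List[float]:
    """Scores of target matches accepted at the given q-value threshold."""
    return [score for score, is_decoy, q in target_decoy_q_values(psms)
            if not is_decoy and q <= max_q]
```

Running the same estimate at several levels (spectrum, precursor, peptide, protein) is one way such controls can be made stricter, but which levels MassIVE-KB uses is not stated here.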

Proteomics Standards Initiative’s ProForma 2.0: Unifying the Encoding of Proteoforms and Peptidoforms

Journal of Proteome Research (2022)

Richard D. LeDuc, Eric W. Deutsch, Pierre‐Alain Binz, Ryan T. Fellers, Anthony J. Cesnik

It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP) has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP. ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. ProForma 2.0 supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 can be used to represent protein modifications in a specified or ambiguous location, designated by mass shifts, chemical formulas, or controlled vocabulary terms, including cross-links (natural and chemical) and atomic isotopes. Notational conventions are based on public controlled vocabularies and ontologies. The most up-to-date full specification document and information about software implementations are available at http://psidev.info/proforma.

Human proteome project

Proteome

10.1021/acs.jproteome.1c00771

Citations (28)
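ProForma 2.0 encodes a peptidoform as a string in which a modification is written in square brackets after the residue it modifies, either by name (`EM[Oxidation]EVEES[Phospho]PEK`) or as a mass shift (`EM[+15.9949]EVEES[+79.9663]PEK`), and an N-terminal modification is written as a bracketed prefix followed by a hyphen (`[Acetyl]-PEPTIDE`). The parser below handles only that small subset and rejects everything else; ambiguous positions, cross-links, glycans, labile and C-terminal modifications, and charge states from the full specification are deliberately out of scope. Function names are hypothetical.

```python
import re
from typing import List, Tuple

_MOD = re.compile(r"\[([^\[\]]+)\]")                       # one bracketed modification
_RESIDUE = re.compile(r"([A-Z])((?:\[[^\[\]]+\])*)")       # residue plus any modifications
_N_TERM = re.compile(r"((?:\[[^\[\]]+\])+)-")              # "[Mod]-" prefix

Residue = Tuple[str, List[str]]


def parse_proforma_subset(s: str) -> Tuple[List[str], List[Residue]]:
    """Return (N-terminal modifications, [(residue, [modifications]), ...])."""
    n_term: List[str] = []
    m = _N_TERM.match(s)
    if m:
        n_term = _MOD.findall(m.group(1))
        s = s[m.end():]
    residues: List[Residue] = []
    pos = 0
    while pos < len(s):
        m = _RESIDUE.match(s, pos)
        if not m:
            raise ValueError(f"unsupported ProForma syntax at {pos}: {s[pos:]!r}")
        residues.append((m.group(1), _MOD.findall(m.group(2))))
        pos = m.end()
    return n_term, residues


def stripped_sequence(residues: List[Residue]) -> str:
    """Bare amino-acid sequence without modifications."""
    return "".join(aa for aa, _ in residues)


def total_mass_shift(residues: List[Residue]) -> float:
    """Sum of modifications written as signed mass shifts such as +15.9949."""
    return sum(float(mod) for _, mods in residues for mod in mods
               if mod[:1] in "+-" and mod[:1] != "")
```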

Proteomics Standards Initiative's ProForma 2.0: Unifying the Encoding of Proteoforms and Peptidoforms

arXiv (Cornell University) (2021)

Richard D. LeDuc, Eric W. Deutsch, Pierre‐Alain Binz, Ryan T. Fellers, Anthony J. Cesnik

There is a need to represent, in a standard manner, all possible variations of a protein or peptide primary sequence, including both artefactual and post-translational modifications of peptides and proteins. With that overall aim, the Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has developed a notation, called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the Consortium for Top-Down Proteomics (CTDP). ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. Therefore, this notation supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 covers encoding protein modification names and accessions, cross-linking reagents including disulfides, glycans, modifications encoded using mass shifts and/or via chemical formulas, labile and C- or N-terminal modifications, ambiguity in the modification position and representation of atomic isotopes, among other use cases. Notational conventions are based on public controlled vocabularies and ontologies. Detailed information about the notation and existing implementations is available at http://www.psidev.info/proforma and at the corresponding GitHub repository (https://github.com/HUPO-PSI/proforma).

Human proteome project

Proteome

Citations (0)

P-Massive: A Real-Time Search Engine for a Multi-Terabyte Mass Spectrometry Database

Narangerelt Batsoyol, Benjamin Pullman, Mingxun Wang, Nuno Bandeira, Steven Swanson

Queries of multi-TB mass spectrometry (MS) repositories provide deep insights into biological processes and pose challenging data processing problems. The key bottleneck for running these queries is the number of small random reads. Byte-addressable persistent main memory (PMEM) technologies enable real-time MS search systems by delivering low-latency, high-bandwidth storage. This work presents P-Massive, a real-time, multi-terabyte-scale MS search system. P-Massive takes advantage of PMEM and the underlying nature of its data access patterns to maximize performance. We evaluate P-Massive across various storage hierarchies and project forward over the next decade to understand how MS query systems might evolve. Our evaluation shows that P-Massive offers a cost-effective solution that achieves near-DRAM performance. A single query takes 1.7 seconds in P-Massive, 69× faster than the state-of-the-art implementation. In an end-to-end, user-facing application, P-Massive delivers a 90% shorter wait time than the latest MS search tool, returning results within seconds rather than minutes.

Terabyte

Random access

10.1109/sc41404.2022.00014

Citations (6)

Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work

Journal of Proteome Research (2023)

Eric W. Deutsch, Juan Antonio Vizcaíno, Andrew R. Jones, Pierre‐Alain Binz, Henry Lam

The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. We then describe the set of proposals currently being developed, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look ahead to how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.

Human proteome project

10.1021/acs.jproteome.2c00637

Citations (32)

Universal Spectrum Identifier for mass spectra

bioRxiv (Cold Spring Harbor Laboratory) (2020)

Eric W. Deutsch, Yasset Pérez‐Riverol, Jeremy Carver, Shin Kawano, Luis Mendoza

Mass spectra provide the ultimate evidence for supporting the findings of mass spectrometry (MS) proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USIs enable greater transparency for providing spectral evidence in support of key findings in publications, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.

10.1101/2020.12.07.415539

Citations (10)
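A USI encodes the "virtual path" to a spectrum as colon-separated fields: the literal prefix `mzspec`, a collection (dataset) identifier, an MS run name, an index type, an index, and an optional interpretation such as a peptidoform with a charge after a slash. The sketch below splits a USI into those parts. The example string is illustrative; the parser assumes the run name and index contain no colons, treats everything after the fifth colon as the interpretation (ProForma interpretations may themselves contain colons, as in `[UNIMOD:35]`), and accepts the index types `scan`, `index`, and `nativeId` as we understand the specification. Names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class USI(NamedTuple):
    collection: str
    ms_run: str
    index_type: str
    index: str
    interpretation: Optional[str]


def parse_usi(usi: str) -> USI:
    """Split a USI string into its components (simplified, see assumptions above)."""
    parts = usi.split(":", 5)  # at most six fields; the interpretation keeps its colons
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index, *rest = parts
    if index_type not in {"scan", "index", "nativeId"}:
        raise ValueError(f"unknown index type: {index_type!r}")
    return USI(collection, ms_run, index_type, index, rest[0] if rest else None)


def split_interpretation(interpretation: str) -> Tuple[str, Optional[int]]:
    """Separate 'PEPTIDE/2' into ('PEPTIDE', 2); no slash means no charge."""
    peptidoform, sep, charge = interpretation.rpartition("/")
    if not sep:
        return interpretation, None
    return peptidoform, int(charge)
```

For example, `mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2` parses to collection `PXD000561`, scan `17555`, and interpretation `VLHPLEGAVVIIFK` at charge 2.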