Nearly all cellular functions are mediated by protein-protein interactions, and mapping the interactome provides fundamental insights into the regulation and structure of biological systems. In principle, affinity purification coupled to mass spectrometry (AP-MS) is an ideal and scalable tool; however, it has been difficult to identify low-copy-number complexes, membrane complexes and those disturbed by protein tagging. As a result, our current knowledge of the interactome is far from complete, and assessing the reliability of reported interactions is challenging. Here we develop a sensitive, high-throughput, and highly reproducible AP-MS technology combined with a quantitative two-dimensional analysis strategy for comprehensive interactome mapping of Saccharomyces cerevisiae. We reduced required cell culture volumes a thousand-fold and employed 96-well formats throughout, allowing replicate analysis of the endogenous green fluorescent protein (GFP)-tagged library covering the entire expressed yeast proteome. The 4,159 pull-downs generated a highly structured network of 3,909 proteins connected by 29,710 interactions. Compared to previous large-scale studies, we double the number of proteins (nodes in the network) and triple the number of reliable interactions (edges), including very-low-abundance epigenetic complexes, organellar membrane complexes and non-taggable complexes inferred by abundance correlation. This nearly saturated interactome reveals that the vast majority of yeast proteins are highly connected, with an average of 15 interactors, the majority of them unreported so far. Similar to social networks between humans, the average shortest distance is 4.2 interactions. A web portal (www.yeast-interactome.org) enables exploration of our dataset by the network and biological communities, and variations of our AP-MS technology can be employed in any organism or under dynamic conditions.
16S rRNA gene profiling is currently the most widely used technique in microbiome research and allows the study of microbial diversity, taxonomic profiling, phylogenetics, functional and network analysis. While a plethora of tools have been developed for the analysis of 16S rRNA gene data, only a few platforms offer a user-friendly interface and none comprehensively covers the whole analysis pipeline from raw data processing down to complex analysis. We introduce Namco, an R shiny application that offers a streamlined interface and serves as a one-stop solution for microbiome analysis. We demonstrate Namco's capabilities by studying the association between a fibre-rich diet and the gut microbiota composition. Namco helped to prove the hypothesis that butyrate-producing bacteria are promoted by a fibre-enriched intervention. Namco provides a broad range of features from raw data processing and basic statistics down to machine learning and network analysis, thus covering complex data analysis tasks that are not comprehensively covered elsewhere. Namco is freely available at https://exbio.wzw.tum.de/namco/.
Background: Acquisition of leukemia‐associated somatic mutations by one or more haematopoietic stem cells (HSCs) is inevitable in healthy individuals by the age of 50–60 years [1]. However, the consequences of mutation acquisition are highly variable, even amongst those with identical mutations, and range from long‐term clinically silent clonal haematopoiesis (CH) to leukemic progression. Aims: To investigate the role of the inherited genome in determining CH emergence and behaviour, by studying CH concordance patterns in monozygotic (MZ) and dizygotic (DZ) twin pairs. Methods: 154 individuals from the TwinsUK cohort were studied, comprising 52 MZ and 27 DZ twin pairs, aged 70–99 years [2]. Deep sequencing was performed on blood DNA, targeting 41 genes implicated in CH and myeloid malignancies (Agilent SureSelect, ELID 0735431). Somatic single nucleotide variants and small indels were called using the Shearwater algorithm (v.1.21.5) and verified using CaVEMan (v.1.11.2) and Pindel (v.2.2), and curation of variant oncogenicity was performed, as described previously [3]. Statistical analyses were performed in R (version 3.4.0). Fisher's Exact Test was used to assess twin concordance for CH. Null distributions of CH within the MZ and DZ groups were generated using random permutation (1000 iterations). Results: Using deep sequencing (mean 1650X) and sensitive variant‐calling, we identified CH (VAF≥0.5%) in 62% of individuals (95/154) (Figure 1A), with mutations in the epigenetic regulators DNMT3A and TET2 predominant (Figure 1B). The overall prevalence of CH was very similar among individuals belonging to the MZ and DZ groups (59% and 54% respectively; p = 0.70). We did not observe higher concordance for CH within MZ twin pairs as compared to DZ pairs (p = 0.59, Figure 1C).
Furthermore, using random sample permutation to model the null distribution, we found no difference in the observed distribution of CH among either MZ or DZ twins as compared to that expected by chance (p = 1 for MZ; p = 0.86 for DZ; Figure 1C). Increased twin concordance was also not observed when separately considering CH with: (i) mutations in DNMT3A, (ii) mutations in TET2, and (iii) larger clones (VAF > 2%). Despite the overall lack of concordance for CH, we identified two MZ twin pairs in which both twins harbored identical rare somatic nonsense mutations, KDM6A Q692X (NM_021140:c.C2074T) in one pair and DNMT3A R598X (NM_175629:c.C1792T) in the other. Finally, in 4 MZ twin pairs, serial blood samples taken after an interval of 4–5 years showed significant inter‐twin variability in clonal trajectories, even for clones harboring mutations in the same genes. Summary/Conclusion: We find no evidence for a strong heritable predisposition to age‐related CH. The variability in clonal dynamics over time between twins in MZ pairs supports an important role for the non‐inherited environment in determining clonal behaviour. The sharing of identical somatic mutations by twins in two MZ pairs, in view of the rarity of these particular mutations in CH and myeloid malignancies, suggests that mutation acquisition probably occurred during embryogenesis, either prior to twinning or more likely in an HSC whose progeny reached both twins through shared circulation in utero. While the sharing of somatic mutations has been demonstrated in other settings, including pediatric leukemia, this would be the first description of possible acquisition of adult‐type CH driver mutations during embryogenesis.
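The permutation step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' R code: the test statistic (number of pairs in which both twins carry CH), the one-sided comparison, the function names and the toy data are all assumptions made for illustration.

```python
import random

def count_concordant_pairs(status, pairs):
    """Number of twin pairs in which both twins carry CH."""
    return sum(1 for a, b in pairs if status[a] and status[b])

def permutation_p_value(status, pairs, iterations=1000, seed=0):
    """Shuffle CH labels across individuals to build a null distribution.

    Returns the observed concordant-pair count and the fraction of
    shuffles with at least as many concordant pairs (assumed one-sided).
    """
    rng = random.Random(seed)
    observed = count_concordant_pairs(status, pairs)
    labels = list(status)  # copy, so the caller's data is not reordered
    hits = 0
    for _ in range(iterations):
        rng.shuffle(labels)
        if count_concordant_pairs(labels, pairs) >= observed:
            hits += 1
    return observed, hits / iterations

if __name__ == "__main__":
    # Toy data (hypothetical): 6 individuals forming 3 twin pairs.
    ch_status = [True, True, False, True, False, False]
    twin_pairs = [(0, 1), (2, 3), (4, 5)]
    print(permutation_p_value(ch_status, twin_pairs))
```

Shuffling labels across individuals keeps the overall CH prevalence fixed, so the null distribution reflects only chance pairing; a large p-value, as reported above, means the observed concordance is no higher than chance would produce.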
Cellular functions are mediated by protein-protein interactions, and mapping the interactome provides fundamental insights into biological systems. Affinity purification coupled to mass spectrometry is an ideal tool for such mapping, but it has been difficult to identify low copy number complexes, membrane complexes and complexes that are disrupted by protein tagging. As a result, our current knowledge of the interactome is far from complete, and assessing the reliability of reported interactions is challenging. Here we develop a sensitive high-throughput method using highly reproducible affinity enrichment coupled to mass spectrometry combined with a quantitative two-dimensional analysis strategy to comprehensively map the interactome of Saccharomyces cerevisiae. Thousand-fold reduced volumes in 96-well format enabled replicate analysis of the endogenous GFP-tagged library covering the entire expressed yeast proteome [1]. The 4,159 pull-downs generated a highly structured network of 3,927 proteins connected by 31,004 interactions, doubling the number of proteins and tripling the number of reliable interactions compared with existing interactome maps [2]. This includes very-low-abundance epigenetic complexes, organellar membrane complexes and non-taggable complexes inferred by abundance correlation. This nearly saturated interactome reveals that the vast majority of yeast proteins are highly connected, with an average of 16 interactors. Similar to social networks between humans, the average shortest distance between proteins is 4.2 interactions. AlphaFold-Multimer provided novel insights into the functional roles of previously uncharacterized proteins in complexes. Our web portal (www.yeast-interactome.org) enables extensive exploration of the interactome dataset.
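The two network statistics quoted above can be made concrete with a short Python sketch. In an undirected network the average number of interactors is 2E/N, so 31,004 interactions among 3,927 proteins gives about 15.8, consistent with the reported average of 16. The function names and the toy network below are hypothetical, and the breadth-first search is a generic way to compute shortest distances, not necessarily the procedure used in the study.

```python
from collections import deque

def average_degree(adj):
    """Mean number of interactors per protein (equals 2E/N)."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)

def average_shortest_distance(adj):
    """Mean breadth-first-search distance over all reachable pairs of distinct nodes."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

if __name__ == "__main__":
    # Toy undirected network (hypothetical): a chain of four proteins.
    toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
    print(average_degree(toy), average_shortest_distance(toy))
    # Edge and node counts quoted in the abstract above.
    print(2 * 31004 / 3927)
```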
Protein misfolding diseases, including alpha-1 antitrypsin deficiency (AATD), pose significant health challenges, with their cellular progression still poorly understood [1–3]. We utilize spatial proteomics by mass spectrometry and machine learning to map AATD in human liver tissue. Combining Deep Visual Proteomics (DVP) with single-cell analysis [4,5], we probe intact patient biopsies to resolve molecular events during hepatocyte stress in pseudo-time across fibrosis stages. We achieve unprecedented proteome depth of up to 3,800 proteins from a third of a single cell in formalin-fixed, paraffin-embedded (FFPE) tissue. This dataset revealed a potentially clinically actionable peroxisomal upregulation that precedes the canonical unfolded protein response. Our single-cell proteomics data show alpha-1 antitrypsin accumulation is largely cell-intrinsic, with minimal stress propagation between hepatocytes. We integrated proteomic data with AI-guided image-based phenotyping across multiple disease stages, revealing a terminal hepatocyte state characterized by globular protein aggregates and distinct proteomic signatures, notably including elevated TNFSF10/TRAIL expression. This phenotype may represent a critical disease progression stage. Our study offers novel insights into AATD pathogenesis and introduces a powerful methodology for high-resolution, in situ proteomic analysis of complex tissues. This approach holds potential to unravel molecular mechanisms in various protein misfolding disorders, setting a new standard for understanding disease progression at the single-cell level in human tissue.
Distinction of non-self from self is the major task of the immune system. Immunopeptidomics studies the peptide repertoire presented by the human leukocyte antigen (HLA) protein, usually on tissues. However, HLA peptides are also bound to plasma soluble HLA (sHLA), but little is known about their origin and potential for biomarker discovery in this readily available biofluid. Currently, immunopeptidomics is hampered by complex workflows and limited sensitivity, typically requiring several mL of plasma. Here, we take advantage of recent improvements in the throughput and sensitivity of mass spectrometry (MS)-based proteomics to develop a highly sensitive, automated, and economical workflow for HLA peptide analysis, termed Immunopeptidomics by Biotinylated Antibodies and Streptavidin (IMBAS). IMBAS-MS quantifies more than 5000 HLA class I peptides from only 200 μl of plasma, in just 30 min. Our technology revealed that the plasma immunopeptidome of healthy donors is remarkably stable throughout the year and strongly correlated between individuals with overlapping HLA types. Immunopeptides originating from diverse tissues, including the brain, are proportionately represented. We conclude that sHLAs are a promising avenue for immunology and potentially for precision oncology.
Background: The somatic mutations that drive acute myeloid leukaemia (AML) are highly heterogeneous. Identifying the mutations that drive individual cases can help determine patient prognosis and therapy. For this reason, genetic testing for prognostically important mutations in leukaemic DNA is now routine in many diagnostic laboratories. Also, the analysis of gene‐expression profiles from AML RNA can provide additional clinically useful information that cannot be inferred from DNA sequencing. Whilst it is both expensive and impractical to carry out both types of sequencing and analyses, we hypothesised that the two could be combined more cheaply and effectively if mutations in prognostically important genes could be identified from RNA‐seq data. However, computational methodologies for robust detection of the diverse types of somatic mutations found in AML, such as substitutions, indels, tandem duplications and translocations, are not currently available. Aims: To develop stand‐alone, lightweight and user‐friendly software for the identification of clinically relevant mutations from AML RNA‐seq data. Methods: To ensure efficient mapping of RNA‐seq reads, we hash‐indexed the DNA sequences of target genes using 10‐mer sliding windows and implemented the “seed and extend” algorithm for read alignment. Point mutations and small indels were detected from reads with imperfect alignments and tandem duplications were detected from reads spanning the duplication junction. Translocations were detected from reads whose ends belonged to preselected fusion partner genes. Results: To benchmark our software we used RNA‐seq data from 151 whole‐exome/genome‐sequenced AML samples studied by The Cancer Genome Atlas Research Network.
We show that our software reliably calls clinically important mutations affecting the genes NPM1 (4‐nt insertion), FLT3 (substitutions and internal tandem duplications, ITD) and MLL (partial tandem duplications, PTD), as well as substitutions in CEBPA, IDH1/2, TP53 and RUNX1. Furthermore, we identified gene fusions including PML‐RARA, MYH11‐CBFB, RUNX1‐RUNX1T1, BCR‐ABL1 and NUP98‐NSD1. Our software is fast and memory efficient and is able to identify the above mutations in less than 20 minutes starting with RNA‐seq FastQ files of 100 million 50 bp paired‐end reads, using a standard modern laptop computer. In addition, the software operates through a graphical user interface, making it accessible to users without programming knowledge. Summary/Conclusion: We demonstrated that clinically important somatic mutations that drive AML can be reliably detected from RNA‐seq data alone using our software. As our approach can be readily combined with conventional gene expression analyses of the same RNA‐seq dataset, it can be used to generate data with enhanced clinical utility that can improve prognostication and guide patient treatment. As RNA sequencing is a straightforward procedure, our approach can readily enter clinical laboratories, where it can significantly reduce experimental costs and accelerate diagnostic work‐ups.
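The indexing and alignment idea in the Methods (10‐mer hash index, then seed and extend) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' software: it seeds only with the first 10‐mer of each read, extends without gaps, and therefore reports substitutions only; the function names, the mismatch threshold and the toy sequence are hypothetical.

```python
from collections import defaultdict

K = 10  # seed length, matching the 10-mer sliding windows described above

def build_index(genes, k=K):
    """Map every k-mer to the (gene, offset) positions where it occurs."""
    index = defaultdict(list)
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index

def align_read(read, genes, index, k=K, max_mismatches=2):
    """Seed with the read's first k-mer, then extend without gaps.

    Returns (gene, start, mismatches) for the best hit, or None.
    The max_mismatches threshold is an illustrative assumption.
    """
    best = None
    for name, pos in index.get(read[:k], []):
        ref = genes[name][pos:pos + len(read)]
        if len(ref) < len(read):
            continue  # read would run past the end of the target sequence
        mismatches = sum(1 for r, q in zip(ref, read) if r != q)
        if mismatches <= max_mismatches and (best is None or mismatches < best[2]):
            best = (name, pos, mismatches)
    return best

if __name__ == "__main__":
    # Toy target sequence (hypothetical, not a real gene).
    genes = {"G": "ACGTACGTTAGCCGATAGGCTTAACG"}
    index = build_index(genes)
    read = genes["G"][4:24]
    variant = read[:15] + "A" + read[16:]  # one substitution in the read
    print(align_read(read, genes, index))
    print(align_read(variant, genes, index))
```

A read that aligns with a non-zero mismatch count is the kind of imperfect alignment from which point mutations would be called; detecting indels, tandem duplications and fusions needs gapped or split-read logic that this sketch leaves out.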
SARS-CoV-2 directly damages lung tissue via its infection and replication process and indirectly due to systemic effects of the host immune system. There are few systems-wide, untargeted studies of these effects on the different tissues of the human body and nearly all of them base their conclusions on the transcriptome. Here we developed a parallelized mass spectrometry (MS)-based proteomics workflow allowing the rapid, quantitative analysis of hundreds of virus-infected and FFPE-preserved tissues. The first layer of response in all tissues was dominated by circulating inflammatory molecules. To discriminate between these systemic and true tissue-specific effects, we developed an analysis pipeline revealing that proteome alterations reflect extensive tissue damage, mostly similar to non-COVID diffuse alveolar damage. The next most affected organs were kidney and liver, while the lymph-vessel system was also strongly affected. Finally, secondary inflammatory effects in the brain correlated with receptor rearrangements and the degradation of neuronal myelin. Our results establish MS-based tissue proteomics as a promising strategy to inform organ-specific therapeutic interventions following COVID-19 infections.