Implementing the re-use of public DIA proteomics datasets: from the PRIDE database to Expression Atlas

2021 
The growing number of mass spectrometry proteomics datasets available in the public domain increasingly includes data generated with Data Independent Acquisition (DIA) approaches, SWATH-MS in particular. Unlike for Data Dependent Acquisition datasets, re-use of DIA datasets is limited, partly because of the challenges of combining and using free analysis software in non-specialist laboratories. We introduce a (re-)analysis pipeline for SWATH-MS data available in the PRIDE database, which combines harmonised metadata annotation protocols, automated workflows for MS data, statistical analysis, and integration of the results into the Expression Atlas resource. Automation is orchestrated with Nextflow, using containerised open analysis software tools, which makes the pipeline readily available, reproducible and easy to update. To demonstrate its utility, we reanalysed 10 public DIA datasets stored in PRIDE, comprising 1,278 individual SWATH-MS runs. The robustness of the analysis was evaluated, and the results were compared with those obtained in the original publications. The final results were exported into Expression Atlas, making quantitative results from SWATH-MS experiments more widely available and integrated with results from other reanalysed proteomics and transcriptomics datasets.