Simplified and unified access to cancer proteogenomic data
2020
Comprehensive cancer datasets recently generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) offer great potential for advancing our understanding of how to combat cancer. These datasets include DNA, RNA, protein, and clinical characterization of tumor and normal samples from large cohorts across many cancer types. The raw data are publicly available through several Cancer Research Data Commons repositories. Widespread re-use of these datasets, however, also benefits from easy access to the processed quantitative data tables. We have created a Python package, cptac, a data API that distributes the finalized, processed CPTAC datasets in a consistent, up-to-date format. This consistency makes it easy to integrate the data with common graphing, statistical, and machine learning packages for advanced analysis. Consistent formatting across all cancer types also supports the investigation of pan-cancer trends. Because the API streams data directly into the programming environment, analyses are easier to reproduce. Finally, together with its accompanying tutorials, the package provides a novel resource for cancer research education.
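The claim that a shared table layout across cancer types makes pan-cancer analysis easy can be illustrated with a minimal pandas sketch. This is not the cptac package's API: it assumes each cancer type's processed data arrives as a pandas DataFrame with patients as rows and genes as columns. The helpers `make_table` and `stack_cohorts`, the labels `cancer_A`/`cancer_B`, the gene names `GENE_A`/`GENE_B`, and all values are hypothetical toy stand-ins, not CPTAC data.

```python
import pandas as pd


def make_table(patients, values, genes):
    """Build a toy quantitative table: rows are patients, columns are genes.

    Hypothetical stand-in for one cancer type's processed data table;
    the numbers are made up and carry no biological meaning.
    """
    return pd.DataFrame(
        values,
        index=pd.Index(patients, name="Patient_ID"),
        columns=list(genes),
    )


def stack_cohorts(tables):
    """Concatenate same-layout tables from several cancer types.

    `tables` maps a cancer-type label to its DataFrame. The labels become an
    outer index level named "Cancer", so every row stays traceable to its cohort.
    """
    labels = list(tables)  # keep the mapping's insertion order explicitly
    return pd.concat(
        [tables[label] for label in labels],
        keys=labels,
        names=["Cancer", "Patient_ID"],
    )


genes = ["GENE_A", "GENE_B"]
tables = {
    "cancer_A": make_table(["A1", "A2"], [[1.0, 2.0], [3.0, 4.0]], genes),
    "cancer_B": make_table(["B1"], [[5.0, 6.0]], genes),
}

combined = stack_cohorts(tables)
# Because every table shares the same columns, a per-cancer summary is one call.
per_cancer_means = combined.groupby(level="Cancer").mean()
print(per_cancer_means)
```

The design choice being illustrated is that the same code path works for any number of cancer types, provided their tables share a layout. Tables with mismatched column names would need to be harmonized before they could be stacked like this.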