Abstract 3967: The Cancer Genome Project high throughput analysis pipeline

2012 
The Cancer Genome Project (CGP) was set up in 2000 to use systematic mutation screening methods to increase our understanding of human cancer. With the advent of next-generation sequencing and the large volumes of data it generates, a new suite of software was required to screen these data rapidly and accurately for somatic changes. We have built an analysis pipeline to track and analyse large numbers of tumour samples, using in-house and externally available tools. The pipeline is built around a ∼2,000-node compute farm and a Lustre filesystem, and it outputs into our archive and data storage system, FileTrk. FileTrk holds the raw data files (BAM, CEL, etc.), the results of the analysis and versioning information about the software used to generate these results. Sample lanes are aligned to the genome using the Burrows-Wheeler Aligner (BWA), and lane-to-lane comparisons are made to ensure data integrity. Lanes from each sample are merged into a single sample BAM file; once 30-40x coverage is reached and the lanes have been quality assessed, the sample is locked and ready for analysis. Mutation callers detect point mutations (Caveman, in-house software), small insertions/deletions (Pindel), breakpoints (BRASS, in-house software) and copy number changes (ASCAT & PICNIC, in-house software). The resulting mutations are post-processed to remove false positives, annotated to the RNA and protein level using standard nomenclature (Vagrent, in-house software) and uploaded to a database. Interfaces have been developed to enable the selection of random sets of mutations for validation; the outcome of each validation is recorded so that specificity can be calculated for each sample in the system. IT systems are being developed to automatically export lists of somatic changes to COSMIC and the ICGC data portal, and raw data to the European Genome-Phenome Archive (EGA).

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 3967. doi:1538-7445.AM2012-3967
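
The sample-locking rule and the validation bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the CGP implementation: every name (`Lane`, `Sample`, `is_lockable`, `pick_for_validation`, `confirmed_fraction`) is hypothetical, the 30x threshold is simply the lower bound of the stated 30-40x range, and it assumes that every lane must pass quality assessment and that "specificity" means the fraction of validated calls confirmed as real.

```python
import random
from dataclasses import dataclass, field

# Assumed threshold: the abstract states a 30-40x range; the lower bound is used here.
MIN_COVERAGE = 30.0


@dataclass
class Lane:
    lane_id: str
    coverage: float   # mean depth contributed by this lane
    passed_qc: bool   # outcome of the lane's quality assessment


@dataclass
class Sample:
    name: str
    lanes: list = field(default_factory=list)

    def total_coverage(self) -> float:
        # Approximation: summing per-lane depth ignores duplicate reads.
        return sum(lane.coverage for lane in self.lanes)

    def is_lockable(self) -> bool:
        # Assumption: a sample locks only when all lanes pass QC
        # and the merged coverage reaches the threshold.
        return (
            bool(self.lanes)
            and all(lane.passed_qc for lane in self.lanes)
            and self.total_coverage() >= MIN_COVERAGE
        )


def pick_for_validation(mutations, n, seed=None):
    """Select a random subset of called mutations to send for validation."""
    rng = random.Random(seed)
    pool = list(mutations)
    return rng.sample(pool, min(n, len(pool)))


def confirmed_fraction(outcomes):
    """Fraction of validated calls confirmed as real (True); None if nothing was validated."""
    if not outcomes:
        return None
    return sum(1 for ok in outcomes if ok) / len(outcomes)
```

For example, a sample with two passing lanes of 16x and 15.5x would be lockable under these assumptions, and four validations with three confirmations would give a confirmed fraction of 0.75.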