ngALL database: a flexible framework for the management and integration of childhood leukemia next generation sequencing data

2012 
The massive datasets generated by next-generation sequencing (NGS) studies present major challenges for data management and analysis. In our ongoing pediatric oncogenomics study, we have sequenced over 120 childhood acute lymphoblastic leukemia (ALL) quartets (matched normal-tumor, father, mother). ALL is the most common pediatric cancer and leading cause of cancer-related deaths among children; however, its underlying causes remain largely unknown. We set out to build a comprehensive catalogue of genomic (sequence and structural) and epigenomic variations involved in childhood ALL through deep exome resequencing, transcriptome analysis (RNA-seq), genome-wide genotyping, and array-based methylation profiling. To meet the challenge of integrating these various sources of information, we have created a flexible database system to manage these datasets. In particular, we have implemented a custom Next-Generation childhood Acute Lymphoblastic Leukemia relational Database (ngALL DB) to report workflow analyses and integrate whole-exome sequencing data. This database also provides progress reports, allowing the user to track samples through the project pipeline from the biospecimen repository to the annotated SNP list output. Moreover, it provides information about the different bioinformatics tools, sequencing runs and computing platforms used for mapping, cleaning and analyzing the data. The flexibility and reusability of the database design facilitate data integration and allow for a customizable analytical approach and the execution of custom structured queries. Complex queries and procedures can be written to interrogate the database and analyze virtually any aspect of the integrated NGS, genomic and epigenomic data.
Furthermore, as the main goal of our project is to identify the full complement of genetic variants (inherited and somatic) involved in childhood ALL, the ngALL DB is linked to a high-resolution annotation database where each position in the genome is represented with detailed functional annotation. This integrated database structure offers a flexible framework to compile and characterize the large amounts of data generated from NGS studies, providing a powerful research tool for the identification of causal variants in childhood ALL.
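As an illustration of the kind of structured query described above, the following is a minimal sketch using Python's built-in `sqlite3` module. It is not the ngALL DB schema, which is not given here: the table names (`sample`, `variant_call`, `annotation`), the pipeline stage labels, the sample identifiers and the gene names are all hypothetical placeholders. The sketch shows a progress report that counts samples per pipeline stage, and a query that lists variants called in the tumor but not in the matched normal of a quartet, joined to a per-position annotation table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sample (
            sample_id  TEXT PRIMARY KEY,
            quartet_id TEXT NOT NULL,
            role       TEXT NOT NULL
                       CHECK (role IN ('tumor', 'normal', 'father', 'mother')),
            stage      TEXT NOT NULL          -- hypothetical pipeline stage label
        );
        CREATE TABLE variant_call (
            sample_id TEXT NOT NULL REFERENCES sample(sample_id),
            chrom     TEXT NOT NULL,
            pos       INTEGER NOT NULL,
            alt       TEXT NOT NULL
        );
        CREATE TABLE annotation (             -- one row per annotated position
            chrom  TEXT NOT NULL,
            pos    INTEGER NOT NULL,
            gene   TEXT,
            effect TEXT,
            PRIMARY KEY (chrom, pos)
        );
    """)
    # Toy rows for a single quartet; all values are invented for the example.
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
        ("Q1-T", "Q1", "tumor", "annotated"),
        ("Q1-N", "Q1", "normal", "annotated"),
        ("Q1-F", "Q1", "father", "mapped"),
        ("Q1-M", "Q1", "mother", "sequenced"),
    ])
    conn.executemany("INSERT INTO variant_call VALUES (?, ?, ?, ?)", [
        ("Q1-T", "chr1", 1000, "A"),  # tumor only: candidate somatic variant
        ("Q1-T", "chr2", 2000, "G"),  # also in the matched normal
        ("Q1-N", "chr2", 2000, "G"),
    ])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", [
        ("chr1", 1000, "GENE_A", "missense"),
        ("chr2", 2000, "GENE_B", "synonymous"),
    ])
    return conn


def progress_report(conn):
    """Count samples at each pipeline stage."""
    return conn.execute(
        "SELECT stage, COUNT(*) FROM sample GROUP BY stage ORDER BY stage"
    ).fetchall()


def candidate_somatic(conn, quartet_id):
    """Variants called in the tumor but absent from the matched normal,
    with functional annotation attached where available."""
    return conn.execute("""
        SELECT t.chrom, t.pos, t.alt, a.gene, a.effect
        FROM variant_call AS t
        JOIN sample AS st
          ON st.sample_id = t.sample_id AND st.role = 'tumor'
        LEFT JOIN annotation AS a
          ON a.chrom = t.chrom AND a.pos = t.pos
        WHERE st.quartet_id = ?
          AND NOT EXISTS (
              SELECT 1
              FROM variant_call AS n
              JOIN sample AS sn ON sn.sample_id = n.sample_id
              WHERE sn.quartet_id = st.quartet_id
                AND sn.role = 'normal'
                AND n.chrom = t.chrom AND n.pos = t.pos AND n.alt = t.alt
          )
        ORDER BY t.chrom, t.pos
    """, (quartet_id,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(progress_report(db))
    print(candidate_somatic(db, "Q1"))
```

Keeping the annotation in its own table keyed by genomic position, as in this sketch, is one way to let the same annotation be reused across every sample and quartet; a production system would also need reference alleles, genotype quality, and the tool and run metadata mentioned in the abstract.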