Whole cancer genome analysis using an I/O aware job scheduler on high performance computing resource
2014
Recent advances in DNA sequencing technology have enabled Next Generation Sequencing (NGS) instruments to generate billions of DNA reads in a few days. However, managing this enormous volume of NGS data and analyzing it concurrently requires a great deal of computing power and memory as well as huge disk storage. Popular job scheduling systems efficiently manage and schedule large numbers of analysis jobs based on the available computing resources, but they do not consider the maximum available Input/Output (I/O) bandwidth. Thus, when a large number of genome analyses run on a large-scale cluster system, the maximum storage I/O bandwidth is insufficient to keep all computing resources busy, and analysis jobs are frequently suspended. Here we developed a disk I/O aware job submission scheduler that maximizes disk I/O usage without letting the heavy disk I/O of a new job hamper previously running jobs. Using this scheduler and the HPC resources of the National Institute of Supercomputing and Networking (NISN), we constructed a cancer genome analysis pipeline to overcome the obstacles to concurrent analysis of vast amounts of NGS data. With it, we performed major genome analyses on whole genome samples from over 50 case-control pairs of chromophobe renal cell carcinoma patients sequenced by The Cancer Genome Atlas (TCGA) and successfully completed all analysis jobs with none suspended by an I/O bottleneck.
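The admission idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the per-job I/O estimate `io_mb_s`, the bandwidth budget `max_io_mb_s`, and the first-in-first-out policy are all assumptions made for illustration. A job starts only if its estimated disk I/O fits within the remaining storage bandwidth, so jobs that are already running are not starved.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    io_mb_s: float  # assumption: estimated peak disk I/O of the job, in MB/s


class IOAwareScheduler:
    """Hypothetical scheduler: start jobs only while their summed
    I/O estimates stay within the storage bandwidth budget."""

    def __init__(self, max_io_mb_s):
        self.max_io_mb_s = max_io_mb_s  # assumption: measured storage limit
        self.running = {}               # job name -> Job
        self.pending = deque()          # FIFO queue of waiting jobs

    def used_io(self):
        # Total estimated I/O of all running jobs.
        return sum(job.io_mb_s for job in self.running.values())

    def submit(self, job):
        # Queue the job, then start whatever fits. Returns started job names.
        self.pending.append(job)
        return self._dispatch()

    def finish(self, name):
        # Release the finished job's I/O share and start waiting jobs.
        del self.running[name]
        return self._dispatch()

    def _dispatch(self):
        started = []
        # Strict FIFO: stop at the first waiting job that does not fit,
        # so a new heavy job never squeezes jobs that are already running.
        while (self.pending and
               self.used_io() + self.pending[0].io_mb_s <= self.max_io_mb_s):
            job = self.pending.popleft()
            self.running[job.name] = job
            started.append(job.name)
        return started
```

In this sketch a job whose estimate exceeds the whole budget would block the queue indefinitely, so a real scheduler would need to reject or cap such jobs; a backfilling policy could also let smaller jobs run ahead of a blocked one.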