Streamlining DNA Sequencing and Bioinformatics Analysis Using Software Containers

Alberto Riva, J. Lucas Boatwright, Tongjun Gu, Fahong Yu, W. Brad Barbazuk

2019

Advances in software containerization are revolutionizing the way applications are distributed and executed. Containers are stand-alone software environments that encapsulate all the dependencies an application needs. They are built from well-defined recipes and are immutable and portable, which ensures reliability and reproducibility of results. The Bioinformatics Core of the Interdisciplinary Center for Biotechnology Research (ICBR) is using containers to streamline the management of next-generation sequencing (NGS) data generated by the center's Sequencing Core. NGS data analysis usually begins with a series of quality-control (QC) and cleanup steps that are common to most applications. These include trimming reads on the basis of quality, generating reports, and producing basic statistics on the sequencing run output (e.g., number of reads per sample, fraction of low-quality reads). These initial steps have been containerized and are now executed automatically after each sequencing run, before the datasets are handed over to the Bioinformatics Core for analysis. This strategy offers three advantages. First, QC reports are available as soon as the sequencing run is complete and can be delivered to the customer right away. Second, any problems with the data can be detected, and if necessary addressed, before the analysis starts, saving time. Third, Bioinformatics Core staff no longer have to perform these routine tasks and can focus on the actual analysis of the data. We describe the implementation of the containers and how they were integrated into the standard workflow of the Sequencing Core. Examples include generation of QC reports with FastQC and MultiQC, and read trimming with Trimmomatic or fastp. We also report on a preliminary evaluation of the benefits in terms of faster project turnaround and customer feedback.
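The basic run statistics mentioned above (reads per sample, fraction of low-quality reads) can be illustrated with a short Python sketch. It is not the ICBR implementation: it assumes Phred+33 quality encoding and treats a read as low quality when its mean base quality falls below a hypothetical cutoff of 20. The names `ReadStats`, `summarize` and `summarize_file` are illustrative only.

```python
import gzip
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

PHRED_OFFSET = 33       # assumption: Sanger / Illumina 1.8+ quality encoding
MIN_MEAN_QUALITY = 20   # hypothetical cutoff for a "low-quality" read


@dataclass
class ReadStats:
    reads: int = 0
    low_quality: int = 0

    @property
    def low_quality_fraction(self) -> float:
        return self.low_quality / self.reads if self.reads else 0.0


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        try:
            seq, _plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header.strip()}") from None
        yield header.rstrip("\r\n"), seq.rstrip("\r\n"), qual.rstrip("\r\n")


def mean_quality(qual: str) -> float:
    """Mean Phred score of one read's quality string."""
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual) if qual else 0.0


def summarize(lines: Iterable[str], min_mean_quality: float = MIN_MEAN_QUALITY) -> ReadStats:
    """Count all reads and the reads whose mean quality is below the cutoff."""
    stats = ReadStats()
    for _header, _seq, qual in fastq_records(lines):
        stats.reads += 1
        if mean_quality(qual) < min_mean_quality:
            stats.low_quality += 1
    return stats


def summarize_file(path: str) -> ReadStats:
    """Summarize one sample's FASTQ file, plain or gzip-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize(handle)
```

Calling `summarize_file` once per sample FASTQ gives that sample's read count and `low_quality_fraction`. Tools such as FastQC and fastp report comparable figures themselves, so a sketch like this mainly shows what those numbers measure.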
Future plans include integration with CrossLabs, using custom forms to select the specific pre-processing steps to be performed after each sequencing run.
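The planned form-driven selection of pre-processing steps could work along the lines of the following hypothetical sketch. It assumes that the form yields a plain list of step names, that reads are paired-end, and that Docker is the container runtime; the `example.org` image names and the `qc/` and `trimmed/` directory layout are placeholders. fastp stands in for the trimming step (Trimmomatic could be substituted). The sketch only builds the command lines; executing them (for example with `subprocess.run`) is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Steps always run in this order, whatever order the form lists them in,
# so that MultiQC runs last and sees the other steps' reports.
PIPELINE_ORDER = ["fastqc", "fastp", "multiqc"]

# Placeholder image names (hypothetical).
IMAGES = {
    "fastqc": "example.org/fastqc:latest",
    "fastp": "example.org/fastp:latest",
    "multiqc": "example.org/multiqc:latest",
}


def in_container(image: str, args: List[str], run_dir: str) -> List[str]:
    """Wrap a tool invocation in `docker run`, mounting the run directory at /data."""
    return ["docker", "run", "--rm", "-v", f"{run_dir}:/data", "-w", "/data", image, *args]


def build_commands(
    selected: Iterable[str],
    samples: Dict[str, Tuple[str, str]],
    run_dir: str,
) -> List[List[str]]:
    """Return one command line per container invocation for the selected steps.

    `samples` maps a sample name to its (R1, R2) FASTQ paths relative to run_dir.
    Assumes the qc/ and trimmed/ output directories already exist in run_dir.
    """
    chosen = set(selected)
    unknown = chosen - set(PIPELINE_ORDER)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")

    commands: List[List[str]] = []
    for step in PIPELINE_ORDER:
        if step not in chosen:
            continue
        if step == "fastqc":
            # One FastQC call over every FASTQ file in the run.
            fastqs = [path for name in sorted(samples) for path in samples[name]]
            commands.append(in_container(IMAGES[step], ["fastqc", "-o", "qc", *fastqs], run_dir))
        elif step == "fastp":
            # One paired-end fastp call per sample, with HTML and JSON reports in qc/.
            for name in sorted(samples):
                r1, r2 = samples[name]
                args = [
                    "fastp", "-i", r1, "-I", r2,
                    "-o", f"trimmed/{name}_R1.fastq.gz", "-O", f"trimmed/{name}_R2.fastq.gz",
                    "-h", f"qc/{name}.fastp.html", "-j", f"qc/{name}.fastp.json",
                ]
                commands.append(in_container(IMAGES[step], args, run_dir))
        else:  # multiqc
            # Aggregate everything under qc/ into a single report written to qc/.
            commands.append(in_container(IMAGES[step], ["multiqc", "-o", "qc", "qc"], run_dir))
    return commands
```

Fixing the execution order in code, rather than trusting the order of the form entries, keeps MultiQC last so that it can aggregate the reports produced by the earlier steps.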
