ArrOW: Experiencing a Parallel Cloud-Based De Novo Assembler Workflow

2019 
Advances in next-generation sequencing technologies have resulted in an unprecedented volume of sequence data. Computer software called genome assemblers combines DNA segments into a reconstruction of the original genome. Assembly now presents new challenges in data management, querying, and analysis, owing to the huge number of read sequences and to CPU- and memory-intensive algorithms. These constraints reduce the chances of uniformly covering the space of options to explore, such as statistics, k-mer values, assembly software, or eukaryotic genome assemblies. To address these issues, we present ArrOW, a cloud-based de novo Assembly clOud Workflow that explores the potential of provenance analytics and parallel computation provided by scientific workflow management systems such as SciCumulus. We evaluate the overall performance of ArrOW using up to 256 cores in the Amazon AWS cloud. ArrOW reaches improvements of up to 88.3% when executing 1,000 reads of genomics datasets. We also highlight how data provenance analytics improved the efficiency of recovering assembly features of genomes.