Distributed bioinformatics analyses on an SGE cluster, for variant calling on bovine whole-genome sequencing samples

2019 
As next-generation sequencing (NGS) technologies are adopted ever more widely, bioinformatics and biomedicine have become leading fields in terms of data output. These growing volumes of data pose challenges for storage, analysis and interpretation. Many computational pipelines have been created to automate the analysis steps for NGS data. In this study we carried out multiple computational experiments to test the scalability of one such data analysis pipeline, Bcbio-nextgen, when running variant calling on whole-genome sequencing data from 40 Bos taurus samples. Bcbio-nextgen showed good scalability in multiple scenarios using between 64 and 320 CPU cores. However, more computing resources (especially storage space) would be required to support the analysis of larger numbers of samples.