Bioinformatics Methodology Development for the Whole Genome Bisulfite Sequencing

2019 
Abstract
Understanding the functional role of DNA methylation requires knowledge of its distribution in the genome. Bisulfite treatment followed by deep sequencing (BS-seq) has emerged as the gold standard for studying genome-wide DNA methylation at single-nucleotide resolution. While progress in next-generation sequencing makes BS-seq increasingly affordable, the resulting exponential data growth poses significant bioinformatics challenges. Here, we developed a novel bioinformatics pipeline, MOABS, to increase the speed, accuracy, statistical power, and biological relevance of BS-seq data analysis. MOABS introduces a novel strategy that combines the statistical P-value and the biological difference into a single metric, termed credible methylation difference (CDIF), and is powerful enough to detect single-CpG-resolution differential methylation in small regulatory regions, such as transcription factor binding sites (TFBSs), at coverage as low as 4- to 10-fold. Numerous computational optimizations make MOABS highly efficient: it can process 2 billion aligned reads in 24 CPU hours. Our simulation study shows that MOABS outperforms other leading methods, such as Fisher's exact test and BSmooth. Using real whole genome BS-seq data, we demonstrate that MOABS improves the detection of allele-specific DNA methylation as well as of differential methylation underlying TFBSs, especially at low sequencing depth. In addition, MOABS analysis can be easily extended to more complicated scenarios, such as differential 5hmC analysis using a combination of RRBS and oxBS-seq. The source code of MOABS is freely available at http://www.deqiangsun.org/software/ .
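To illustrate how a single number can carry both significance and effect size, the sketch below computes a CDIF-like value for one CpG from methylated/total read counts in two samples. It assumes, as a simplification, that each sample's methylation level has a Beta(1 + methylated, 1 + unmethylated) posterior (uniform prior) and estimates the credible interval of the difference by Monte Carlo sampling; MOABS itself uses a beta-binomial model, and the function name `cdif` and its parameters are hypothetical. The value returned is the bound of the credible interval closest to zero, or 0 when the interval covers zero.

```python
import random


def cdif(meth1, total1, meth2, total2, level=0.95, draws=20000, seed=0):
    """Hypothetical sketch of a credible methylation difference (sample 2 - sample 1).

    Assumption: each sample's methylation level follows a Beta(1 + methylated,
    1 + unmethylated) posterior (uniform prior). This is a simplification, not
    the beta-binomial model used by MOABS.
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    # Monte Carlo draws of the posterior methylation difference, sorted for quantiles.
    diffs = sorted(
        rng.betavariate(1 + meth2, 1 + total2 - meth2)
        - rng.betavariate(1 + meth1, 1 + total1 - meth1)
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * (draws - 1))]
    upper = diffs[int((1.0 - tail) * (draws - 1))]
    if lower > 0:
        return lower  # credibly higher in sample 2: report the conservative bound
    if upper < 0:
        return upper  # credibly lower in sample 2: report the conservative bound
    return 0.0  # credible interval covers zero: no credible difference


if __name__ == "__main__":
    # Low coverage (5x) but a large difference: CDIF stays clearly positive.
    print(cdif(0, 5, 5, 5))
    # High coverage (10,000x) with a 2-point difference: small CDIF.
    print(cdif(5000, 10000, 5200, 10000))
    # Identical counts: no credible difference.
    print(cdif(2, 4, 2, 4))
```

The second call shows why combining the two quantities can help: at very high coverage a 2-percentage-point difference would likely be flagged as statistically significant by a P-value alone, whereas the conservative bound reports that the credible difference is well under 0.02.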