Performance of Mapping Approaches for Whole-Genome Bisulfite Sequencing Data in Crop Plants
2020
DNA methylation is involved in many biological processes in the development and well-being of crop plants, such as transposon activation, heterosis, environment-dependent transcriptome plasticity, aging, and many diseases. Whole-genome bisulfite sequencing is an excellent technology for detecting and quantifying DNA methylation patterns in a wide variety of species, but optimised data analysis pipelines exist only for a small number of species and are missing for many important crop plants. Pipelines for the analysis of whole-genome bisulfite sequencing data usually consist of four steps: read trimming, read mapping, quantification of methylation levels, and prediction of differentially methylated regions (DMRs). Here we focus on read mapping, which is challenging because unmethylated cytosines are converted to uracil during bisulfite treatment and then to thymine during the subsequent polymerase chain reaction, so read mappers must be capable of dealing with this cytosine/thymine polymorphism. Several read mappers with different strengths and weaknesses have been developed in recent years, but their performance has not been critically evaluated. Here, we compare eight read mappers (bismark, bismarkbwt2, BSMAP, BS-Seeker2, bwameth, GEM3, segemehl, and GSNAP) to assess the impact of the read-mapping results on the prediction of DMRs. We use simulated data generated from the genomes of Arabidopsis thaliana, Brassica napus, Glycine max, Solanum tuberosum, and Zea mays; monitor the effects of the bisulfite conversion rate, the sequencing error rate, the maximum number of allowed mismatches, and the genome structure and size; and calculate precision, number of uniquely mapped reads, distribution of the mapped reads, run time, and memory consumption as features for benchmarking the eight read mappers.
We find that the conversion rate has only a minor impact on the mapping quality and the number of uniquely mapped reads, whereas the error rate and the maximum number of allowed mismatches have a strong impact and lead to differences in the performance of the eight read mappers. In conclusion, we recommend BSMAP, which needs the shortest run time and yields the highest precision, and bismark, which requires the smallest amount of memory and combines precision with high numbers of uniquely mapped reads.
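As a rough illustration of two quantities named above, the sketch below simulates the cytosine-to-thymine change for a forward-strand read at a given conversion rate and computes a simple precision value for a set of uniquely mapped reads. The function names, the data layout, and the precision definition used here (correctly placed reads divided by all uniquely mapped reads) are assumptions made for illustration; they are not taken from the study, which may define and compute these quantities differently.

```python
import random


def bisulfite_convert(read, methylated_positions, conversion_rate, rng):
    """Simulate bisulfite treatment plus PCR on a forward-strand read.

    Each unmethylated C becomes T with probability `conversion_rate`;
    methylated Cs and all other bases are left unchanged.
    (Hypothetical helper, not part of any mapper.)
    """
    converted = []
    for index, base in enumerate(read):
        is_unmethylated_c = base == "C" and index not in methylated_positions
        if is_unmethylated_c and rng.random() < conversion_rate:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def precision(mapped, truth):
    """Fraction of uniquely mapped reads placed at their simulated origin.

    `mapped` and `truth` map read IDs to (chromosome, position) tuples.
    This definition is an assumption made for the sketch.
    """
    if not mapped:
        return 0.0
    correct = sum(1 for read_id, locus in mapped.items()
                  if truth.get(read_id) == locus)
    return correct / len(mapped)


rng = random.Random(0)
# With a conversion rate of 1.0, every unmethylated C is converted.
fully_converted = bisulfite_convert("ACGTCC", {1}, 1.0, rng)

truth = {"r1": ("chr1", 100), "r2": ("chr2", 5)}
mapped = {"r1": ("chr1", 100), "r2": ("chr1", 200)}
score = precision(mapped, truth)
```

Because the true origin of every simulated read is known, a measure of this kind can be evaluated directly, which is one reason simulated data suit a mapper benchmark.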