MethylFASTQ: A Tool Simulating Bisulfite Sequencing Data

2019 
DNA methylation is a DNA modification that plays an important role in several diseases, including cancer. The gold-standard technique for measuring DNA methylation is Bisulfite Sequencing (BS). Bisulfite treatment alters the DNA sequence: unmethylated cytosines are converted to uracil (read as thymine after amplification), while methylated cytosines remain unchanged. As a result, reads no longer match the reference genome exactly, which makes the computational analysis of BS data difficult. Many tools exist for analysing BS data, but choosing among them is difficult because of the extensive biological and technical variability of the data. Synthetic and real datasets can be used to evaluate tool performance and to ensure an accurate data analysis. Today, Sherman is the only available tool for generating synthetic BS datasets. However, Sherman does not report any information about which cytosines are methylated. To address this, in this paper we present MethylFASTQ, an easy-to-use bioinformatics tool that generates synthetic bisulfite datasets in FASTQ format. MethylFASTQ works in parallel using a producer-consumer approach. It returns (i) a complete dataset in FASTQ format simulating the results of a BS experiment and (ii) a report file storing information about the methylation level of the dataset (i.e., the methylated cytosines). First, we evaluate MethylFASTQ's performance with an increasing number of concurrent processes and compare it with Sherman. Then, we describe an application in which synthetic datasets generated with our tool are used as input for two bisulfite mapping and methylation calling tools. Finally, we propose MethylFASTQ as a tool for generating synthetic bisulfite sequencing data.
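To make the two mechanisms named in the abstract concrete (bisulfite conversion with a per-cytosine methylation report, and a producer-consumer pipeline), here is a minimal Python sketch. It is not MethylFASTQ's actual implementation. The function names (`bisulfite_convert`, `fastq_record`, `simulate`), the single uniform methylation probability `meth_prob`, the constant quality character, non-overlapping forward-strand reads, and the use of threads instead of processes are all hypothetical simplifications. Real BS simulators also model the reverse strand, cytosine contexts (CpG, CHG, CHH) and sequencing errors.

```python
import queue
import random
import threading


def bisulfite_convert(seq, meth_prob, rng):
    """Simulate bisulfite treatment of a forward-strand read (simplified).

    Each C stays C (methylated) with probability meth_prob, otherwise it
    becomes T (unmethylated C -> U, read as T). Returns the converted read
    and a list of (position_in_read, is_methylated) for every cytosine.
    """
    out, calls = [], []
    for i, base in enumerate(seq):
        if base == "C":
            methylated = rng.random() < meth_prob
            out.append("C" if methylated else "T")
            calls.append((i, methylated))
        else:
            out.append(base)
    return "".join(out), calls


def fastq_record(read_id, seq, qual_char="I"):
    """Format one FASTQ record; a constant quality is an assumption."""
    return f"@{read_id}\n{seq}\n+\n{qual_char * len(seq)}\n"


def simulate(reference, read_len, meth_prob, n_workers=2, seed=0):
    """Producer-consumer sketch: one producer cuts reads, workers convert them.

    Returns a list of (read_id, fastq_text, report) sorted by read number,
    where report holds (reference_position, is_methylated) per cytosine.
    """
    tasks = queue.Queue(maxsize=64)  # bounded buffer between producer and consumers
    results = []
    lock = threading.Lock()

    def producer():
        # Hypothetical read layout: non-overlapping tiles of the reference.
        for n, start in enumerate(range(0, len(reference) - read_len + 1, read_len)):
            tasks.put((f"read{n}", start, reference[start:start + read_len]))
        for _ in range(n_workers):
            tasks.put(None)  # one sentinel per consumer signals "no more work"

    def consumer(worker_id):
        rng = random.Random(seed + worker_id)  # independent RNG per worker
        while True:
            item = tasks.get()
            if item is None:
                break
            read_id, start, seq = item
            converted, calls = bisulfite_convert(seq, meth_prob, rng)
            report = [(start + pos, methylated) for pos, methylated in calls]
            with lock:
                results.append((read_id, fastq_record(read_id, converted), report))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results.sort(key=lambda r: int(r[0][len("read"):]))  # restore read order
    return results


if __name__ == "__main__":
    for read_id, record, report in simulate("ACGTACGTCCGA" * 3, 6, 0.7):
        print(record, end="")
        print("# methylation report:", report)
```

Swapping `threading.Thread` and `queue.Queue` for `multiprocessing.Process` and `multiprocessing.Queue` would give a process-based version closer to the concurrent processes evaluated in the abstract; threads are used here only to keep the sketch short, since CPython's global interpreter lock limits their speedup on CPU-bound work.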