Estimation of Sequencing Error Rates Present in Genome Databases

2012 
ABSTRACT
The quality of next-generation sequencing data is a major problem in bioinformatics today. Validating sequences, either by re-sequencing or by purely statistical error evaluation, is needed to ensure that all subsequent research based on the data yields correct results. Estimating the error rates in genome databases gives an idea of the level of inherited errors in genome sequences. This is important because such errors have a cumulative effect on every subsequent step of sequence analysis. Here we present a way to define the error level in a genome using two different databases: the National Center for Biotechnology Information (NCBI) (as a verified one) and Resources for Plant Comparative Genomics (PlantGDB) as reference. Based on the most conserved regions in every genome, the donor/acceptor splice sites (whose canonical forms are the dinucleotides GT or GC and AG), we applied statistical methods to derive the NCBI error level for the Oryza sativa (japonica cultivar) genome.
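To make the splice-site idea concrete, the following is a minimal sketch and not the authors' statistical procedure, which the abstract does not specify. It assumes intron sequences have already been extracted from annotated gene models as plain strings; the function names `is_canonical` and `noncanonical_fraction` are hypothetical. It only counts how often the intron boundaries deviate from the canonical GT or GC donor and AG acceptor dinucleotides. Because genuine non-canonical introns exist, this fraction is at best a rough upper-bound proxy for sequence error, not an error rate in itself.

```python
from typing import Iterable

# Canonical boundary dinucleotides named in the abstract.
CANONICAL_DONORS = {"GT", "GC"}
CANONICAL_ACCEPTOR = "AG"


def is_canonical(intron: str) -> bool:
    """Return True if the intron starts with GT/GC and ends with AG."""
    seq = intron.upper()
    if len(seq) < 4:
        # Too short to hold separate donor and acceptor dinucleotides.
        return False
    return seq[:2] in CANONICAL_DONORS and seq[-2:] == CANONICAL_ACCEPTOR


def noncanonical_fraction(introns: Iterable[str]) -> float:
    """Fraction of introns whose boundaries are not canonical.

    Illustrative assumption: each item is one intron sequence on the
    coding strand, from its first to its last base.
    """
    total = 0
    noncanonical = 0
    for intron in introns:
        total += 1
        if not is_canonical(intron):
            noncanonical += 1
    if total == 0:
        raise ValueError("no introns supplied")
    return noncanonical / total
```

Comparing this fraction for the same gene models drawn from two databases would be one simple way to see where their annotated splice sites disagree.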