BioDArt - catalogue of biological data artifact examples

Anitha Veeramani, Kavitha Gopalakrishnan, Vladimir Brusic, Judice L. Y. Koh

2006

Information in biological data repositories continues to grow exponentially, driven by the increasing number of genomic and proteomic sequencing projects. Like any database, these repositories are subject to data quality issues related to correctness, uniformity, completeness, and redundancy, among others. Data cleaning is a prerequisite for preventing low-quality data from compromising the accuracy of data mining and analysis. Cleaning in turn requires the detection and resolution of data artifacts (errors, discrepancies, redundancies, ambiguities, and incompleteness). Understanding the causes of data artifacts and classifying them systematically are critical to eliminating them from molecular sequence databases. This paper highlights eight data artifacts found in public molecular databases. Examples of records from major molecular sequence databases that contain these artifacts are collected in the BioDArt catalogue (http://antigen.i2r.a-star.edu.sg/BioDArt).
