coil: an R package for cytochrome c oxidase I (COI) DNA barcode data cleaning, translation, and error evaluation

2019 
Biological conclusions based on DNA barcoding and metabarcoding analyses can be strongly influenced by the methods used for data generation and curation, which vary in how well they separate biological variation from experimental error. The five-prime region of cytochrome c oxidase I (COI-5P) is the most common barcode gene, and its conserved structure and function allow for biologically informed error identification. Here we present coil (https://CRAN.R-project.org/package=coil), an R package for the pre-processing and error assessment of COI-5P animal barcode and metabarcode sequence data. The package contains functions for placing barcodes into reading frame, accurately translating sequences to amino acids, and highlighting insertion and deletion (indel) errors. An analysis of 10,000 barcode sequences of varying quality demonstrated that coil can place barcode sequences in reading frame and distinguish sequences containing indel errors from error-free sequences with greater than 97.5% accuracy. The limitations of coil's analysis pipeline were tested through the analysis of COI-5P sequences from the plant and fungal kingdoms, as well as the analysis of potential contaminants: nuclear mitochondrial pseudogenes and Wolbachia COI-5P sequences.
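To make the three tasks named above concrete (reading-frame placement, translation, and indel flagging), the following Python sketch shows a deliberately simplified stand-in. It is not coil's method or API, which this abstract does not detail. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5) and uses a plain stop-codon count as the framing and indel heuristic. All function names (`translate`, `stops_per_frame`, `best_frame`, `possible_indel`) are hypothetical.

```python
# Illustrative sketch only: a stop-codon heuristic, NOT coil's algorithm.
# Assumption: invertebrate mitochondrial code (NCBI translation table 5).

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate seq starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not fill a codon are ignored; codons with
    ambiguous or unexpected characters are translated as 'X'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stops_per_frame(seq):
    """Count stop codons in each of the three forward frames."""
    return [translate(seq, frame).count("*") for frame in range(3)]


def best_frame(seq):
    """Return the forward frame with the fewest stop codons (ties: lowest)."""
    counts = stops_per_frame(seq)
    return counts.index(min(counts))


def possible_indel(seq):
    """Flag a sequence when every forward frame contains a stop codon."""
    return min(stops_per_frame(seq)) > 0


if __name__ == "__main__":
    clean = "GGACTAATTAATGGA"
    shifted = "C" + clean  # one extra leading base
    broken = "GGACTAATTAATGGGGACTAATTAATGGA"  # one base lost mid-sequence
    print(best_frame(clean), translate(clean))
    print(best_frame(shifted), translate(shifted, best_frame(shifted)))
    print(possible_indel(clean), possible_indel(broken))
```

A stop-codon count is only a crude proxy: an indel near the end of a read may introduce no stop at all, and short reads can be stop-free in several frames, so this sketch should not be expected to approach the accuracy reported for coil.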