metabaR: an R package for the evaluation and improvement of DNA metabarcoding data quality

2020 
DNA metabarcoding is becoming the tool of choice for biodiversity studies across taxa and large-scale environmental gradients. Yet, the artefacts present in metabarcoding datasets often preclude a proper interpretation of ecological patterns. Bioinformatic pipelines that remove experimental noise have been designed to address this issue. However, these often target only some of the artefacts produced, or are marker-specific. In addition, assessments of data curation quality and of the appropriateness of filtering thresholds are seldom available in existing pipelines, partly due to the lack of appropriate visualisation tools. Here, we present metabaR, an R package that provides a comprehensive suite of tools to effectively curate DNA metabarcoding data after basic bioinformatic analyses. In particular, metabaR uses experimental negative or positive controls to identify different types of artefactual sequences, i.e. reagent contaminants and tag-jumps. It also flags potentially dysfunctional PCRs based on PCR replicate similarities when those are available. Finally, metabaR provides tools to visualise DNA metabarcoding data characteristics in their experimental context as well as their distribution, and facilitates assessment of the appropriateness of data curation filtering thresholds. metabaR is applicable to any DNA metabarcoding experimental design but is most powerful when the design includes experimental controls and replicates. More generally, the simplicity and flexibility of the package make it applicable to any DNA marker and to data generated with any sequencing platform and pre-analysed with any bioinformatic pipeline. Its outputs are easily usable for downstream analyses with any ecological R package. metabaR complements existing bioinformatics pipelines by providing scientists with a variety of functions with customisable methods that allow users to effectively clean DNA metabarcoding data and avoid serious misinterpretations. It thus offers a promising platform for automated quality assessment of DNA metabarcoding data for environmental research and biomonitoring.
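The idea of using negative controls to identify contaminant sequences can be illustrated with a minimal sketch. This is not metabaR code and does not reproduce its API: the function name `flag_contaminants`, the data layout, and the decision rule (flag a sequence when its highest relative abundance across all PCRs occurs in a negative control) are assumptions chosen for illustration only, and the read counts are invented toy values.

```python
from typing import Dict, List

# Toy read counts: counts[pcr][sequence] = number of reads (invented values).
counts: Dict[str, Dict[str, int]] = {
    "sample_1": {"motu_a": 900, "motu_b": 95, "motu_c": 5},
    "sample_2": {"motu_a": 400, "motu_b": 590, "motu_c": 10},
    "neg_ctrl": {"motu_a": 2, "motu_b": 0, "motu_c": 48},
}


def flag_contaminants(
    counts: Dict[str, Dict[str, int]], negative_controls: List[str]
) -> List[str]:
    """Return sequences whose maximum relative abundance is in a negative control.

    Hypothetical helper for illustration; the rule is an assumption,
    not a description of the package's implementation.
    """
    best = {}  # sequence -> (highest relative abundance, PCR where it occurs)
    for pcr, seq_counts in counts.items():
        total = sum(seq_counts.values())
        if total == 0:
            continue  # an empty PCR carries no abundance information
        for seq, n in seq_counts.items():
            freq = n / total
            if seq not in best or freq > best[seq][0]:
                best[seq] = (freq, pcr)
    controls = set(negative_controls)
    return sorted(
        seq for seq, (freq, pcr) in best.items() if freq > 0 and pcr in controls
    )


flagged = flag_contaminants(counts, ["neg_ctrl"])
print(flagged)
```

In this toy table, `motu_c` makes up most of the reads in the negative control but only a small fraction of each sample, so it is the only sequence flagged. Relative rather than absolute abundance is used because negative controls typically yield few reads overall.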