A critical assessment of gene catalogs for metagenomic analysis.

2021 
Motivation Microbial gene catalogs are data structures that organize genes found in microbial communities, providing a reference for standardized analysis of the microbes across samples and studies. Although gene catalogs are commonly used, they have not been critically evaluated for their effectiveness as a basis for metagenomic analyses. Results As a case study, we investigate one such catalog, the Integrated Gene Catalog (IGC), however our observations apply broadly to most gene catalogs constructed to date. We focus on both the approach used to construct this catalog and, on its effectiveness, when used as a reference for microbiome studies. Our results highlight important limitations of the approach used to construct the IGC and call into question the broad usefulness of gene catalogs more generally. We also recommend best practices for the construction and use of gene catalogs in microbiome studies and highlight opportunities for future research. Availability All supporting scripts for our analyses can be found on GitHub: https://github.com/SethCommichaux/IGC.git. The supporting data can be downloaded from: https://obj.umiacs.umd.edu/igc-analysis/IGC_analysis_data.tar.gz. Supplementary information Supplementary data are available at Bioinformatics online.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    74
    References
    1
    Citations
    NaN
    KQI
    []