q2-sample-classifier: machine-learning tools for microbiome classification and regression

2018 
Microbiome studies often aim to predict outcomes or differentiate samples based on their microbial compositions, tasks that can be performed efficiently by supervised learning methods. Here we present a benchmark comparison of supervised learning classifiers and regressors implemented in scikit-learn, a Python-based machine-learning library. We additionally present q2-sample-classifier, a plugin for the QIIME 2 microbiome bioinformatics framework that facilitates applying these scikit-learn methods to microbiome data. Random forest, extra trees, and gradient boosting models demonstrate the highest performance for both supervised classification and regression of microbiome data. Automated feature selection and hyperparameter tuning enhance the performance of most methods but may not be necessary under all circumstances. The q2-sample-classifier plugin makes these methods more accessible and interpretable to a broad audience of microbiologists, clinicians, and others who wish to use supervised learning methods to predict sample characteristics from microbiome composition. The q2-sample-classifier source code is available at https://github.com/qiime2/q2-sample-classifier. It is released under a BSD-3-Clause license and is freely available, including for commercial use.
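The kind of comparison described above (ensemble classifiers evaluated with and without automated feature selection and hyperparameter tuning) can be illustrated with a minimal scikit-learn sketch. This is not the q2-sample-classifier interface and not the paper's benchmark protocol. The synthetic count table, the helper names `benchmark` and `tuned_random_forest`, the use of recursive feature elimination (`RFECV`) as the feature-selection step, and the parameter grid are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical synthetic feature table: 60 samples x 40 taxa (counts),
# standing in for a real microbiome abundance table. Not real data.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(60, 40)).astype(float)
labels = np.array([0] * 30 + [1] * 30)
# Make the first 5 taxa more abundant in class 1 so there is signal to learn.
counts[labels == 1, :5] += rng.poisson(lam=10.0, size=(30, 5))
# Convert counts to relative abundances (each row sums to 1).
X = counts / counts.sum(axis=1, keepdims=True)
y = labels

# Shared stratified 5-fold cross-validation scheme.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def benchmark(X, y, cv):
    """Mean cross-validated accuracy of each ensemble model with fixed settings."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
        "gradient_boosting": GradientBoostingClassifier(random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


def tuned_random_forest(X, y, cv):
    """Random forest with recursive feature elimination plus a small grid search."""
    pipe = Pipeline([
        # RFECV drops 20% of the remaining features per step, choosing the
        # feature count by an inner 3-fold cross-validation.
        ("select", RFECV(RandomForestClassifier(n_estimators=25, random_state=0),
                         step=0.2, cv=3)),
        ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])
    grid = {"model__max_features": ["sqrt", None]}
    search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search
```

Running `benchmark(X, y, cv)` returns one mean accuracy per model, and `tuned_random_forest(X, y, cv).best_params_` reports the selected grid point. Placing feature selection inside the `Pipeline` keeps it within each cross-validation fold, so the selected taxa never see the held-out samples; comparing the two functions' scores on a given dataset is one way to judge whether tuning is worth its extra cost there.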