Quality control-based signal drift correction and interpretations of metabolomics/proteomics data using random forest regression

2018 
Large-scale mass spectrometry-based metabolomics and proteomics studies require the long-term analysis of multiple batches of biological samples, which is often accompanied by significant signal drift and various inter- and intra-batch variations. These unwanted variations can lead to poor inter- and intra-day reproducibility, which hinders the discovery of real significance. We developed a novel quality control-based random forest signal correction algorithm, an ensemble learning approach that removes unwanted inter- and intra-batch variations at the feature level. Our evaluation on real samples showed that the developed algorithm improved data precision and statistical accuracy for metabolomics and proteomics, and was superior to other common correction methods. We were able to improve performance in the interpretation of large-scale metabolomics and proteomics data and to improve data precision for uncovering real biological differences.
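The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a quality control-based, tree-ensemble drift correction might look for a single feature. It assumes, without confirmation from the paper, that the drift is modelled from the pooled QC injections as a function of injection order, and that each sample is divided by the predicted drift and rescaled to the QC median. The ensemble is a small hand-written set of bagged regression trees (with injection order as the only predictor this plays the role of a random forest); in practice a library implementation such as scikit-learn's `RandomForestRegressor` would likely be used instead. All function names, parameters, and the simulated data are hypothetical.

```python
import random
import statistics


def fit_tree(xs, ys, min_leaf=2, depth=0, max_depth=4):
    """Fit a 1-D regression tree; a leaf is a float, a node is a tuple."""
    if depth >= max_depth or len(xs) < 2 * min_leaf:
        return statistics.fmean(ys)
    idx = sorted(range(len(xs)), key=lambda i: xs[i])
    xs = [xs[i] for i in idx]
    ys = [ys[i] for i in idx]
    best = None
    for k in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[k - 1] == xs[k]:
            continue  # cannot split between identical predictor values
        left, right = ys[:k], ys[k:]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, k)
    if best is None:
        return statistics.fmean(ys)
    k = best[1]
    threshold = (xs[k - 1] + xs[k]) / 2
    return (threshold,
            fit_tree(xs[:k], ys[:k], min_leaf, depth + 1, max_depth),
            fit_tree(xs[k:], ys[k:], min_leaf, depth + 1, max_depth))


def predict_tree(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(xs, ys, n_trees=100, seed=0):
    """Bagging: each tree is fitted on a bootstrap resample of the QC points."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return statistics.fmean(predict_tree(t, x) for t in forest)


def correct_feature(order, intensity, is_qc, n_trees=100, seed=0):
    """Assumed scheme: divide by QC-predicted drift, rescale to QC median."""
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(intensity, is_qc) if q]
    forest = fit_forest(qc_x, qc_y, n_trees, seed)
    target = statistics.median(qc_y)
    return [v / predict_forest(forest, o) * target
            for o, v in zip(order, intensity)]


def rsd(values):
    """Relative standard deviation, a common precision measure for QCs."""
    return statistics.stdev(values) / statistics.fmean(values)


# Simulated single feature: 60 injections, a QC every 5th injection,
# a linear downward drift, and 1% multiplicative noise (all made up).
rng = random.Random(1)
order = list(range(1, 61))
is_qc = [(o - 1) % 5 == 0 for o in order]
intensity = [1000.0 * (1.0 - 0.005 * o) * (1.0 + rng.gauss(0.0, 0.01))
             for o in order]
corrected = correct_feature(order, intensity, is_qc)

qc_raw = [v for v, q in zip(intensity, is_qc) if q]
qc_cor = [v for v, q in zip(corrected, is_qc) if q]
print("QC RSD before:", rsd(qc_raw))
print("QC RSD after: ", rsd(qc_cor))
```

Applying `correct_feature` independently to every feature would give a feature-level correction in the sense described above. Note that the QC RSD after correction is measured on the same QC injections used to fit the model, so it is an optimistic estimate; held-out QCs or cross-validation would give a fairer picture.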