Optimizing the use of gene expression data to predict plant metabolic pathway memberships

2020 
Abstract

Plant metabolites produced via diverse pathways are important for plant survival, human nutrition and medicine. However, the pathway memberships of most plant enzyme genes are unknown. While co-expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts. Utilizing >600 expression values and similarity data combinations from tomato, three strategies for predicting membership in 85 pathways were explored: naive prediction (identifying pathways with the most similarly expressed genes), unsupervised and supervised learning. Optimal predictions for different pathways require distinct data combinations that, in some cases, are indicative of biological processes relevant to pathway functions. Naive prediction produced higher error rates compared with machine learning methods. In 52 pathways, unsupervised learning performed better than a supervised approach, which may be due to the limited availability of training data. Furthermore, using gene-to-pathway expression similarities led to prediction models that outperformed those based simply on gene expression levels. Our study highlights the need to extensively explore expression-based features and prediction strategies to maximize the accuracy of metabolic pathway membership assignment. We anticipate that the prediction framework outlined here can be applied to other species and also be used to improve plant pathway annotation.
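
As a rough illustration of the naive strategy described in the abstract (assigning a gene to the pathway whose members are most similarly expressed), the following minimal sketch scores each pathway by the mean Pearson correlation between a query gene and the pathway's annotated genes. The similarity measure, the use of the mean as the summary statistic, the function names and the toy expression values are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def pathway_similarity(expr, members, query):
    """Mean Pearson correlation between the query gene and a pathway's members.

    Assumption: Pearson correlation summarised by the mean; the study may use
    other similarity measures or summary statistics.
    """
    others = [g for g in members if g != query]
    if not others:
        return float("nan")
    corrs = [np.corrcoef(expr[query], expr[g])[0, 1] for g in others]
    return float(np.mean(corrs))

def naive_predict(expr, pathways, query):
    """Return the best-scoring pathway for the query gene and all pathway scores."""
    scores = {name: pathway_similarity(expr, members, query)
              for name, members in pathways.items()}
    # Ignore pathways that could not be scored (no members other than the query).
    scored = {name: s for name, s in scores.items() if not np.isnan(s)}
    best = max(scored, key=scored.get)
    return best, scores

# Hypothetical toy data: five expression samples per gene.
expr = {
    "geneA": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "geneB": np.array([2.0, 4.1, 5.9, 8.2, 9.8]),
    "geneC": np.array([5.0, 4.0, 3.2, 2.1, 1.0]),
    "geneD": np.array([4.8, 4.1, 2.9, 2.0, 1.2]),
    "query": np.array([1.1, 1.9, 3.2, 3.9, 5.1]),
}
pathways = {
    "pathway_1": ["geneA", "geneB"],
    "pathway_2": ["geneC", "geneD"],
}

best, scores = naive_predict(expr, pathways, "query")
print(best, scores)
```

The per-pathway scores computed here are also an example of the gene-to-pathway expression similarities mentioned in the abstract; in a machine learning setting they could serve as input features in place of raw expression levels.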