Analysis of grapevine gene expression data using node-based resilience clustering

2018 
Powdery mildew (PM) is the most economically important disease of cultivated grapevines worldwide. The agricultural community needs a better understanding of the complex genetic basis of PM resistance, which can be advanced by delineating possible gene biomarkers associated with the plants' defense mechanisms. Machine learning techniques can be applied to gene expression data to aid the discovery of disease-fighting genes. In this work, we apply a data-driven computational model that uses a graph-based clustering algorithm, Node-Based Resilience Clustering (NBR-Clust), to analyze grapevine gene expression data and identify possible gene biomarkers associated with PM defense mechanisms. We investigated two graph representations (geometric and kNN) built on the mean differences between PM-inoculated and mock-inoculated gene expression values of the Cabernet and Norton (PM-resistant) species across 6 time points. Taking a contrarian approach, we hypothesized that smaller clusters contain genes that do not follow general patterns and could therefore display distinct expression patterns of PM-induced transcripts across the time points, which may indicate biological relevance. We compared the smaller clusters obtained for Norton with those obtained for Cabernet in terms of the genes clustered in common (the intersection of the sets) as well as the set differences. The results demonstrate that geometric graphs are more useful than kNN graphs for this application. Several genes belonging to biologically relevant pathways were identified that displayed different patterns across the time points between the Norton and Cabernet species.
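As a rough illustration of the data preparation and graph representations described above, the following minimal sketch computes mean-difference profiles, builds a geometric graph and a kNN graph over them, and compares two sets of genes by intersection and difference. It does not implement NBR-Clust itself. All function names, gene identifiers, the radius, the value of k, and the toy numbers are hypothetical and chosen only for illustration; Euclidean distance is likewise an assumption.

```python
from math import dist


def mean_difference(pm, mock):
    """Per-time-point mean of PM-inoculated replicates minus mean of mock replicates.

    `pm` and `mock` are lists with one entry per time point; each entry is a
    list of replicate expression values (hypothetical layout).
    """
    return [sum(p) / len(p) - sum(m) / len(m) for p, m in zip(pm, mock)]


def geometric_graph(profiles, radius):
    """Connect two genes when their profiles lie within `radius` of each other."""
    names = list(profiles)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(profiles[a], profiles[b]) <= radius:
                edges.add(frozenset((a, b)))
    return edges


def knn_graph(profiles, k):
    """Connect each gene to its k nearest neighbours (undirected union of edges)."""
    edges = set()
    for a in profiles:
        others = sorted(
            (b for b in profiles if b != a),
            key=lambda b: dist(profiles[a], profiles[b]),
        )
        for b in others[:k]:
            edges.add(frozenset((a, b)))
    return edges


# One time point with two replicates per condition (toy values).
example_diff = mean_difference([[2.0, 4.0]], [[1.0, 1.0]])

# Toy mean-difference profiles over 6 time points; g4 is a deliberate outlier.
profiles = {
    "g1": [0.0] * 6,
    "g2": [0.1] * 6,
    "g3": [0.3] * 6,
    "g4": [5.0] * 6,
}

geo_edges = geometric_graph(profiles, radius=0.5)
knn_edges = knn_graph(profiles, k=1)

# Hypothetical small clusters, e.g. as returned by some clustering step.
norton_small = {"geneA", "geneB", "geneC"}
cabernet_small = {"geneB", "geneD"}

common = norton_small & cabernet_small          # genes clustered in both
only_norton = norton_small - cabernet_small     # genes specific to Norton
only_cabernet = cabernet_small - norton_small   # genes specific to Cabernet
```

In this toy data the outlier g4 has no edges in the geometric graph, whereas the kNN graph always links it to its nearest neighbour. This is a property of the two constructions on these made-up numbers and is not a claim about the results reported in the paper.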