Multivariate information maximization yields hierarchies of expression components in tumors that are both biologically meaningful and prognostic

2016 
*De novo* inference of clinically relevant gene function relationships from tumor RNA-seq remains a challenging task. In this work we show that Correlation Explanation (CorEx), a recently developed machine learning algorithm that optimizes over multivariate mutual information, achieves significant progress toward this goal. CorEx uses high-dimensional correlations for a principled construction of relatively independent latent factors that explain dependencies in gene expression among samples. Using only ovarian tumor RNA-seq, CorEx infers gene cohorts with related function, recapitulating Gene Ontology annotation relationships. CorEx also identifies latent factors that capture dependencies in groups of genes whose expression patterns correlate with patient survival in ovarian cancer. Some inferred pathways, such as chemokine signaling and FGF signaling, have previously been implicated in chemoresponsiveness, but novel survival-associated groups are identified as well. These include a pathway connected with the epithelial-mesenchymal transition in breast cancer that is regulated by a potentially druggable microRNA. Further, combinations of factors lead to a synergistic survival advantage in some cases. Comparison with normal ovarian tissue reveals substantial differences between cancerous and non-cancerous samples related to a variety of cellular processes. In contrast to studies that attempt to partition patients into a small number of subtypes (typically 4 or fewer), our approach uses subgroup information for combinatoric transcriptional phenotyping. Considering only the 66 gene expression groups that have significant Gene Ontology enrichment and are also small enough to indicate specific drug targets implies a computational phenotype for ovarian cancer that allows for 3^66 possible patient profiles, enabling truly personalized treatment.
The findings here demonstrate a new technique that sheds light on the complexity of gene expression dependencies in tumors and could eventually enable the use of patient RNA-seq profiles for selection of personalized and effective cancer treatments.
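
To make the "multivariate mutual information" that CorEx optimizes more concrete, the sketch below estimates total correlation, TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), from discrete samples. It assumes the abstract's phrase refers to total correlation, the usual formalization in the CorEx literature. It is not the CorEx algorithm itself, and the function names and toy data are hypothetical illustrations rather than anything taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """Estimate TC(X) = sum_i H(X_i) - H(X_1, ..., X_n) from discrete samples.

    `rows` is a list of equal-length tuples, one tuple per sample.
    """
    columns = list(zip(*rows))  # one tuple of observed values per variable
    marginal_sum = sum(entropy(col) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal_sum - joint


# Toy data (hypothetical): three binarized "genes" across four samples.
# Genes 0 and 1 always agree; gene 2 varies independently of them.
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tc = total_correlation(samples)

# All eight combinations of three bits, each seen once: no dependence at all.
independent = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
tc_independent = total_correlation(independent)
```

In the toy data the only dependence is the agreement between genes 0 and 1, so a single latent factor tracking their shared state would account for it. This is the sense in which latent factors "explain" dependencies among expression levels.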