Efficient Representations of Tumor Diversity with Paired DNA-RNA Anomalies
2020
Cancer cells display massive dysregulation of key regulatory pathways due to now well-catalogued mutations and other DNA-related aberrations. Moreover, enormous heterogeneity has been commonly observed in the identity, frequency, and location of these aberrations across individuals with the same cancer type or subtype, and this variation naturally propagates to the transcriptome, resulting in myriad types of dysregulated gene expression programs. Many have argued that a more integrative and quantitative analysis of the heterogeneity of DNA and RNA molecular profiles may be necessary for designing more systematic explorations of alternative therapies and improving predictive accuracy. We introduce a representation of multi-omics profiles that is sufficiently rich to account for the observed heterogeneity and to support the construction of quantitative, integrated metrics of variation. Starting from the network of interactions in Reactome, we build a library of "paired DNA-RNA anomalies" that represent prototypical and recurrent patterns of dysregulation in cancer; each two-gene "motif" consists of a "source" regulatory gene and a "target" gene whose expression is "controlled" by the source gene. A pair motif is "active" or "realized" in a joint DNA-RNA profile if the source gene is DNA-aberrant (e.g., mutated, deleted, or duplicated) and the downstream target gene is "RNA-divergent", meaning its expression level is outside the normal, baseline range. With M motifs, each sample profile has exactly one of the 2^M possible configurations. We focus our analyses on reduced configurations by selecting tissue-dependent minimal coverings, which we define as the smallest family of motifs with the property that every sample in the considered population displays at least one active motif; these minimal coverings can be computed with integer programming.
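The motif activity rule and the minimal covering described above can be illustrated with a small sketch. A covering of this kind is usually written as a set-cover integer program: one binary variable per motif, minimize the number of selected motifs, subject to each sample having at least one selected active motif. That formulation is an assumption here, not a statement of the authors' exact program. To stay dependency-free, the sketch below swaps the integer-programming solver for an exhaustive search, which gives the same optimum but is only practical for toy inputs. The gene names (S1, T1, ...), the sample labels, and both function names are hypothetical.

```python
from itertools import combinations


def active_motifs(dna_aberrant, rna_divergent, motifs):
    """Motifs (source, target) active in one sample: the source gene is
    DNA-aberrant AND the target gene is RNA-divergent."""
    return {(s, t) for (s, t) in motifs
            if s in dna_aberrant and t in rna_divergent}


def minimal_covering(activity):
    """Smallest set of motifs such that every sample has at least one
    active motif in the set.

    activity maps sample id -> set of active motifs.
    Exhaustive search stands in for an integer-programming solver.
    Assumption: samples with no active motif at all are skipped,
    since no family of motifs can cover them.
    """
    samples = [s for s, active in activity.items() if active]
    candidates = sorted(set().union(*activity.values()))
    # Try families of increasing size; the first one that covers is minimal.
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = set(subset)
            if all(activity[s] & chosen for s in samples):
                return chosen
    return set()


# Hypothetical toy library of three motifs and four sample profiles.
motifs = [("S1", "T1"), ("S2", "T2"), ("S3", "T3")]
profiles = {
    "p1": ({"S1"}, {"T1"}),
    "p2": ({"S1", "S2"}, {"T1", "T2"}),
    "p3": ({"S2"}, {"T2"}),
    "p4": ({"S2", "S3"}, {"T2", "T3"}),
}
activity = {
    sample: active_motifs(dna, rna, motifs)
    for sample, (dna, rna) in profiles.items()
}
covering = minimal_covering(activity)
print(sorted(covering))
```

On this toy input the search returns the two motifs (S1, T1) and (S2, T2): no single motif is active in all four samples, and this pair is. A minimal covering need not be unique; the sketch returns the first one found in sorted order.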
Given such a covering, a natural measure of cross-sample diversity is the extent to which the particular active motifs vary from sample to sample; this variability is captured by the entropy of the distribution over configurations. We apply this program to data from TCGA for six distinct tumor types (breast, prostate, lung, colon, liver, and kidney cancer). This enables an efficient simplification of the complex landscape observed in cancer populations, resulting in the identification of novel signatures of molecular alterations that are not detected with frequency-based criteria. Estimates of cancer heterogeneity across tumor phenotypes reveal a stable pattern: entropy increases with disease severity. This framework is thus well suited to accommodate the expanding complexity of cancer genomes and epigenomes emerging from large consortia projects.
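As a rough illustration of the diversity measure, the sketch below computes the plug-in (empirical) Shannon entropy of the configurations restricted to a covering. The choice of the plug-in estimator, the function name, and the toy data are assumptions made for illustration; the source does not specify which entropy estimator is used.

```python
from collections import Counter
from math import log2


def configuration_entropy(activity, covering):
    """Empirical Shannon entropy (in bits) of the binary configurations
    obtained by restricting each sample's active motifs to the covering.

    activity maps sample id -> set of active motifs.
    covering is the set of motifs defining the reduced configuration.
    """
    order = sorted(covering)  # fix a motif order so configurations are comparable
    configs = [
        tuple(int(motif in active) for motif in order)
        for active in activity.values()
    ]
    n = len(configs)
    if n == 0:
        return 0.0
    counts = Counter(configs)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy population: four samples, covering of two motifs.
covering = {("S1", "T1"), ("S2", "T2")}
activity = {
    "p1": {("S1", "T1")},
    "p2": {("S1", "T1"), ("S2", "T2")},
    "p3": {("S2", "T2")},
    "p4": {("S2", "T2"), ("S3", "T3")},
}
entropy_bits = configuration_entropy(activity, covering)
print(entropy_bits)
```

With a covering of M motifs the entropy is at most M bits, reached when all 2^M configurations are equally frequent. In the toy population, three distinct configurations occur with frequencies 1/4, 1/4, and 1/2, giving 1.5 bits out of a possible 2.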