Predicting DNA accessibility in the pan-cancer tumor genome using RNA-seq, WGS, and deep learning

2017 
DNA accessibility, chromatin regulation, and genome methylation are key drivers of transcriptional events promoting tumor growth. However, understanding the impact of DNA sequence data on transcriptional regulation of gene expression is a challenge, particularly in noncoding regions of the genome. Recently, neural networks have been used to effectively predict DNA accessibility in multiple specific cell types. These models make it possible to explore the impact of mutations on DNA accessibility and transcriptional regulation. Our work first improved on prior cell-specific accessibility prediction, obtaining a mean receiver operating characteristic (ROC) area under the curve (AUC) of 0.910 and a mean precision-recall (PR) AUC of 0.605, compared to the previous mean ROC AUC of 0.895 and mean PR AUC of 0.561. Our key contribution extended the model to enable accessibility predictions on any new sample for which RNA-seq data is available, without requiring cell-type-specific DNase-seq data for re-training. This new model obtained an overall PR AUC of 0.621 and ROC AUC of 0.897 when applied across whole genomes of new samples whose biotypes were held out from training, and a PR AUC of 0.725 and ROC AUC of 0.913 on randomly held-out new samples whose biotypes were allowed to overlap with training. More significantly, we showed that for promoter and promoter flank regions of the genome our model predicts accessibility with high reliability, achieving a PR AUC of 0.838 in held-out biotypes and a PR AUC of 0.908 in randomly held-out samples. This performance is not sensitive to whether the promoter and flank regions fall within genes used in the input RNA-seq expression vector. Finally, we use this tool to investigate, for the first time, promoter accessibility patterns across several cohorts from The Cancer Genome Atlas (TCGA).
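
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a mean ROC AUC and mean PR AUC across several cell types could be computed from binary accessibility labels and predicted scores. It is not the authors' evaluation code. The function names, the toy data, and the use of average precision as the PR AUC estimate are all assumptions made for illustration; the paper may have integrated the PR curve differently.

```python
# Illustrative sketch only: function names and data are hypothetical.

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count as 0.5). O(n_pos * n_neg), fine for a demo."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR AUC estimated as average precision: the mean of the precision
    values at the rank of each positive. Assumes distinct scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


def mean_metric(metric, tasks):
    """Average a metric over tasks, e.g. one task per cell type."""
    return sum(metric(labels, scores) for labels, scores in tasks) / len(tasks)


# Toy data: two hypothetical cell types, each with (labels, scores).
tasks = [
    ([1, 0, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]),
    ([0, 1, 0, 1], [0.1, 0.9, 0.3, 0.8]),
]

mean_roc = mean_metric(roc_auc, tasks)
mean_pr = mean_metric(average_precision, tasks)
print(f"mean ROC AUC = {mean_roc:.3f}, mean PR AUC = {mean_pr:.3f}")
```

Because accessible sites are typically a small fraction of the genome, PR AUC tends to be the more demanding of the two metrics, which is consistent with the PR values above being well below the corresponding ROC values.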