Predicting target genes of noncoding regulatory variants with ICE.

2020 
Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Noncoding variants remain particularly difficult to interpret, even though they make up a large majority of the trait associations identified in GWAS. Predicting the regulatory effects of noncoding variants on candidate genes is a key step in evaluating their clinical significance. Here we develop a machine learning algorithm, ICE (Inference of Connected eQTLs), to predict the regulatory targets of noncoding variants identified in studies of expression quantitative trait loci (eQTLs). We assemble datasets from eQTL results of the Genotype-Tissue Expression (GTEx) project and train the model to separate positive from negative variant-gene pairs based on annotations characterizing the variant, the gene and the intervening sequence. ICE achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 under random cross-validation and 0.700 under a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows significant enrichment of true target genes over negative controls among ICE predictions. In gene ranking experiments, ICE achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. ICE can be applied to any VUS of interest and each nearby candidate gene, producing a score that reflects the likelihood that the variant has a regulatory effect on the gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
AVAILABILITY: Code and data used in this work are available at https://github.com/miaecle/eQTL_Trees.
SUPPLEMENTARY INFORMATION: Supplementary data.
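The top-1 and top-3 accuracies quoted above are ranking metrics: for each variant, the candidate genes are ordered by score and the prediction counts as correct if the true target appears among the first k. The abstract does not specify the exact evaluation protocol, so the following is only a minimal sketch under assumptions; the function name `top_k_accuracy`, the variant and gene identifiers, and all scores are hypothetical and are not taken from the ICE code or data.

```python
from typing import Dict


def top_k_accuracy(
    scores: Dict[str, Dict[str, float]],
    true_gene: Dict[str, str],
    k: int,
) -> float:
    """Fraction of variants whose true target gene ranks within the top k candidates.

    scores maps each variant to {candidate gene: score}; higher means more likely.
    true_gene maps each variant to its known target gene.
    """
    hits = 0
    for variant, gene_scores in scores.items():
        # Rank candidates by descending score; ties are broken by gene name
        # so the ranking is deterministic.
        ranked = sorted(gene_scores, key=lambda g: (-gene_scores[g], g))
        if true_gene[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Made-up scores for two hypothetical variants (illustration only).
scores = {
    "var1": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "var2": {"GENE_D": 0.3, "GENE_E": 0.7, "GENE_F": 0.5},
}
true_gene = {"var1": "GENE_A", "var2": "GENE_F"}

top1 = top_k_accuracy(scores, true_gene, k=1)
top2 = top_k_accuracy(scores, true_gene, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

In this toy input the true gene of `var2` is ranked second, so it counts as a miss at k=1 and a hit at k=2, which is why a top-k accuracy can only stay flat or increase as k grows.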