Prioritizing risk genes for neurodevelopmental disorders using pathway information

2018 
Over the past decade, case-control studies of next-generation sequencing data have proven integral to understanding the contribution of rare inherited and de novo single-nucleotide variants to the genetic architecture of complex disease. Ideally, such studies would identify individual risk genes of moderate to large effect size to generate novel treatment hypotheses for further follow-up. However, because of insufficient power, studies have come to rely on gene set enrichment analyses to detect differences between cases and controls, implicating sets of hundreds of genes rather than specific targets for further investigation. Here, we present a Bayesian statistical framework, termed gTADA, that integrates gene-set membership information with gene-level de novo (DN) and rare inherited case-control (rCC) counts to prioritize risk genes with excess rare variant burden. This approach avoids arbitrary significance thresholds, and it can leverage external gene-level information to identify additional risk genes. Applying gTADA to available whole-exome sequencing datasets for several neuropsychiatric conditions, we replicate previously reported gene set enrichments and identify novel risk genes. For epilepsy, gTADA prioritized 40 significant genes (posterior probability > 0.95), of which 30 are not in the known gene list and 6 replicate in an independent whole-genome sequencing study. We found that epilepsy genes have high protein-protein interaction network connectivity, and we examined their expression during human brain development. Finally, epilepsy risk genes are enriched for the targets of several drugs, including both known anticonvulsants and potentially novel repositioning opportunities.
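To make the core idea concrete (gene-set membership shifting a gene's prior probability of being a risk gene, which is then updated by its variant counts into a posterior probability), here is a minimal sketch. It is not the gTADA implementation. It assumes a logistic prior over gene-set membership with hypothetical coefficients `alpha0` and `alphas`, models only de novo counts as Poisson with expected count 2 × (number of trios) × (per-gene mutation rate), and fixes the relative risk `gamma` instead of integrating over its uncertainty; rCC counts are omitted. All function names and parameter values are hypothetical.

```python
import math


def prior_risk_probability(memberships, alpha0, alphas):
    """Hypothetical logistic prior: each gene set a gene belongs to (membership = 1)
    shifts the log-odds of the gene being a risk gene by that set's coefficient."""
    z = alpha0 + sum(a * m for a, m in zip(alphas, memberships))
    return 1.0 / (1.0 + math.exp(-z))


def dn_bayes_factor(count, mu, n_trios, gamma):
    """Bayes factor for a de novo count under a simplified Poisson model.

    Null: count ~ Poisson(2 * n_trios * mu).
    Risk gene: count ~ Poisson(gamma * 2 * n_trios * mu), with gamma fixed (an assumption).
    The ratio of the two Poisson likelihoods simplifies to gamma**count * exp(-(gamma - 1) * lam0).
    """
    lam0 = 2 * n_trios * mu
    return gamma ** count * math.exp(-(gamma - 1.0) * lam0)


def posterior_probability(prior, bayes_factor):
    """Combine prior odds with the Bayes factor and convert back to a probability."""
    posterior_odds = prior / (1.0 - prior) * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Illustrative (made-up) inputs: 2 de novo variants among 5000 trios.
    bf = dn_bayes_factor(count=2, mu=1e-5, n_trios=5000, gamma=20.0)
    in_set = prior_risk_probability([1], alpha0=-4.0, alphas=[1.5])
    not_in_set = prior_risk_probability([0], alpha0=-4.0, alphas=[1.5])
    print("member of gene set:", round(posterior_probability(in_set, bf), 3))
    print("not a member:      ", round(posterior_probability(not_in_set, bf), 3))
```

With identical variant evidence, the gene in the (hypothetically enriched) gene set ends up with the higher posterior probability; genes would then be ranked, or selected with a cutoff such as the posterior probability > 0.95 reported above, rather than by a per-gene p-value threshold.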