    Abstract:
    Phenome-wide association studies (PheWAS), which assess whether a genetic variant is associated with multiple phenotypes across a phenotypic spectrum, have been proposed as a possible aid to drug development through elucidating mechanisms of action, identifying alternative indications, or predicting adverse drug events (ADEs). Here, we evaluate whether PheWAS can inform target validation during drug development. We selected 25 single nucleotide polymorphisms (SNPs) linked through genome-wide association studies (GWAS) to 19 candidate drug targets for common disease therapeutic indications. We independently interrogated these SNPs through PheWAS in four large "real-world data" cohorts (23andMe, UK Biobank, FINRISK, CHOP) for association with a total of 1,892 binary endpoints. We then conducted meta-analyses for 145 harmonized disease endpoints in up to 697,815 individuals and joined results with summary statistics from 57 published GWAS. Our analyses replicate 70% of known GWAS associations and identify 10 novel associations with study-wide significance after multiple test correction (P < 1.8 × 10⁻⁶; out of 72 novel associations with FDR < 0.1). By leveraging directionality and point estimate of the effect sizes, we describe new associations that may predict ADEs, e.g., acne, high cholesterol, gout and gallstones for rs738409 (p.I148M) in PNPLA3; or asthma for rs1990760 (p.T946A) in IFIH1. We further propose how quantitative estimates of genetic safety/efficacy profiles can be used to help prioritize candidate targets for a specific indication. Our results demonstrate PheWAS as a powerful addition to the toolkit for drug discovery.
    One Sentence Summary: Matching genetics with phenotypes in 800,000 individuals predicts efficacy and on-target safety of future drugs.
    Keywords:
    Genome-wide Association Study
    Phenome
    Genetic Association
    Bonferroni correction
    Multiple comparisons problem
    Pharmacogenomics
    When many hypotheses are tested simultaneously, the probability of rejecting at least one true null hypothesis increases, which creates a multiplicity problem. To address it, researchers have proposed multiple testing methods that aim to improve power by controlling an error rate such as the family-wise error rate (FWER), the false discovery rate (FDR), or the false nondiscovery rate (FNR), in combination with a choice of test statistic. In this article, we review the Bonferroni (1960), Holm (1979), Benjamini and Hochberg (1995), and Benjamini and Yekutieli (2001) procedures applied to P-values based on T statistics, modified T statistics, or local-pooled-error (LPE) statistics. We also consider the Sun and Cai (2007) procedure based on Z statistics. These procedures are compared in a simulation study and applied to Arabidopsis microarray gene expression data to identify differentially expressed genes.
    False Discovery Rate
    Bonferroni correction
    Multiple comparisons problem
    Word error rate
    False positive rate
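The two FWER-controlling procedures named in the abstract above can be sketched as adjusted P-values. This is a minimal illustration, not the authors' code: the function names and the example P-values are assumptions chosen for the sketch. Bonferroni multiplies every P-value by the number of tests m, while Holm multiplies the i-th smallest P-value by (m - i + 1) and enforces monotonicity, which is why Holm never rejects fewer hypotheses than Bonferroni.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni-adjusted P-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm_adjust(pvalues):
    """Holm step-down adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest P-value is multiplied by (m - rank).
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Adjusted values must be non-decreasing along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical P-values, for illustration only.
pvals = [0.001, 0.012, 0.020, 0.040, 0.300]
alpha = 0.05
bonf = bonferroni_adjust(pvals)
holm = holm_adjust(pvals)
n_bonf = sum(p <= alpha for p in bonf)
n_holm = sum(p <= alpha for p in holm)
print("Bonferroni rejections:", n_bonf)
print("Holm rejections:", n_holm)
```

With these illustrative inputs, Holm rejects the second hypothesis (adjusted value 4 × 0.012) that Bonferroni misses (adjusted value 5 × 0.012).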
    This paper reviews the popular Benjamini-Hochberg method and other related methods for multiple hypothesis testing. It aims to provide a short, complete, and easy-to-understand review of the main article together with the necessary background. The paper 'Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing' by Benjamini and Hochberg [1] proposes a new framework for controlling the false discovery rate (FDR) in multiple hypothesis testing. The procedure is claimed to yield a substantial gain in power, particularly in problems that call for FDR control rather than familywise error rate (FWER) control. The proposed method uses a simple Bonferroni-type procedure for FDR control.
    False Discovery Rate
    Bonferroni correction
    Multiple comparisons problem
    Word error rate
    Citations (0)
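The Bonferroni-type step-up rule reviewed in the entry above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `bh_reject`, the `dependent` flag, and the example P-values are hypothetical. The rule sorts the P-values, finds the largest rank i with p_(i) <= i·q/m, and rejects every hypothesis up to that rank. Setting `dependent=True` divides the thresholds by the harmonic sum 1 + 1/2 + ... + 1/m, which is the Benjamini-Yekutieli correction for arbitrary dependence.

```python
def bh_reject(pvalues, q=0.05, dependent=False):
    """Return the sorted indices rejected by the BH step-up procedure.

    With dependent=True the thresholds are divided by the harmonic sum
    c(m) = 1 + 1/2 + ... + 1/m (Benjamini-Yekutieli variant).
    """
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1)) if dependent else 1.0
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose P-value falls under its threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / (m * c_m):
            k = rank
    # Step-up: reject all hypotheses with rank <= k, even those that
    # individually exceeded their own threshold.
    return sorted(order[:k])


# Hypothetical P-values, for illustration only.
pvals = [0.001, 0.025, 0.029, 0.039, 0.300]
bh = bh_reject(pvals, q=0.05)
by = bh_reject(pvals, q=0.05, dependent=True)
print("BH rejects indices:", bh)
print("BY rejects indices:", by)
```

In this example the second P-value (0.025) exceeds its own threshold (0.02) but is still rejected under BH because a larger P-value further down the list passes its threshold, which is what distinguishes a step-up rule from a step-down one.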
    Missing or inaccurate diagnoses in biobank datasets can reduce the power of human genetic association studies. We present a machine-learning framework (MILTON) that utilizes the wealth of phenotypic information available in a biobank dataset to identify undiagnosed individuals within the cohort who have biomarker profiles similar to those of positively diagnosed cases. We applied MILTON to perform an augmented phenome-wide association study (PheWAS) based on 405,703 whole exome sequencing samples from UK Biobank, resulting in improved signals for known (p < 1 × 10⁻⁸) gene-disease relationships alongside 206 novel gene-disease relationships that only achieved genome-wide significance upon using MILTON. To further validate these putatively novel discoveries, we adopt two orthogonal machine learning methods that prioritise gene-disease relationships using comprehensive publicly available datasets alongside a biological insights knowledge graph. For additional clinical translation utility, MILTON outputs a disease-specific biomarker set per disease as well as comorbidity clusters across ICD10 disease codes based on shared biomarker profiles of positively labelled cases. All the extracted associations and biomarker importance results for the 3,308 studied binary traits will be made available via an interactive web-portal.
    Phenome
    Biomarker Discovery
    Genome-wide Association Study
    Exome
    Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here we present Global Biobank Engine (GBE), a web-based tool that enables the exploration of the relationship between genotype and phenotype in large biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests, and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.
    Phenome
    Biorepository
    Citations (6)
    Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.
    Phenome
    Biorepository
    This paper argues that designing a procedure for a set of multiple comparisons should be treated as a problem of decision-making under uncertainty. With this motivation, we consider a different error rate to control, the per-family error rate (PFER), which requires that the expected number of false rejections made by a test procedure be no greater than a prespecified level k. Although Tukey proposed the PFER in 1953, it has received little study so far. We first present the single-step Bonferroni procedure and then construct two step-up procedures, one with generic critical values and the other with BH-type (Benjamini and Hochberg) critical values. These procedures are compared through simulations.
    Bonferroni correction
    Multiple comparisons problem
    False Discovery Rate
    Word error rate
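The single-step Bonferroni procedure for PFER control mentioned in the entry above can be sketched as follows. This is a minimal illustration and not the paper's own code: the function name, the simulation settings, and the random seed are assumptions. The rule rejects every hypothesis with P-value at most k/m. If m0 of the m nulls are true and their P-values are uniform on [0, 1], the expected number of false rejections is m0·k/m, which is at most k. The small simulation checks this under the global null (all m hypotheses true), where the expectation equals k exactly.

```python
import random


def pfer_bonferroni_reject(pvalues, k):
    """Single-step rule: reject H_i when p_i <= k / m.

    Under uniform null P-values the expected number of false
    rejections is at most k.
    """
    m = len(pvalues)
    threshold = k / m
    return [i for i, p in enumerate(pvalues) if p <= threshold]


# Illustrative simulation settings (assumptions, not from the paper).
rng = random.Random(0)
m, k, n_reps = 100, 2, 2000
total_false = 0
for _ in range(n_reps):
    # Global null: every P-value is uniform on [0, 1].
    pvals = [rng.random() for _ in range(m)]
    total_false += len(pfer_bonferroni_reject(pvals, k))
mean_false = total_false / n_reps
print("Average number of false rejections:", mean_false)
```

Unlike FWER control at level alpha, which bounds the probability of any false rejection, this criterion tolerates up to k false rejections on average, so k may be chosen larger than 1.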