Jie Zheng

Lipids, obesity and gallbladder disease in women: insights from genetic studies using the cardiovascular gene-centric 50K SNP array

European Journal of Human Genetics (2015)

Santiago Rodríguez Tom R. Gaunt Yiran Guo Jie Zheng Michael R. Barnes

Gallbladder disease (GBD) has an overall prevalence of 10–40% depending on factors such as age, gender, population, obesity and diabetes, and represents a major economic burden. Although gallstones are composed of cholesterol by-products and are associated with obesity, presumed causal pathways remain unproven, even though BMI reduction is typically recommended. We performed genetic studies to discover candidate genes and define pathways involved in GBD. We genotyped 15 241 women of European ancestry from three cohorts, including 3216 with GBD, using the Human cardiovascular disease (HumanCVD) BeadChip containing up to ~53 000 single-nucleotide polymorphisms (SNPs). Effect sizes with P-values for development of GBD were generated. We identify two new loci associated with GBD, GCKR rs1260326:T>C (P=5.88 × 10^−7, β=−0.146) and TTC39B rs686030:C>A (P=6.95 × 10^−7, β=0.271), and detect four independent SNP effects in ABCG8 rs4953023:G>A (P=7.41 × 10^−47, β=0.734), ABCG8 rs4299376:G>T (P=2.40 × 10^−18, β=0.278), ABCG5 rs6544718:T>C (P=2.08 × 10^−14, β=0.044) and ABCG5 rs6720173:G>C (P=3.81 × 10^−12, β=0.262) in conditional analyses taking genotypes of rs4953023:G>A as a covariate. We also delineate the risk effects among many genotypes known to influence lipids. These data, from the largest GBD genetic study to date, show that specific, mainly hepatocyte-centred, components of lipid metabolism are important to GBD risk in women. We discuss the potential pharmaceutical implications of our findings.

SNP

Gallbladder disease

10.1038/ejhg.2015.63

Citations (30)

A two‐center randomized controlled trial of a repairing mask as an adjunctive treatment for mild to moderate rosacea

Journal of Cosmetic Dermatology (2024)

Liwei Wang Yiyi Zhang Lihong Chen Dengfeng Yuan Xiamei Feng

Objective: To investigate the efficacy and safety of a repairing mask as an adjunctive treatment for skin barrier maintenance in mild to moderate rosacea. Methods: Patients with rosacea were recruited in this dual-center randomized controlled trial from November 2019 to December 2021. A total of 64 patients were included and randomized at a ratio of 3:1 into a mask group (n = 47) and a control group (n = 17). Patients in the mask group received treatment with Dr. Yu Centella asiatica repairing facial mask three times weekly for a duration of 6 weeks. All participants were instructed to continue their regimen of 50 mg oral minocycline twice daily and to apply Dr. Yu Intensive Hydrating Soft Cream twice daily. The primary endpoint of this study was the Investigator Global Assessment (IGA) score. Results: A total of 54 patients completed this trial, with 41 in the mask group and 13 in the control group. After using this facial mask for 3 and 6 weeks, the IGA, facial skin dryness, facial flushing, and severity of skin lesions in the mask group showed significant improvement (p < 0.05). Moreover, the change in the degree of skin flushing was significantly greater than that in the control group (p = 0.037). Throughout the study, no adverse events were reported in either group of participants. Conclusion: The Dr. Yu Centella asiatica repairing facial mask, as an adjunctive treatment of rosacea, appears to effectively repair and protect the skin barrier, alleviate cutaneous symptoms of rosacea, and is both efficacious and safe for patient use.

Adjunctive treatment

10.1111/jocd.16413

Citations (0)

Sex difference and socioeconomic inequity in hypertension: a national survey study of 98,658 adults from 162 study sites (Preprint)

JMIR Public Health and Surveillance (2024)

Xiaoyun Zhang Siyu Wang Qianqian Yang Ruizhi Zheng Long Wang

Sex differences in blood pressure (BP) levels and hypertension are important and the role of socioeconomic status (SES) in sex differences of hypertension remains unclear.

Preprint

Cross-sectional study

10.2196/63144

Citations (1)

Genetic Evidence for Causal Associations of Sarcopenia with Cardiometabolic Disease And Alzheimer's Disease and the Mediating Role of Insulin Resistance

SSRN Electronic Journal (2022)

Chaojie Ye Lijie Kong Yiying Wang Min Xu Jie Zheng

Background: The causal influence of sarcopenia on major cardiometabolic diseases and Alzheimer’s disease and whether and to what extent insulin resistance plays a mediation role remains unclear. Methods: We performed two-step, two-sample Mendelian randomization analyses, applying genetic instruments of grip strength, appendicular lean mass [ALM], and whole body lean mass [WBLM] from genome-wide association studies (GWASs) in UK Biobank (up to 461,026 participants) to examine their causal associations with six cardiometabolic diseases and Alzheimer’s disease extracted from reliable European-descent GWASs and to assess the proportions of the causal effects mediated by insulin resistance. Insulin resistance was estimated by fasting insulin based on GWAS from the Meta-Analyses of Glucose and Insulin-related traits Consortium (151,013 European participants). Findings: Each 1-SD lower genetically determined grip strength, ALM, and WBLM were associated with higher risks of diabetes (20%-57%), non-alcoholic fatty liver disease [NAFLD] (33%-130%), hypertension (12%-32%), coronary heart disease [CHD] (20%-42%), myocardial infarction [MI] (18%-45%), small vessel stroke (25%-29%), and Alzheimer’s disease (10%-28%). Insulin resistance mediated 10%-25% of the effect of grip strength and 7%-28% of the effect of ALM on diabetes, NAFLD, hypertension, CHD, and MI. The direct effect of WBLM on diabetes diminished towards null with adjustment for insulin resistance. The robustness of all causal findings from the random-effect inverse-variance weighted method was validated by several sensitivity analyses. Interpretation: Sarcopenia, measured by low muscle strength and muscle mass, was causally associated with higher risks of major cardiometabolic diseases and Alzheimer’s disease, with insulin resistance mediating a substantial proportion of sarcopenia-related cardiometabolic risk. Funding Information: This work was supported by the grants from the National Natural Science Foundation of China (82022011, 81970706, 82088102, 81970728, 81941017), the Chinese Academy of Medical Sciences (2018PT32017, 2019PT330006), the “Shanghai Municipal Education Commission–Gaofeng Clinical Medicine Grant Support” from Shanghai Jiao Tong University School of Medicine (20171901 Round 2), the Shanghai Shenkang Hospital Development Center (SHDC12019101, SHDC2020CR1001A, SHDC2020CR3064B), the Shanghai Jiao Tong University School of Medicine (DLY201801), and the Ruijin Hospital (2018CR002). Declaration of Interests: The authors declare no competing interests. Ethics Approval Statement: All the summary-level genome-wide association study (GWAS) data used in the analyses are publicly available, and therefore ethical approval was not imperative for this study. Ethical approval for the GWASs can be found in the corresponding GWAS publications cited in the manuscript.

10.2139/ssrn.4196965

Citations (0)

A framework for research into continental ancestry groups of the UK Biobank

Human Genomics (2022)

Andrei‐Emil Constantinescu Ruth E. Mitchell Jie Zheng Caroline J. Bull Nicholas J. Timpson

Background: The UK Biobank is a large prospective cohort, based in the UK, that has deep phenotypic and genomic data on roughly half a million individuals. Included in this resource are data on approximately 78,000 individuals with “non-white British ancestry.” While most epidemiology studies have focused predominantly on populations of European ancestry, there is an opportunity to contribute to the study of health and disease for a broader segment of the population by making use of the UK Biobank’s “non-white British ancestry” samples. Here, we present an empirical description of the continental ancestry and population structure among the individuals in this UK Biobank subset. Results: Reference populations from the 1000 Genomes Project for Africa, Europe, East Asia, and South Asia were used to estimate ancestry for each individual. Those with at least 80% ancestry in one of these four continental ancestry groups were taken forward (N = 62,484). Principal component and K-means clustering analyses were used to identify and characterize population structure within each ancestry group. Of the approximately 78,000 individuals in the UK Biobank that are of “non-white British” ancestry, 50,685, 6653, 2782, and 2364 individuals were assigned to the European, African, South Asian, and East Asian continental ancestry groups, respectively. Each continental ancestry group exhibits prominent population structure that is consistent with self-reported country of birth data and geography. Conclusions: Methods outlined here provide an avenue to leverage UK Biobank’s deeply phenotyped data, allowing researchers to maximize its potential in the study of health and disease in individuals of non-white British ancestry.

10.1186/s40246-022-00380-5

Citations (19)

Decision letter: A Mendelian randomization study of the role of lipoprotein subfractions in coronary artery disease

David Sullivan Jie Zheng

Mendelian Randomization

Mendelian inheritance

10.7554/elife.58361.sa1

Citations (0)

Author response: The MR-Base platform supports systematic causal inference across the human phenome

Gibran Hemani Jie Zheng Benjamin Elsworth Kaitlin H. Wade Valeriia Haberland

Abstract

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies. https://doi.org/10.7554/eLife.34408.001

eLife digest

Our health is affected by many exposures and risk factors, including aspects of our lifestyles, our environments, and our biology. It can, however, be hard to work out the causes of health outcomes because ill-health can influence risk factors and risk factors tend to influence each other. To work out whether particular interventions influence health outcomes, scientists will ideally conduct a so-called randomized controlled trial, where some randomly-chosen participants are given an intervention that modifies the risk factor and others are not. But this type of experiment can be expensive or impractical to conduct. Alternatively, scientists can also use genetics to mimic a randomized controlled trial. This technique – known as Mendelian randomization – is possible for two reasons. First, because it is essentially random whether a person has one version of a gene or another. Second, because our genes influence different risk factors. For example, people with one version of a gene might be more likely to drink alcohol than people with another version. Researchers can compare people with different versions of the gene to infer what effect alcohol drinking has on their health. Every day, new studies investigate the role of genetic variants in human health, which scientists can draw on for research using Mendelian randomization. But until now, complete results from these studies have not been organized in one place. At the same time, statistical methods for Mendelian randomization are continually being developed and improved. To take advantage of these advances, Hemani, Zheng, Elsworth et al. produced a computer programme and online platform called "MR-Base", combining up-to-date genetic data with the latest statistical methods. MR-Base automates the process of Mendelian randomization, making research much faster: analyses that previously could have taken months can now be done in minutes. It also makes studies more reliable, reducing the risk of human error and ensuring scientists use the latest methods. MR-Base contains over 11 billion associations between people's genes and health-related outcomes. This will allow researchers to investigate many potential causes of poor health. As new statistical methods and new findings from genetic studies are added to MR-Base, its value to researchers will grow. https://doi.org/10.7554/eLife.34408.002

Introduction

Inferring causal relationships between phenotypes is a major challenge and has important implications for understanding the aetiology of disease processes.
The potential for phenome-wide causal inference has increased markedly over the past 10 years due to two major advances. The first is the continuing success of large scale genome-wide association studies (GWAS) in identifying robust genetic associations (Visscher et al., 2017). The second is the development of statistical methods for causal inference that exploit the principles of Mendelian randomization (MR) using GWAS summary data (Davey Smith and Ebrahim, 2003; Davey Smith and Hemani, 2014; Zhu et al., 2016; Pierce and Burgess, 2013). Genetic data for MR can, however, be difficult to access, while MR methods are evolving rapidly and can be difficult to implement for non-specialists. To address the need for more systematic curation and application of complete GWAS summary data and MR methods, we have developed MR-Base (http://www.mrbase.org): a platform that integrates a database of thousands of GWAS summary datasets with a web interface and R packages for automated causal inference through MR. Following an extended introduction on the uses and sources of GWAS summary data, and the principles and assumptions behind MR, we describe how to implement MR analyses using MR-Base, how to interpret results and provide a thorough overview of potential limitations. In an applied example, we demonstrate the functionality of MR-Base through an MR study of low density lipoprotein (LDL) cholesterol and coronary heart disease (CHD). We also demonstrate how the integration achieved by MR-Base supports a wide range of applications, including phenome-wide association studies (PheWAS) to identify potential sources of horizontal pleiotropy, and for performing hypothesis-free MR to gain insight into impacts of interventions. These applications demonstrate how integrating data and analytical tools enable novel insights that would previously have been technically and practically challenging to achieve. 

GWAS summary data

GWAS summary data, the non-disclosive results from testing the association of hundreds of thousands to millions of genetic variants with a phenotype, have been routinely collected and curated for several years (Welter et al., 2014; Li et al., 2016; Beck et al., 2014) and are a valuable resource for dissecting the causal architecture of complex traits (Pasaniuc and Price, 2017). Accessible GWAS summary data are, however, often restricted to 'top hits', that is, statistically significant results, or tend to be hosted informally in different locations under a wide variety of formats. For other studies, summary data may only be available 'on request' from authors. Complete summary data are currently publicly accessible for thousands of phenotypes but to ensure reliability and efficiency for systematic downstream applications they must be harvested, checked for errors, harmonised and curated into standardised formats. GWAS summary data are useful for a wide variety of applications, including MR, PheWAS (Millard et al., 2015; Denny et al., 2010), summary-based transcriptome-wide (Gusev et al., 2016) and methylome-wide (Richardson et al., 2017; Hannon et al., 2017a) association studies and linkage disequilibrium (LD) score regression (Bulik-Sullivan et al., 2015; Zheng et al., 2017b).

Mendelian randomization

MR (Davey Smith and Ebrahim, 2003; Davey Smith and Hemani, 2014) uses genetic variation to mimic the design of randomised controlled trials (RCT) (although for interpretive caveats see Holmes et al., 2017). Let us suppose we have a single nucleotide polymorphism (SNP) that is known to influence some phenotype (the exposure). Due to Mendel's laws of inheritance and the fixed nature of germline genotypes, the alleles an individual receives at this SNP are expected to be random with respect to potential confounders and causally upstream of the exposure.
In this 'natural experiment', the SNP is considered to be an instrumental variable (IV), and observing an individual's genotype at this SNP is akin to randomly assigning an individual to a treatment or control group in a RCT (Figure 1a). To infer the causal influence of the exposure, one calculates the ratio between the SNP effect on the outcome over the SNP effect on the exposure. If there are many independent IVs available for a particular exposure, as is often the case, causal inference can be strengthened (Johnson, 2012). Here, we consider each SNP to mimic an independent RCT and we can adapt tools developed for meta-analysis (Bowden et al., 2017a) to combine the results obtained from each of the SNPs, giving an overall causal estimate that is better powered (Bowden et al., 2017a).

Figure 1. Principles and assumptions behind Mendelian randomization. (A) Diagram illustrating the analogy between Mendelian randomization (MR) and a randomised controlled trial. (B) A directed acyclic graph representing the MR framework. Instrumental variable (IV) assumption 1: the instruments must be associated with the exposure; IV assumption 2: the instruments must influence the outcome only through the exposure; IV assumption 3: the instruments must not associate with measured or unmeasured confounders. (C-F) Scatter plots demonstrating the relationship between the instrumental single nucleotide polymorphism (SNP) effects on the exposure against their corresponding effects on the outcome. The slope of the regression is the estimate of the causal effect of the exposure on the outcome. (C) If there is no violation of the IV2 assumption (no horizontal pleiotropy), or the horizontal pleiotropy is balanced, an unbiased causal estimate can be obtained by inverse-variance weighted (IVW) linear regression, where the contribution of each instrumental SNP to the overall effect is weighted by the inverse of the variance of the SNP-outcome effect. Fixed and random effects IVW approaches are available (the slopes from both approaches are identical but the variance of the slope is inflated in the random effects model in the presence of heterogeneity between SNPs). (D) If there is a tendency for the horizontal pleiotropic effect to be in a particular direction, then constraining the slope to go through zero will incur bias (grey line). Egger regression relaxes this constraint by allowing the intercept to pass through a value other than zero, returning an unbiased effect estimate if the instrument-exposure and pleiotropic effects are uncorrelated, also known as the InSIDE (Instrument Strength Independent of Direct Effect) assumption (Bowden et al., 2015). Pleiotropic effect here refers to the effect of the instrument on the outcome that is not mediated by the exposure. (E) If the majority of the instruments are valid (black points), with some invalid instruments (red points), the median based approach will provide an unbiased estimate in the presence of unbalanced horizontal pleiotropy (black line), whereas IVW linear regression will provide a biased estimate (grey line). In addition, the median-based estimator does not require the InSIDE assumption of the Egger approach. (F) If a group of SNPs influences the outcome through a particular pathway other than the exposure (i.e. the SNPs are horizontally pleiotropic) then that group of SNPs will return consistently biased estimates. Clustering SNPs based on their estimates (grey lines) is possible with the mode-based estimator. The cluster with the largest weight (black line) is selected as the final causal estimate. The causal estimate from the mode-based estimator is unbiased if the SNPs contributing to the largest cluster are valid instruments. https://doi.org/10.7554/eLife.34408.003

Crucially, MR can be performed using results from GWAS, in a strategy known as 2-sample MR (2SMR) (Pierce and Burgess, 2013).
Here, the SNP-exposure effects and the SNP-outcome effects are obtained from separate studies. With these summary data alone, it is possible to estimate the causal influence of the exposure on the outcome. This has the tremendous advantage that causal inference can be made between two traits even if they aren't measured in the same set of samples, enabling us to harness the statistical power of pre-existing large GWAS analyses. Due to the flexibility afforded by the 2SMR strategy, MR can be applied to 1000s of potential exposure-outcome associations, where 'exposure' can be very broadly defined, from gene expression and proteins to more complex traits, such as body mass index and smoking. While MR avoids certain problems of conventional observational studies (Davey Smith and Ebrahim, 2001), it introduces its own set of new problems. MR is predicated on exploiting 'vertical' pleiotropy, where a SNP influences two traits because one trait causes the other (Davey Smith and Hemani, 2014). It is crucial to be aware of the assumptions and limitations that arise due to this model (Haycock et al., 2016). The main assumptions (Figure 1b) are: the instrument associates with the exposure (IV assumption 1); the instrument does not influence the outcome through some pathway other than the exposure (IV assumption 2); and the instrument does not associate with confounders (IV assumption 3). The IV1 assumption is easily satisfied in MR by restricting the instruments to genetic variants that are discovered using genome-wide levels of statistical significance and replicated in independent studies. The other two assumptions are impossible to prove, and, when violated, can lead to bias in MR analyses. Violations of the IV2 assumption can be introduced by 'horizontal' pleiotropy where the SNP influences the outcome through some pathway other than the exposure. Such effects can manifest in various different patterns (Figure 1c–f). 
When multiple independent instruments are available it is possible to perform sensitivity analyses that attempt to distinguish between horizontal and vertical pleiotropy and return causal estimates adjusted for the former (Bowden et al., 2016a; Bowden et al., 2015; Hartwig et al., 2017b). To improve reliability of causal inference, MR results should be presented alongside sensitivity analyses that make allowance for various potential patterns of horizontal pleiotropy. Further details on the design and interpretation of Mendelian randomization studies can be found in several existing reviews (Davey Smith and Hemani, 2014; Haycock et al., 2016; Swerdlow et al., 2016; Holmes et al., 2017; Zheng et al., 2017a). A glossary of terms can be found in Supplementary file 1F.

Model

In this section we describe how to use MR-Base to conduct MR analyses (Figure 2). The data required to perform the analysis can be described as a 'summary set' (Hemani et al., 2017a), where the genetic effects for a set of instruments are available for both the exposure and the outcome. To create a summary set we select appropriate instruments, obtain the effect estimates for those instruments for the exposure and the outcome, and harmonise the effects so that they reflect the same allele. We can then perform MR analyses using the summary set. These steps are supported by the database of GWAS results and R packages ('TwoSampleMR' and 'MRInstruments') curated by MR-Base and the following R packages curated by other researchers: 'MendelianRandomization' (Yavorska and Burgess, 2017), 'RadialMR' (Bowden et al., 2017b), 'MR-PRESSO' (Verbanck et al., 2018) and 'mr.raps' (Zhao et al., 2018). The statistical methods and R packages accessible through MR-Base are updated on a regular basis.

Figure 2. The practical steps for performing 2-sample Mendelian randomization (2SMR), as described in the Model section of the paper. The database of genome-wide association study results and R packages ('TwoSampleMR' and 'MRInstruments') curated by MR-Base support the data extraction, harmonisation and analysis steps required for 2SMR. Additional R packages for MR from other researchers are also accessible, including MendelianRandomization (Yavorska and Burgess, 2017), RadialMR (Bowden et al., 2017b), MR-PRESSO (Verbanck et al., 2018) and mr.raps (Zhao et al., 2018). The available methods are updated on a regular basis. https://doi.org/10.7554/eLife.34408.004

Obtaining instruments

Instruments are characterised as SNPs that reliably associate with the exposure, meaning they should be obtained from well-conducted GWAS, typically involving their detection in a discovery sample at a GWAS threshold of statistical significance (e.g. p < 5 × 10^−8) followed by replication in an independent sample. The minimum data requirements for each SNP are effect sizes (βx), standard errors (σx) and effect alleles. Also useful are sample size, non-effect allele and effect allele frequency.

Sources

Figure 3. The data available through MR-Base and the possible exposure-outcome analyses that can be performed. Exposure traits can be very broadly defined and may include molecular traits like gene expression, DNA-methylation, metabolites and proteins, as well as more complex traits, including cholesterol, body mass index, smoking and education. Further details on the traits with complete summary data can be found in Supplementary file 1A. The numbers reflect MR-Base in December 2017 and are updated on a regular basis. https://doi.org/10.7554/eLife.34408.005

There are several data sources that can be used in MR-Base (Figure 3) to define exposure and outcome traits (the number of traits is updated on a regular basis):

The MR-Base database comprises complete GWAS summary data for hundreds of traits (Figure 3 and Supplementary file 1A).
By 'complete' we mean all SNPs reported in a GWAS analysis, with no exclusions on the basis of a p-value threshold for association with the target trait of interest. It is possible for the user to extract the top-hits from this data source using their own criteria (e.g. strength of p-value). Alternatively, potential instruments can be obtained from the MRInstruments package, which includes independent SNP-trait associations from the database with p-value < 5e-8.

Quantitative trait loci (QTL) studies performed on DNA methylation (Gaunt et al., 2016), gene expression (GTEx Consortium, 2015), protein (Deming et al., 2016) and metabolite (Shin et al., 2014; Kettunen et al., 2016) levels generate hundreds to thousands of independent associations for thousands of traits. The MRInstruments R package contains hundreds of thousands of 'omic QTLs for ease of use within MR-Base.

The NHGRI-EBI GWAS catalog (Welter et al., 2014) comprises 21,324 SNPs associated with 1628 complex traits and diseases. This list of potential instruments has been harmonised and formatted for ease of use within MR-Base through the MRInstruments R package.

User-provided data can also be used for analysis.

Independence

It is important to ensure that instruments selected for an exposure are independent, unless measures are taken in the MR analysis to account for any correlation structures that arise through linkage disequilibrium. An efficient way to ensure that instruments are independent is to use clumping against a reference dataset of similar ancestry to the samples in which the GWAS was conducted. A clumping procedure has been implemented in MR-Base to automate the generation of independent instruments.

Obtaining SNP effects on the outcome

In order to generate the summary set, the effects of each of the instruments on the outcome need to be obtained.
This typically requires access to the entire set of GWAS results because it is unlikely that the instrumental SNPs for the exposure will be amongst the top hits of the outcome GWAS. As with the exposure data, the outcome data must contain at a minimum the SNP effects (βy), their standard errors (σy) and effect alleles.

LD proxies

If a particular SNP is not present in the outcome dataset then it is possible to use SNPs that are LD 'proxies' instead. Here, it is important to ensure that for any LD proxy used, the surrogate effect allele is the one in phase with the effect allele of the original target SNP. LD proxy lookups are automatically provided by MR-Base.

Sources

There are two main sources that can be used (Figure 3):

The MR-Base database comprises complete GWAS summary data for hundreds of traits (Supplementary file 1A). Fast lookups for specific SNPs against specific traits can be performed. If a requested SNP is absent, then MR-Base automatically searches for LD proxies, estimated using data from the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015), and returns the corresponding data for the best proxy (Figure 2).

User-provided complete GWAS summary data can be used with the R package.

Harmonising exposure and outcome SNP effects

To generate a summary set, for each SNP we need its effect and standard error on the exposure and the outcome corresponding to the same effect alleles (Hartwig et al., 2016). This is impossible to generate if the effect alleles for the SNP effects in the exposure and outcome datasets are unknown. MR-Base uses knowledge of the effect alleles, and where necessary the effect allele frequencies, to automatically harmonise the exposure and outcome datasets. The following scenarios are considered:

Wrong effect alleles: A SNP with (for example) effect/non-effect alleles G/T for the exposure and T/G for the outcome is harmonised by flipping the sign of the SNP-outcome effect.

Strand issues: SNPs that are reported as (for example) G/T for the exposure summary dataset and C/A for the outcome dataset indicate a strand issue, where, for example, one study has reported the effect on the forward strand and the other on the reverse strand. In this case, the outcome alleles are flipped to match those of the exposure alleles, and effect alleles are then aligned.

Palindromic SNPs: SNPs with A/T or G/C alleles are known as palindromic SNPs, because their alleles are represented by the same pair of letters on the forward and reverse strands, which can introduce ambiguity into the identity of the effect allele in the exposure and outcome GWASs. If reference strands are unknown, effect allele frequency can be used to resolve the ambiguity. For example, consider a SNP with alleles A and T, with a frequency of 0.11 for allele A in the exposure study and 0.91 in the outcome study. In addition, both studies have coded allele A as the effect allele and both are of European origin. The fact that allele A is the minor allele in the exposure study (frequency<0.5) and the major allele (>0.5) in the outcome study implies that the two studies have used different reference strands. To ensure that the effect sizes for the SNP reflect the same allele it is therefore necessary to switch the direction of the effect in either the exposure or outcome study (the default in MR-Base is to flip the direction of effect in the outcome study). Effect allele frequency may not, however, be a reliable indicator of reference strand when it is close to 0.5. This process has been described in more detail previously (Hartwig et al., 2016).

Incompatible alleles: If a SNP has (for example) A/G alleles for the exposure and A/C alleles for the outcome, there is no combination of flipping that can reconcile these differences, and either there are build differences or there is an error in the data. In this instance the SNP is excluded from the analysis.
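
The four harmonisation scenarios can be illustrated with a short Python sketch. This is not the MR-Base or TwoSampleMR implementation: the `SnpEffect` class, the `harmonise_outcome_beta` function and the `ambiguity_margin` cut-off (how close to 0.5 a frequency may be before a palindromic SNP is dropped) are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class SnpEffect:
    effect_allele: str
    other_allele: str
    beta: float
    eaf: Optional[float] = None  # effect allele frequency, if reported


def harmonise_outcome_beta(
    exposure: SnpEffect,
    outcome: SnpEffect,
    ambiguity_margin: float = 0.08,  # hypothetical cut-off, not an MR-Base default
) -> Optional[float]:
    """Return the outcome beta aligned to the exposure effect allele,
    or None when the SNP should be excluded."""
    exp_pair = (exposure.effect_allele, exposure.other_allele)
    out_pair = (outcome.effect_allele, outcome.other_allele)
    swapped = (exp_pair[1], exp_pair[0])

    if COMPLEMENT[exp_pair[0]] == exp_pair[1]:
        # Palindromic SNP (A/T or G/C): the letters cannot reveal the strand.
        if out_pair not in (exp_pair, swapped):
            return None
        if exposure.eaf is None or outcome.eaf is None:
            return None
        beta, eaf = outcome.beta, outcome.eaf
        if out_pair == swapped:
            # Express the outcome effect for the same allele letter first.
            beta, eaf = -beta, 1.0 - eaf
        if min(abs(exposure.eaf - 0.5), abs(eaf - 0.5)) < ambiguity_margin:
            return None  # frequency too close to 0.5 to infer the strand
        # Frequencies on opposite sides of 0.5 imply different strands.
        same_side = (exposure.eaf < 0.5) == (eaf < 0.5)
        return beta if same_side else -beta

    if out_pair == exp_pair:
        return outcome.beta
    if out_pair == swapped:
        return -outcome.beta  # wrong effect allele: flip the sign

    # Strand issue: compare against the complement of the outcome alleles.
    flipped = (COMPLEMENT[out_pair[0]], COMPLEMENT[out_pair[1]])
    if flipped == exp_pair:
        return outcome.beta
    if flipped == swapped:
        return -outcome.beta

    return None  # incompatible alleles: exclude the SNP
```

With the palindromic example from the text (allele A at frequency 0.11 in the exposure study and 0.91 in the outcome study, both coded with A as the effect allele), the sketch returns the outcome effect with its sign flipped.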

Performing MR analysis

The generated summary set can now be analysed using a range of methods (summarised in Supplementary file 1B but new methods are added on a regular basis). The most basic way to combine these data is to use a Wald ratio where the estimated causal effect is βMR = βy / βx and the standard error of the estimate is σMR = σy / |βx|.

If there are multiple independent instruments for the exposure (as is typically the case for complex traits with well-powered GWAS), then our analysis can potentially improve in two major ways: first, the variance explained in the exposure, and therefore statistical power, will improve; second, we can evaluate the sensitivity of the estimate to bias arising from violations of the IV2 assumption by assuming different patterns of horizontal pleiotropy. Sensitivity analyses are performed automatically by MR-Base.

Inverse variance weighted MR

The simplest way to obtain an MR estimate using multiple SNPs is to perform an inverse variance weighted (IVW) meta analysis of each Wald ratio (Johnson, 2012), effectively treating each SNP as a valid natural experiment. Fixed effects IVW assumes that each SNP provides the same estimate or, in other words, none of the SNPs exhibit horizontal pleiotropy (or other violations of assumptions). Random effects IVW relaxes this assumption, allowing each SNP to have different mean effects, e.g. due to horizontal pleiotropy (Bowden et al., 2017a). This will return an unbiased estimate if the horizontal pleiotropy is balanced, i.e. the deviation from the mean estimate is independent from all other effects. Another way to conceptualise this result is as a weighted regression of the SNP-outcome effects on the SNP-exposure effects, with the regression constrained to pass through the origin, and with weights derived from the inverse of the variance of the outcome effects.
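To make the Wald ratio and IVW calculations concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not MR-Base code: the function names and the input numbers are hypothetical, the standard errors use the first-order approximation σy / |βx|, and the random effects variant shown is the multiplicative one, which inflates the fixed effects standard error by the square root of Cochran's Q divided by its degrees of freedom.

```python
import math

def wald_ratio(beta_x, beta_y, se_y):
    """Wald ratio for one SNP: causal estimate and first-order standard error."""
    return beta_y / beta_x, se_y / abs(beta_x)

def ivw(beta_x, beta_y, se_y):
    """Inverse variance weighted meta-analysis of per-SNP Wald ratios.

    Returns (estimate, fixed effects SE, multiplicative random effects SE).
    """
    ratios = [wald_ratio(bx, by, sy) for bx, by, sy in zip(beta_x, beta_y, se_y)]
    weights = [1.0 / se ** 2 for _, se in ratios]      # equals beta_x^2 / se_y^2
    total = sum(weights)
    estimate = sum(w * r for w, (r, _) in zip(weights, ratios)) / total
    se_fixed = math.sqrt(1.0 / total)
    # Cochran's Q measures heterogeneity between the per-SNP estimates.
    q = sum(w * (r - estimate) ** 2 for w, (r, _) in zip(weights, ratios))
    dof = len(ratios) - 1
    phi = q / dof if dof > 0 else 1.0
    # Never shrink the SE below the fixed effects value (underdispersion).
    se_random = se_fixed * math.sqrt(max(1.0, phi))
    return estimate, se_fixed, se_random

# Hypothetical summary data for three SNPs (not taken from any real GWAS).
beta_x = [0.10, 0.20, 0.20]
beta_y = [0.05, 0.10, 0.40]
se_y = [0.01, 0.01, 0.01]
print(wald_ratio(beta_x[0], beta_y[0], se_y[0]))
print(ivw(beta_x, beta_y, se_y))
```

The point estimate is identical under both models; only the standard error changes, which is why the sketch returns a single estimate with two standard errors.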

MR-Base implements a random effects IVW model by default, unless there is underdispersion in the causal estimates between SNPs, in which case a fixed effects model is used. The estimates from the random and fixed effects IVW models are the same but the variance for the random effects model is inflated to take into account heterogeneity between SNPs.

Maximum likelihood

An alternative strategy to the IVW approach is to estimate the causal effect by direct maximisation of the likelihood given the SNP-exposure and SNP-outcome effects and assuming a linear relationship between the exposure and outcome (Pierce and Burgess, 2013). Similar to the fixed effects IVW approach, the method assumes that the effect of the exposure on the outcome due to each SNP is the same, i.e. assumes there is no heterogeneity or horizontal pleiotropy. An unbiased estimate will be returned in the absence of horizontal pleiotropy or when horizontal pleiotropy is balanced (but the variance of the effect estimate will be overly precise in the latter case). An advantage of the method is that it may provide more reliable results in the presence of measurement error in the SNP-exposure effects.

MR Egger analysis

Relaxing the IV2 assumption of 'no horizontal pleiotropy', MR-Egger (Bowden et al., 2015; Bowden et al., 2016b) adapts the IVW analysis by allowing a non-zero intercept, allowing the net-horizontal pleiotropic effect across all SNPs to be unbalanced, or directional. The method returns an unbiased causal effect even if the IV2 assumption is violated for all SNPs but assumes that the horizontal pleiotropic effects are not correlated with the SNP-exposure effects (this is known as the InSIDE assumption). Horizontal pleiotropy refers to the effects of the SNPs on the outcome not mediated by the exposure.

Median-based estimator

An alternative approach is to take the median effect of all available SNPs (Bowden et al., 2016a; Kang et al., 2014).
This has the advantage that only half the SNPs need to be valid instruments (i.e. exhibiting no horizontal pleiotropy, no association with confounders, robust association with the exposure) for the causal effect estimate to be unbiased. The weighted median estimate allows stronger SNPs to contribute more towards the estimate, and can be obtained by weighting the contribution of each SNP by the inverse variance of its association with the outcome.

Mode-based methods

The mode-based estimator clusters the SNPs into groups based on similarity of causal effects, and returns the causal effect estimate based on the cluster that has the largest number of SNPs (Hartwig et al., 2017b). The mode-based method returns an unbiased causal effect if the SNPs within the largest cluster are valid instruments. Clustering is performed using a kernel density function that requires selecting a bandwidth parameter. The weighted mode introduces an extra element similar to IVW and the weighted median, weighting each SNP's contribution to the clustering by the inverse variance of its outcome effect.

Diagnostics and sensitivity analyses

It is recommended that the methods described above are applied to all MR analyses and presented in publications to demonstrate sensitivity to different patterns of assumption violations. MR-Base also automatically performs the following further sensitivity analyses and diagnostics.

Heterogeneity tests

Heterogeneity in causal effects amongst instruments is an indicator of potential violations of IV assumptions (Bowden et al., 2017a). Heterogeneity can be calculated for the IVW and Egger estimates, and this can be used to navigate between models of horizontal pleiotropy (Bowden et al., 2017a).

Leave-one-out analysis

To evaluate if the MR estimate is driven or biased by a single SNP that might have a particularly large horizontal pleiotropic effect, we can re-estimate the effect by sequentially dropping one SNP at a time.
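A leave-one-out analysis of this kind can be sketched in a few lines of Python. The sketch assumes an IVW point estimate as the re-estimated quantity; the function names, SNP labels and effect sizes are hypothetical and are not drawn from MR-Base or from any real dataset.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """IVW point estimate: weighted regression through the origin."""
    num = sum(x * y / s ** 2 for x, y, s in zip(beta_x, beta_y, se_y))
    den = sum(x * x / s ** 2 for x, s in zip(beta_x, se_y))
    return num / den

def leave_one_out(snps, beta_x, beta_y, se_y):
    """Re-estimate the IVW effect with each SNP dropped in turn.

    Returns a dict mapping the dropped SNP to the estimate from the rest.
    """
    results = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        results[snp] = ivw_estimate([beta_x[j] for j in keep],
                                    [beta_y[j] for j in keep],
                                    [se_y[j] for j in keep])
    return results

# Hypothetical data: snp4 has a much larger ratio (2.0) than the others (0.5).
snps = ["snp1", "snp2", "snp3", "snp4"]
beta_x = [0.10, 0.20, 0.30, 0.20]
beta_y = [0.05, 0.10, 0.15, 0.40]
se_y = [0.01, 0.01, 0.01, 0.01]

print("all SNPs:", ivw_estimate(beta_x, beta_y, se_y))
for snp, est in leave_one_out(snps, beta_x, beta_y, se_y).items():
    print("without", snp, "->", est)
```

In this toy example the estimate changes most when snp4 is removed, which is the pattern one would look for when screening for an influential outlier.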
Identifying SNPs that, when dropped, lead to a dramatic change in the estimate can be informative about the sensitivity of the estimate to outliers.

Funnel plots

A tool used in meta-analysis is the funnel plot in which the estimate for a particular SNP is plotted against its precision (Sterne et al., 2011). Asymmetry in the funnel plot may be indicative of violations of the IV2 assumption through horizontal pleiotropy.

Other MR analysis methods

In addition to the above, MR-Base also supports access to the following statistical methods
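As a rough illustration of two of the pleiotropy-robust estimators described earlier in this section, the Python sketch below computes MR-Egger point estimates (a weighted regression of SNP-outcome on SNP-exposure effects with a free intercept) and a simple, unweighted median of the Wald ratios. It is a sketch under assumptions rather than MR-Base code: function names and data are hypothetical, standard errors are omitted, and the weighted median and mode-based estimators are not implemented.

```python
import statistics

def orient(beta_x, beta_y):
    """Flip both effects where needed so every SNP-exposure effect is positive."""
    pairs = [(-x, -y) if x < 0 else (x, y) for x, y in zip(beta_x, beta_y)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def mr_egger(beta_x, beta_y, se_y):
    """Weighted least squares of beta_y on beta_x with an intercept.

    Returns (intercept, slope): the intercept reflects average directional
    pleiotropy and the slope is the causal effect estimate.
    """
    bx, by = orient(beta_x, beta_y)
    w = [1.0 / s ** 2 for s in se_y]
    total = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, bx)) / total
    y_bar = sum(wi * y for wi, y in zip(w, by)) / total
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, bx, by))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, bx))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def simple_median(beta_x, beta_y):
    """Unweighted median of the per-SNP Wald ratios."""
    return statistics.median(y / x for x, y in zip(beta_x, beta_y))

# Hypothetical data generated as beta_y = 0.02 + 0.5 * beta_x after orientation.
beta_x = [0.10, 0.20, -0.30, 0.40]
beta_y = [0.07, 0.12, -0.17, 0.22]
se_y = [0.01, 0.01, 0.01, 0.01]
print(mr_egger(beta_x, beta_y, se_y))
print(simple_median(beta_x, beta_y))
```

A non-zero intercept, as in this constructed example, is the same signal that shows up as asymmetry in a funnel plot.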

Phenome

Base (topology)

10.7554/elife.34408.012

Citations (50)

Evaluating the efficacy and mechanism of metformin targets on reducing Alzheimer’s disease risk in the general population: a Mendelian randomization study

medRxiv (Cold Spring Harbor Laboratory) (2022)

Jie Zheng Min Xu Venexia Walker Jinqiu Yuan Roxanna Korologou‐Linden

Abstract

Aims/hypothesis: Metformin use has been associated with reduced incident dementia in diabetic patients in observational studies. However, the causality between the two in the general population is unclear. This study uses Mendelian randomization (MR) to investigate the causal effect of metformin targets on Alzheimer’s disease (AD) and potential causal mechanisms in the brain linking the two.

Methods: Genetic proxies for the effects of metformin drug targets were identified as variants in the gene for the corresponding target that associated with HbA1c level (N=344,182) and expression level of the corresponding gene (N≤31,684). The cognitive outcomes were derived from genome-wide association studies comprising 527,138 middle-aged Europeans, including 71,880 AD or AD-by-proxy patients. MR estimates representing lifelong metformin use on AD and cognitive function in the general population were generated. The effect of the expression level of 22 metformin-related genes in brain cortex (N=6,601 donors) on AD was further estimated.

Results: Genetically proxied metformin use equivalent to a 6.75 mmol/mol (1.09%) reduction of HbA1c was associated with 4% lower odds of AD (odds ratio [OR]=0.964, 95% CI=0.946–0.982, P=1.06×10⁻⁴) in non-diabetic individuals. One metformin target, mitochondrial complex 1 (MCI), showed a robust effect on AD (OR=0.88, P=4.73×10⁻⁴) that was independent of AMPK. MR of expression in brain cortex tissue showed that decreased expression of an MCI-related gene, NDUFA2, was associated with reduced AD risk (OR=0.95, P=4.64×10⁻⁴) and less cognitive decline.

Conclusion/interpretation: Metformin use is likely to cause reduced AD risk in the general population. Mitochondrial function and the NDUFA2 gene are likely mechanisms of action in dementia protection.

Research in context

What is already known about this subject? Metformin is an anti-diabetic drug with repurposing potential for dementia prevention. In a search of PubMed, Embase and clinicaltrials.gov, a few observational studies suggested an association of metformin use with reduced dementia incidence in diabetic patients.

What is the key question? What is the effect of genetically proxied metformin use on Alzheimer’s disease (AD) and cognitive function in the general population, especially for those without diabetes? Is the causal role between the two at least partly influenced by mechanisms in the brain?

What are the new findings? In a Mendelian randomization analysis of 527,138 individuals (71,880 AD or AD-by-proxy cases), genetically proxied metformin use equivalent to a 6.75 mmol/mol (1.09%) reduction of HbA1c was associated with 14% lower odds of AD (odds ratio=0.86), where mitochondrial complex I is a key effect modifier. Expression level in the brain of a mitochondrial complex I related gene, NDUFA2, showed an effect on reducing AD risk and cognitive decline.

How might this impact on clinical practice in the foreseeable future? Our study predicts the efficacy of metformin on reducing AD risk and reducing cognitive decline in the general population, especially for those without diabetes. Mitochondrial function and a mitochondrial related gene, NDUFA2, could be considered as a novel drug target for dementia prevention.

Tweet: Effect of metformin targets reduced 4% of Alzheimer’s disease risk in non-diabetic individuals. @oldz84 @tomgaunt @mendel_random @mrc_ieu

Mendelian Randomization

10.1101/2022.04.09.22273625

Citations (4)

Transcriptome-wide Mendelian randomization study prioritising novel tissue-dependent genes for glioma susceptibility

Scientific Reports (2021)

Jamie Robinson Richard M. Martin Spiridon Tsavachidis Amy Howell Caroline L. Relton

Abstract Genome-wide association studies (GWAS) have discovered 27 loci associated with glioma risk. Whether these loci are causally implicated in glioma risk, and how risk differs across tissues, has yet to be systematically explored. We integrated multi-tissue expression quantitative trait loci (eQTLs) and glioma GWAS data using a combined Mendelian randomisation (MR) and colocalisation approach. We investigated how genetically predicted gene expression affects risk across tissue type (brain, estimated effective n = 1194, and whole blood, n = 31,684) and glioma subtype (all glioma (7400 cases, 8257 controls), glioblastoma (GBM, 3112 cases) and non-GBM gliomas (2411 cases)). We also leveraged tissue-specific eQTLs collected from 13 brain tissues (n = 114 to 209). The MR and colocalisation results suggested that genetically predicted increased gene expression of 12 genes was associated with glioma, GBM and/or non-GBM risk, three of which are novel glioma susceptibility genes (RETREG2/FAM134A, FAM178B and MVB12B/FAM125B). The effect of gene expression appears to be relatively consistent across glioma subtype diagnoses. Examining how risk differed across 13 brain tissues highlighted five candidate tissues (cerebellum, cortex, and the putamen, nucleus accumbens and caudate basal ganglia) and four previously implicated genes (JAK1, STMN3, PICK1 and EGFR). These analyses identified robust causal evidence for 12 genes and glioma risk, three of which are novel. The correlation of MR estimates in brain and blood is consistently low, which suggests that tissue specificity needs to be carefully considered for glioma. Our results have implicated genes yet to be associated with glioma susceptibility and provided insight into putatively causal pathways for glioma risk.

Mendelian Randomization

Genome-wide Association Study

10.1038/s41598-021-82169-5

Citations (12)

Genetic predictors of participation in optional components of UK Biobank

bioRxiv (Cold Spring Harbor Laboratory) (2020)

Jessica Tyrrell Jie Zheng Robin N. Beaumont Kathryn Hinton Tom G. Richardson

Abstract Large studies (e.g. UK Biobank) are increasingly used for GWAS and Mendelian randomization (MR) studies. Selection into and dropout from studies may bias genetic and phenotypic associations. We examine genetic factors affecting participation in four optional components in up to 451,306 UK Biobank participants. We used GWAS to identify genetic variants associated with participation, MR to estimate effects of phenotypes on participation, and genetic correlations to compare participation bias across different studies. 32 variants were associated with participation in one of the optional components (P<6×10⁻⁹), including loci with known links to intelligence and Alzheimer’s disease. Genetic correlations demonstrated that participation bias was common across studies. MR showed that longer educational duration, older menarche and taller stature increased participation, whilst higher levels of adiposity, dyslipidaemia, neuroticism, Alzheimer’s and schizophrenia reduced participation. Our effect estimates can be used for sensitivity analysis to account for selective participation biases in genetic or non-genetic analyses.

Mendelian Randomization

Genome-wide Association Study

10.1101/2020.02.10.941328

Citations (18)