A comprehensively molecular haplotype-resolved genome of a European individual

2011 
A central goal in biology and medicine is to understand individual genomes, their variation, and how it translates to organismal function, phenotype, and disease. Such knowledge will advance our insights into human individuality and prepare the ground for personalized medicine. Toward this goal, firstly, all existing variants in an individual must be catalogued, including particularly the rare and private ones (Durbin et al. 2010). Secondly, the phase of all variants must be known, that is, their organization into haplotypes, defined as the specific combinations of variants on each of the two chromosomes. Human individuals are diploid, and have about four million genetic variants on average (Durbin et al. 2010). Thus, any genes or noncoding functional sequences constituted by two homologous chromosomes can be genetically very different. Whether variant alleles reside on the same chromosome (in cis), or on opposite chromosomes (in trans), is key to understanding their impact on gene function and phenotype. Benzer (1957) demonstrated that different configurations of mutations result in different phenotypes: Two null mutations in cis left the second allele intact, but when in trans, no functional form of the gene was present. Cis versus trans configurations between mutations in cell essential genes and tumor suppressor genes, even megabases (Mb) apart, have been shown to result in profound alterations of cancer phenotype, spectrum, and progression (Biggs et al. 2003; Wang et al. 2010). Thus, identical genotypes may, depending on their phase, involve different clinical interpretations of potentially tremendous personal impact. Moreover, allele-specific expression (ASE) has been found to be common among autosomal genes (Knight 2004; Palacios et al. 2009) and related to a spectrum of diseases (de la Chapelle 2009) including cancer (Chen et al. 2008; Valle et al. 2008) and neurodevelopmental disorders (Chamberlain and Lalande 2010). 
This indicates the global importance of molecular diplotypes for the biology of genes and genomes, phenotype, and health and disease. A functional interpretation of phase information has been proposed earlier (Hoehe et al. 2000; Hoehe 2003) and has recently been increasingly recognized (Levy et al. 2007; Tewhey et al. 2011). Ultimately, genetic variation can only be understood from phase. However, human genome sequencing has for the most part been “phase-insensitive,” that is, generating “haploid composites” (Lander et al. 2001; Venter et al. 2001), partly because molecular genetic techniques to separate haplotypes have remained too cost- and labor-intensive, restricted in resolution, or not easily scalable to whole-genome analysis (Zhang et al. 2006; Ma et al. 2010). Therefore, haplotypes are commonly inferred from population genotypic data by statistical methods (Stephens and Donnelly 2003; Scheet and Stephens 2006). Yet, even the presently most advanced resequencing-based population data source (Durbin et al. 2010) cannot predict the phase of rare or individual-specific variants, which potentially play an important role in complex human disease and individualized medicine (Cirulli and Goldstein 2010; McClellan and King 2010). Recently, whole-genome sequencing-based approaches have been undertaken to haplotype-resolve individual genomes (Levy et al. 2007; Wang et al. 2008; McKernan et al. 2009; Kitzman et al. 2011) but remain restricted in extent and scope. In this work, we aimed to perform a first systematic and comprehensive assessment of individual molecular haplotype architecture as it constitutes the biology of genes and the genome in a diploid human. To this end we haplotype-resolved a European individual, “Max Planck One” (MP1), to an extremely high degree of completeness, exceeding previous efforts in terms of both numbers of variants and length of contigs phased (Levy et al. 2007; Kitzman et al. 2011). 
We applied a fosmid pool-based next-generation sequencing (NGS) approach developed in direct continuation of our previously described fosmid pool-based molecular haplotyping approach (Burgtorf et al. 2003); a similar method was described recently (Kitzman et al. 2011). The completeness of our phasing allowed determination of the molecular haplotype pairs for 81% of all autosomal protein-coding genes including upstream sequences of up to ∼5.7 Mb in length. It also allowed entire genomic regions to be separated into their underlying “haploid landscapes” extending up to ∼6.3 Mb. The diplotypic nature of genes, upstream and coding sequences, and extended genomic regions was seen to be both substantial and global. This highlights the importance of phase for genome biology in defining the functionally active transcriptome and ultimately proteome, and the indispensability of phase information for personal genome analysis. To gain first insights into the importance of phase for gene function, disease predisposition, and clinical applications, we identified and annotated a set of 159 genes with two or more potentially significant protein-altering mutations in either cis or trans. To further advance the field, we provide the annotated molecular haploid genomes of MP1 as an easily browsable haplotype-resolved human genome reference sequence to the scientific community.
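
The cis versus trans distinction described above can be made concrete with a small sketch. It is illustrative only and is not the pipeline used in this study: the encoding of a phased variant as a pair of 0/1 alleles, and the function names `classify_configuration` and `intact_copies`, are assumptions introduced here. It assumes heterozygous sites and treats every damaging allele as a full loss of function, in the spirit of the Benzer (1957) example.

```python
from typing import List, Tuple

# Hypothetical encoding: one phased variant per tuple,
# (allele on haplotype 1, allele on haplotype 2),
# with 1 = damaging allele and 0 = reference allele.
PhasedVariant = Tuple[int, int]


def haplotype_hits(variants: List[PhasedVariant]) -> Tuple[bool, bool]:
    """Report, for each haplotype, whether it carries any damaging allele."""
    hap1 = any(a == 1 for a, _ in variants)
    hap2 = any(b == 1 for _, b in variants)
    return hap1, hap2


def classify_configuration(variants: List[PhasedVariant]) -> str:
    """Label the damaging mutations within one gene as 'cis' or 'trans'."""
    n_damaging = sum(a + b for a, b in variants)
    if n_damaging < 2:
        return "fewer than two mutations"
    hap1, hap2 = haplotype_hits(variants)
    # Both haplotypes affected: the mutations sit on opposite chromosomes.
    return "trans" if hap1 and hap2 else "cis"


def intact_copies(variants: List[PhasedVariant]) -> int:
    """Count gene copies that carry no damaging allele."""
    hap1, hap2 = haplotype_hits(variants)
    return 2 - int(hap1) - int(hap2)


# Two genes with the same unphased genotype (heterozygous at both sites).
cis_gene = [(1, 0), (1, 0)]    # both mutations on haplotype 1
trans_gene = [(1, 0), (0, 1)]  # one mutation on each haplotype

# Unphased view: number of damaging alleles per site, identical for both genes.
unphased_cis = [a + b for a, b in cis_gene]
unphased_trans = [a + b for a, b in trans_gene]

print(classify_configuration(cis_gene), intact_copies(cis_gene))      # cis 1
print(classify_configuration(trans_gene), intact_copies(trans_gene))  # trans 0
```

Both example genes look identical when only genotypes are known, yet the cis case retains one intact copy while the trans case retains none, which is why phase is needed to interpret such variant pairs.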