Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.
ABSTRACT Current clinical guidelines recommend three genetic tests for the assessment of fetal structural anomalies: karyotype to detect microscopically visible balanced and unbalanced chromosomal rearrangements, chromosomal microarray (CMA) to detect sub-microscopic copy number variants (CNVs), and exome sequencing (ES) to identify individual nucleotide changes in coding sequence. Advances in genome sequencing (GS) analysis suggest that it is poised to displace the sequential application of all three conventional tests to become a single diagnostic approach for the assessment of fetal structural anomalies. However, systematic benchmarking is required to ensure that GS can capture the full mutational spectrum associated with fetal structural anomalies and to accurately quantify the added diagnostic yield of GS. We applied a novel GS analytic framework that included the discovery, filtration, and interpretation of nine classes of genomic variation to 7,195 individuals. We assessed the sensitivity of GS to detect diagnostic variants (pathogenic or likely pathogenic) from three standard-of-care tests using 1,612 autism spectrum disorder quartet families (ASD; n=6,448) with matched GS, ES, and CMA data, and validated these findings in 46 fetuses with a clinically reportable variant originally identified by karyotype, CMA, or ES. We then assessed the added diagnostic yield of GS in 249 trios (n=747) comprising a fetus with a structural anomaly detected by ultrasound and two unaffected parents that were pre-screened with a combination of all three standard-of-care tests. Across both cohorts, our GS analytic framework identified 98.2% of all diagnostic variants detected by standard-of-care tests, including 100% of those originally detected by CMA (n=88) and ES (n=61), as well as 78.6% (n=11/14) of the chromosomal rearrangements identified by karyotype.
The diagnostic yield from GS was 7.8% across all 1,612 ASD probands, almost two-fold higher than CMA (4.4%) and three-fold higher than ES (3.0%). We also demonstrated that the yield of ES can approach that of GS when CNVs are captured with high sensitivity from exome data (7.4% vs. 7.8%, respectively). In 249 pre-screened fetuses with structural anomalies, GS provided an additional diagnostic yield of 0.4% beyond the combination of all three tests (karyotype, CMA, and ES). Applying our benchmarking results to existing data indicates that GS can achieve an overall diagnostic yield of 46.1% in unselected fetuses with structural anomalies, providing an estimated 17.2% increase in diagnostic yield over karyotype, 14.1% over CMA, and 36.1% over ES when sequence variants are assessed, and 4.1% when CNVs are also identified from exome data. In this study we demonstrate that GS is sensitive for the detection of almost all pathogenic variation captured by karyotype, CMA, and ES, provides a higher diagnostic yield than any individual test by a wide margin, and contributes a modest increase in diagnostic yield beyond the combination of all three tests. We also outline several strategies to aid the interpretation of GS variants that are cryptic to conventional technologies, which we anticipate will be increasingly encountered as comprehensive variant identification from GS is performed. Taken together, these data suggest GS warrants consideration as a first-tier diagnostic approach for fetal structural anomalies.
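The pooled sensitivity and cohort size quoted in this abstract can be re-derived from the per-test counts it reports. The following is a minimal sketch, not the authors' analysis code. It assumes that the pooled denominator is simply the sum of the CMA, ES, and karyotype counts (the abstract does not state this explicitly) and that "100%" for CMA and ES means all 88 and all 61 variants were recovered. The names `detected_by_gs` and `sensitivity` are illustrative only.

```python
# Counts taken from the abstract: (variants recovered by GS, variants
# originally reported by the standard-of-care test).
# Assumption: 100% for CMA and ES means 88/88 and 61/61.
detected_by_gs = {"CMA": (88, 88), "ES": (61, 61), "karyotype": (11, 14)}


def sensitivity(found, total):
    """Percentage of originally reported variants that GS also detected."""
    return 100.0 * found / total


# Per-test sensitivity, rounded to one decimal place as in the abstract.
per_test = {
    test: round(sensitivity(found, total), 1)
    for test, (found, total) in detected_by_gs.items()
}

# Assumption: the pooled figure sums the three tests' counts.
found_all = sum(found for found, _ in detected_by_gs.values())
total_all = sum(total for _, total in detected_by_gs.values())
pooled = round(sensitivity(found_all, total_all), 1)

# Cohort size: 1,612 quartet families plus 249 trios.
cohort_size = 1612 * 4 + 249 * 3

print(per_test)
print(pooled, cohort_size)
```

Under these assumptions the three quoted figures (78.6% for karyotype, 98.2% pooled, and 7,195 individuals) are mutually consistent.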
Abstract Human iPSC-derived kidney organoids have the potential to revolutionize discovery, but assessing their consistency and reproducibility across iPSC lines and reducing the generation of off-target cells remain open challenges. Here, we profile four human iPSC lines for a total of 450,118 single cells to show how organoid composition and development are comparable to human fetal and adult kidneys. Although cell classes are largely reproducible across time points, protocols, and replicates, we detect variability in cell proportions between different iPSC lines, largely due to off-target cells. To address this, we analyze organoids transplanted under the mouse kidney capsule and find diminished off-target cells. Our work shows how single-cell RNA-seq (scRNA-seq) can score organoids for reproducibility, faithfulness, and quality, that kidney organoids derived from different iPSC lines are comparable surrogates for human kidney, and that transplantation enhances their formation by diminishing off-target cells.
Abstract Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4,096 whole genomes from HGDP and 1kGP with data from gnomAD and identified over 159 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research on diverse-ancestry populations.