Abstract As CRISPR-based therapies enter the clinic, evaluation of safety remains a critical and active area of study. Here, we employ a clinical next-generation sequencing (NGS) workflow to achieve high sequencing depth and detect ultra-low-frequency variants across exons of genes associated with cancer, all exons, and genome-wide. In three separate primary human hematopoietic stem and progenitor cell (HSPC) donors assessed in technical triplicates, we electroporated high-fidelity Cas9 protein targeted to three loci (AAVS1, HBB, and ZFPM2) and harvested genomic DNA at days 4 and 10. Our results demonstrate that clinically relevant delivery of high-fidelity Cas9 to primary HSPCs and ex vivo culture up to 10 days does not introduce or enrich for tumorigenic variants and that even a single SNP in a gRNA spacer sequence is sufficient to eliminate Cas9 off-target activity in primary, repair-competent human HSPCs.
Abstract As CRISPR-based therapies enter the clinic, evaluation of safety remains a critical and active area of study. While whole genome sequencing is an unbiased method for identifying somatic mutations introduced by ex vivo culture and genome editing, it cannot attain sufficient read depth to detect extremely low-frequency events that could result in clonal expansion. As a solution, we used an exon capture panel to enable ultra-deep sequencing of >500 tumor suppressors and oncogenes most frequently altered in human cancer. We used this panel to investigate whether transient delivery of high-fidelity Cas9 protein targeted to three different loci (using guide RNAs (gRNAs) corresponding to sites at AAVS1, HBB, and ZFPM2) resulted in the introduction or enrichment of oncogenic mutations at day 4 and day 10 post-editing. In three separate primary human hematopoietic stem and progenitor cell (HSPC) donors, we identified a mean of 1,488 variants per Cas9 treatment (at <0.07% limit of detection). After filtering to remove germline and/or synonymous changes, a mean of 3.3 variants remained per condition, which were further reduced to six total mutations after removing variants also present in unedited treatments. Of these, four variants resided at the predicted off-target site in the myelodysplasia-associated EZH2 gene in cells treated with the ZFPM2 gRNA, in Donors 2 and 3 at both the day 4 and day 10 timepoints. While Donor 1 displayed on-target cleavage at ZFPM2, we found no off-target activity at EZH2 in this donor. Sanger sequencing revealed a homozygous single nucleotide polymorphism (SNP) 14 bp distal from the Cas9 protospacer adjacent motif in EZH2 that eliminated any detectable off-target activity. We found no evidence of exonic off-target INDELs with either the AAVS1 or HBB gRNAs.
These findings indicate that clinically relevant delivery of high-fidelity Cas9 to primary HSPCs and ex vivo culture up to 10 days does not introduce or enrich for tumorigenic variants, and that even a single SNP outside the seed region of the gRNA protospacer is sufficient to eliminate Cas9 off-target activity with this method of delivery into primary, repair-competent human HSPCs.
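The filtering steps described above (drop germline and synonymous calls, then drop anything also seen in unedited controls) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Variant` record, the `filter_variants` function, the `min_vaf` threshold of 0.0007 (taken loosely from the stated <0.07% limit of detection), and all gene names and values in the toy data are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    gene: str
    change: str       # free-text label for the change
    vaf: float        # variant allele frequency as a fraction (0.01 == 1%)
    germline: bool
    synonymous: bool


def filter_variants(treated, unedited, min_vaf=0.0007):
    """Keep calls that are above min_vaf, somatic, non-synonymous,
    and absent from the matched unedited control.

    min_vaf is an assumed stand-in for a limit of detection.
    """
    control_keys = {(v.gene, v.change) for v in unedited}
    kept = []
    for v in treated:
        if v.vaf < min_vaf:
            continue  # below the assumed detection limit
        if v.germline or v.synonymous:
            continue  # germline and/or synonymous changes are removed
        if (v.gene, v.change) in control_keys:
            continue  # also present without editing, so not attributable to Cas9
        kept.append(v)
    return kept


# Toy data, entirely made up for illustration.
treated = [
    Variant("GENE_A", "indel_1", 0.01, False, False),
    Variant("GENE_B", "snv_1", 0.5, True, False),      # germline
    Variant("GENE_C", "snv_2", 0.02, False, True),     # synonymous
    Variant("GENE_D", "snv_3", 0.003, False, False),   # also in control
    Variant("GENE_E", "snv_4", 0.0001, False, False),  # below threshold
]
unedited = [Variant("GENE_D", "snv_3", 0.004, False, False)]

remaining = filter_variants(treated, unedited)
print([v.gene for v in remaining])
```

Matching treated and control calls on a `(gene, change)` key is a simplification; a real workflow would more likely compare genomic coordinates and alleles.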
Abstract The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has greatly benefited society 1, 2. However, it still has many gaps and errors, and does not represent a biological human genome since it is a blend of multiple individuals 3, 4. Recently, a high-quality telomere-to-telomere reference genome, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a duplicate genome, and is thus nearly homozygous 5. To address these limitations, the Human Pangenome Reference Consortium (HPRC) recently formed with the goal of creating a collection of high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity 6. Here, in our first scientific report, we determined which combination of current genome sequencing and automated assembly approaches yields the most complete, accurate, and cost-effective diploid genome assemblies with minimal manual curation. Approaches that used highly accurate long reads and parent-child data to sort haplotypes during assembly outperformed those that did not. Developing a combination of all the top-performing methods, we generated our first high-quality diploid reference assembly, containing only ∼4 gaps (range 0-12) per chromosome, with most chromosomes within ±1% of CHM13's length. Nearly one quarter of protein-coding genes have nonsynonymous amino acid changes between haplotypes, and centromeric regions showed the highest density of variation. Our findings serve as a foundation for assembling near-complete diploid human genomes at the scale required for constructing a human pangenome reference that captures all genetic variation from single nucleotides to large structural rearrangements.
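The two per-chromosome summary metrics mentioned above, gap count and length relative to a reference, can be illustrated with a short sketch. It assumes gaps are encoded as runs of `N` in the assembled sequence, which is a common FASTA convention but not necessarily how the authors counted them. The function names `count_gaps` and `within_tolerance` are hypothetical.

```python
import re


def count_gaps(sequence: str) -> int:
    """Count gaps, assuming each contiguous run of N (or n) is one gap."""
    return len(re.findall(r"[Nn]+", sequence))


def within_tolerance(length: int, ref_length: int, tol: float = 0.01) -> bool:
    """Return True if length is within +/- tol (default 1%) of ref_length."""
    return abs(length - ref_length) <= tol * ref_length


# Toy sequence, made up for illustration: two runs of N, so two gaps.
toy_chromosome = "ACGTNNNNACGTnnACGT"
print(count_gaps(toy_chromosome))
print(within_tolerance(len(toy_chromosome), 18))
```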