Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

2019 
The sequencing and assembly of human genomes using long-read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome, CHM13. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of large tandem repeats, as validated with orthogonal analyses. Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of assembling centromeric DNA and the largest regions of segmental duplication using existing assemblers. Our analysis also shows a slight excess of disrupted gene annotations, indicating that further developments are needed to reduce residual single-base-pair indel errors. Despite these shortcomings, our results suggest that HiFi may currently be the most effective stand-alone technology for de novo assembly of human genomes.