Pitfalls of haplotype phasing from amplicon-based long-read sequencing

2016 
Accurately determining genotype phase is important in many areas of genetics, for example in pharmacogenetics1, transplant HLA typing2 and disease association mapping3. Until recently, haplotype phasing has generally relied on parental genotypes or on statistical phasing based on allele frequency patterns within the population. Next-generation sequencing technologies often cannot phase variants that are more than a few hundred base pairs apart because of their short read lengths. Recent developments in long-read single-molecule sequencing technologies, such as the Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) sequencing systems, now promise efficient and accurate haplotype phasing over multi-kilobase distances. For example, Ammar et al.4 recently used the MinION sequencer from ONT to apparently phase variants at the CYP2D6 and HLA loci into clinically important haplotypes.
Hirschsprung disease is a congenital abnormality characterised by complete or partial intestinal obstruction resulting from an absence of neuronal ganglion cells in the intestinal tract. The extent of the aganglionosis is classified as short-segment disease, long-segment disease or total colonic aganglionosis. Mutations in the RET gene account for a high proportion of Hirschsprung cases and are more frequent in patients with long-segment disease or total colonic aganglionosis5. We identified two heterozygous coding variants in the RET gene in a 5-month-old female with total colonic aganglionosis: a de novo mutation, p.Arg418Ter (Chr10(GRCh38):43109219C>T), and a previously reported variant, p.Leu56Met (Chr10(GRCh38):43100551)6. The clinical significance of the p.Leu56Met variant is uncertain: it may be a modifier of the disease phenotype observed in our patient, or it may be a benign polymorphism.
If both variants occur on the same chromosome (in cis), this would indicate that the p.Leu56Met variant is not contributing to the phenotype, as the truncated transcript would be subject to nonsense-mediated decay. The two variants, p.Arg418Ter (p.R418X) and p.Leu56Met (p.L56M), lie approximately 9 kb apart, and in this study we used long-range PCR amplification followed by ONT and PacBio sequencing to phase them. We also re-analysed the sequencing data from Ammar et al.4. From these analyses, we demonstrate that chimera formation during PCR amplification and reference alignment bias are major pitfalls that need to be considered when attempting to phase variants using amplicon-based long-read sequencing technologies.
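The read-level logic behind phasing two heterozygous sites can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the "ref"/"alt" coding and the toy read counts are all assumptions made for illustration, and only the two GRCh38 coordinates come from the text. Each read spanning both sites contributes one allele pair. With a single diploid template and no artefacts, essentially only one of the two configurations (cis or trans) should be observed, so a sizeable minority of reads supporting the other configuration is the kind of signal that PCR chimeras would be expected to produce.

```python
from collections import Counter

# GRCh38 coordinates quoted in the text for the two RET variants.
POS_L56M = 43100551
POS_R418X = 43109219
DISTANCE_BP = POS_R418X - POS_L56M  # 8668 bp, i.e. roughly 9 kb


def tally_haplotypes(read_alleles):
    """Count allele pairs observed on reads that span both sites.

    read_alleles: iterable of (allele_at_L56M, allele_at_R418X) tuples,
    each allele coded as "ref" or "alt" (hypothetical coding; reads
    without a confident call at both sites are assumed to be removed).
    """
    return Counter(read_alleles)


def summarise_phase(counts):
    """Return (phase_call, minority_fraction) from the allele-pair counts.

    cis   = both alternate alleles on the same molecule
    trans = alternate alleles on different molecules
    minority_fraction is the share of reads supporting the losing
    configuration; a high value hints at chimeric amplicons.
    """
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    total = cis + trans
    if total == 0:
        raise ValueError("no reads span both variant sites")
    if cis == trans:
        call = "ambiguous"
    else:
        call = "cis" if cis > trans else "trans"
    return call, min(cis, trans) / total


# Toy data (invented for illustration, not taken from the study).
toy_reads = (
    [("alt", "alt")] * 40
    + [("ref", "ref")] * 40
    + [("alt", "ref")] * 10
    + [("ref", "alt")] * 10
)
toy_call, toy_minority = summarise_phase(tally_haplotypes(toy_reads))
```

In this toy data set the majority call is cis, but a fifth of the reads support the opposite configuration. A simple majority vote would hide that discordance, which is why the minority fraction is reported alongside the call.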