SNP Discovery in Pooled Samples With Mismatch Repair Detection

2004 
Random sequencing approaches have led to the identification of a tremendous number of single nucleotide polymorphisms (SNPs) in the human genome. Through the work of The SNP Consortium, 1.4 million SNPs have been identified (Sachidanandam et al. 2001). Other public projects have followed, bringing the total to ∼3 million SNPs with some level of validation. These provide researchers with a wealth of candidate SNPs in their regions of interest. Unfortunately, only a fraction of the disease-causing variations in regulatory and coding regions (cSNP) are identified through this approach (Kruglyak and Nickerson 2001; Carlson et al. 2003).
The identification of low frequency cSNPs requires a targeted discovery effort. An extensive targeted effort to identify common cSNPs has been advocated before (Johnson et al. 2001) and partially implemented (Haga et al. 2002). Cost has been a major drawback of the targeted approach. To have 95% confidence of identifying an allele with a 2% frequency, 75 individuals (150 chromosomes) need to be sequenced, owing to the Poisson statistics of chromosome sampling. In addition, detection of these alleles assumes the ability to detect heterozygote peaks in a sequencing trace with good accuracy.
We present the use of mismatch repair detection (MRD; Faham et al. 2001) to enrich fragments amplified from pooled genomic DNA samples for variant alleles, which are then subjected to standard dideoxy sequencing. MRD has been described previously as a method for multiplex variation scanning (Faham et al. 2001). Here we describe its use in combination with standard dideoxy terminator sequencing to discover variant alleles in pooled genomic DNA. MRD detects variants using the mismatch repair system of Escherichia coli (Modrich 1991). A specific strain (mutation sorter) is engineered to sort a mixture of transformed fragments into two pools: those that carry a variation and those that do not. The basic approach is shown schematically in Figure 1.
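The sample-size figure quoted above (75 individuals for 95% confidence at a 2% allele frequency) can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes diploid individuals (two chromosomes each), independent sampling of chromosomes, and perfect detection of any allele copy that is present. The function names are hypothetical.

```python
import math


def detection_probability_binomial(n_individuals, allele_freq):
    """P(at least one copy of the allele among 2n sampled chromosomes),
    using the exact binomial form 1 - (1 - f)^(2n)."""
    n_chromosomes = 2 * n_individuals  # assumption: diploid individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


def detection_probability_poisson(n_individuals, allele_freq):
    """Same quantity under the Poisson approximation: the expected number
    of allele copies is lambda = 2n * f, and P(zero copies) = exp(-lambda)."""
    expected_copies = 2 * n_individuals * allele_freq
    return 1.0 - math.exp(-expected_copies)


def min_individuals(allele_freq, confidence, prob=detection_probability_poisson):
    """Smallest number of individuals whose detection probability
    reaches the requested confidence."""
    n = 1
    while prob(n, allele_freq) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    for name, fn in [("Poisson", detection_probability_poisson),
                     ("binomial", detection_probability_binomial)]:
        n = min_individuals(0.02, 0.95, prob=fn)
        print(f"{name}: {n} individuals, P(detect) = {fn(n, 0.02):.4f}")
```

Under these assumptions both forms give 75 individuals: the expected number of allele copies is 150 × 0.02 = 3, and the chance of sampling none is about exp(−3) ≈ 0.05.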
Sanger sequencing does not have sufficient sensitivity to detect rare alleles from genomic pools, as demonstrated in Figure 1, top trace, in which the PCR product from the pooled sample is sequenced directly. Instead, individual PCR reactions using pooled genomic DNA as a template are, in turn, pooled together and hybridized to PCR fragments from a single homozygous source (standard). These heteroduplexes are transformed into the mutation sorter strain, generating a pool of colonies enriched for variant alleles (compared with the standard). One amplification reaction from the variant-enriched pool is done for each amplicon, followed by a sequencing reaction to identify variant alleles in the population examined. The end result of this process is that the necessity of amplifying and sequencing many individuals is replaced with a pooled enrichment process that is carried out for hundreds or thousands of amplicons in a multiplexed fashion. The sequencing effort is thus reduced to the task of sequencing a standard and the variant-enriched pool.
Figure 1. A schematic of the MRD SNP discovery process. Genomic DNA samples are pooled together, and PCR amplicons are generated by using the pooled genomic DNA as a template. If this PCR product was simply sequenced, the SNP shown would be lost in the noise of ...
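The trade-off described in this section can be made concrete with simple arithmetic. The sketch below is an illustration under stated assumptions, not the authors' accounting: it assumes one sequencing read per amplicon per template, equal DNA contribution from every individual in the pool, and unbiased amplification. The function names and the example numbers (75 individuals, 1000 amplicons) are hypothetical.

```python
def direct_sequencing_reactions(n_individuals, n_amplicons):
    """Reads needed when every individual is sequenced at every amplicon
    (assumption: one read per amplicon per individual)."""
    return n_individuals * n_amplicons


def mrd_sequencing_reactions(n_amplicons):
    """Reads needed when only the standard and the variant-enriched pool
    are sequenced for each amplicon."""
    return 2 * n_amplicons


def pooled_allele_fraction(n_individuals, n_heterozygous_carriers):
    """Fraction of chromosomes in an unenriched pool that carry the variant,
    assuming diploid individuals and equal representation in the pool."""
    return n_heterozygous_carriers / (2 * n_individuals)


if __name__ == "__main__":
    individuals, amplicons = 75, 1000
    print("direct sequencing reads:", direct_sequencing_reactions(individuals, amplicons))
    print("standard + enriched pool reads:", mrd_sequencing_reactions(amplicons))
    # A single heterozygous carrier among 75 individuals contributes 1 of 150 chromosomes.
    print("variant fraction in the pool:", pooled_allele_fraction(individuals, 1))
```

With these example numbers, a single heterozygous carrier accounts for 1 of 150 chromosomes (under 1% of the template), which illustrates why the variant peak is hard to see when the pooled PCR product is sequenced directly.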