Genome-wide analysis of microsatellite polymorphism in chicken circumventing the ascertainment bias

2008 
Empirical data on microsatellite mutability and polymorphism almost always suffer from an ascertainment bias. For instance, direct observations of de novo mutation events in pedigrees are essentially confined to loci with very high mutation rates, which are not necessarily representative of the majority of microsatellite loci in the genome in terms of rate and pattern of evolution (Weber and Wong 1993; Ellegren 2000; Huang et al. 2002). The same applies to observations of microsatellite allele frequency distributions at loci genotyped in population samples (Estoup et al. 1995). Such data tend to be biased toward highly polymorphic loci because polymorphism is selected for at several stages of marker development: short repeat tracts are avoided in marker design, monomorphic markers or markers with limited polymorphism are typically discarded at an early screening stage, and the most polymorphic loci find the most widespread use in subsequent studies. Using unusually mutable loci will lead to overestimates of genetic diversity and will give a biased picture of the microsatellite mutation process. Another example, perhaps the best-known aspect of microsatellite ascertainment bias, is the comparison of repeat lengths at orthologous loci in two related species. Everything else being equal, this will tend to give a pattern of longer repeats in the species from which the markers were developed (the focal species), an inevitable consequence of the selection for long and polymorphic loci described above (Ellegren et al. 1995, 1997; Webster et al. 2002; Vowles and Amos 2006). Again, this will lead to incorrect interpretations of microsatellite mutation and evolution. Whole-genome sequence surveys for microsatellite occurrence avoid this ascertainment bias. 
Such analyses give a snapshot of the distribution of repeat lengths across the genome, which can be compared to the expectations of theoretical models (Dieringer and Schlotterer 2003). However, in the absence of polymorphism data, they do not capture ongoing evolutionary processes. For a few species, genome sequencing has been augmented with large-scale initiatives to obtain sequence information from multiple individuals, such as the re-sequencing of targeted regions in the human HapMap (International HapMap Consortium 2005) or the sparse shotgun sequencing carried out in different dog (Canis familiaris) breeds (Lindblad-Toh et al. 2005). One of the most extensive efforts of this kind is the light shotgun sequencing of three different domestic chicken (Gallus gallus domesticus) breeds (International Chicken Polymorphism Map Consortium 2004), made in addition to the assembly of the chicken genome sequence, which was based on sequencing of a red jungle fowl (G. g. gallus, the wild ancestor of domestic chicken) (International Chicken Genome Sequencing Consortium 2004). This generated sequence data from a second chromosome (in addition to the reference sequence) sampled from the chicken population for about half the genome, uncovering a total of 2.8 million single nucleotide polymorphisms (SNPs) (International Chicken Genome Sequencing Consortium 2004) and more than 270,000 length polymorphisms (Brandstrom and Ellegren 2007). Here, we use these data to obtain an unbiased picture of microsatellite variability in a vertebrate genome and to address several general questions pertinent to microsatellite evolution. Importantly, because shotgun sequencing is more or less random, this approach gives diversity data for one of the most polymorphic sequence categories in eukaryotic genomes without being constrained by an ascertainment bias.
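The focal-species length bias described above can be illustrated with a small simulation. The sketch below is not the method used in this study; it is a toy model under stated assumptions: ancestral repeat lengths are drawn uniformly, each species changes its length independently and symmetrically, and markers are "developed" only at loci whose repeat tract in the focal species reaches a hypothetical minimum length. The function name, parameter values, and length distributions are all illustrative assumptions.

```python
import random
import statistics


def simulate_ascertainment(n_loci=20000, min_focal_length=15, seed=1):
    """Toy model of ascertainment bias at orthologous microsatellite loci.

    All numerical choices here are illustrative assumptions, not estimates
    from real data.
    """
    rng = random.Random(seed)
    focal_all, other_all = [], []
    focal_selected, other_selected = [], []

    for _ in range(n_loci):
        # Shared ancestral repeat number (assumed uniform between 5 and 20).
        ancestral = rng.randint(5, 20)
        # Independent, symmetric length change in each lineage, so neither
        # species is truly expected to carry longer repeats than the other.
        focal = ancestral + rng.randint(-4, 4)
        other = ancestral + rng.randint(-4, 4)
        focal_all.append(focal)
        other_all.append(other)

        # Marker development: keep only loci that are long in the focal species.
        if focal >= min_focal_length:
            focal_selected.append(focal)
            other_selected.append(other)

    return {
        "all_focal_mean": statistics.mean(focal_all),
        "all_other_mean": statistics.mean(other_all),
        "selected_focal_mean": statistics.mean(focal_selected),
        "selected_other_mean": statistics.mean(other_selected),
    }


if __name__ == "__main__":
    for name, value in simulate_ascertainment().items():
        print(f"{name}: {value:.2f}")
```

Across all simulated loci the two species have nearly identical mean repeat lengths, whereas among the ascertained loci the focal species appears to carry longer repeats, even though the model contains no true difference between the species. A random sample of loci, as obtained from shotgun sequencing, corresponds to the unselected comparison.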