Copy number variation of individual cattle genomes using next-generation sequencing

2012 
Copy number variations (CNVs) are gains and losses of genomic sequence >50 bp between two individuals of a species (Mills et al. 2011). Substantial progress has been made in understanding CNVs in mammals, especially in humans (Redon et al. 2006; Conrad et al. 2009; Altshuler et al. 2010; Mills et al. 2011) and rodents (Graubert et al. 2007; Guryev et al. 2008; She et al. 2008; Yalcin et al. 2011). While single nucleotide polymorphisms (SNPs) are more frequent, CNVs impact a higher percentage of genomic sequence and have potentially greater effects, including changing gene structure and dosage, altering gene regulation, and exposing recessive alleles (Zhang et al. 2009). In particular, segmental duplications (SDs) were shown to be one of the catalysts and hotspots for CNV formation (Sharp et al. 2005; Alkan et al. 2009; Marques-Bonet et al. 2009). Several common CNVs have been shown to be important in both normal phenotypic variability and disease susceptibility in humans (Aitman et al. 2006; Fellermann et al. 2006; Le Marechal et al. 2006; Fanciulli et al. 2007; Yang et al. 2007; Stankiewicz and Lupski 2010). Although analyses of a subset of CNVs provided evidence of linkage disequilibrium with flanking SNPs (McCarroll et al. 2008), a significant portion of CNVs fell in genomic regions not well covered by SNP arrays, such as SDs, and thus were not genotyped (Locke et al. 2006; Estivill and Armengol 2007; Campbell et al. 2011). Combining CNV and SNP data in human genome-wide association studies has associated CNVs with diseases such as intellectual disability, autism, schizophrenia, neuroblastoma, Crohn's disease, and severe early-onset obesity (de Vries et al. 2005; Sharp et al. 2006; Sebat et al. 2007; Cook and Scherer 2008; Bochukova et al. 2009; Diskin et al. 2009; Glessner et al. 2009; Shi et al. 2009; Stefansson et al. 2009).
Comparative genomic hybridization (CGH) and SNP arrays are routinely used for CNV screens, and their performances have been extensively reviewed (Lai et al. 2005; LaFramboise 2009; Winchester et al. 2009; Pinto et al. 2011). Although these platforms offer some detection power in SD regions, they are often affected by low probe density and cross-hybridization of repetitive sequences. In addition, array comparative genomic hybridization (aCGH) reports only a relative copy number (CN) increase or decrease with respect to the reference individual. This poses a particular problem in the detection of CNVs in SD regions, as the test individual's CN may differ from that of the reference by a smaller proportion than is detectable using array-based calling criteria. The advent of next-generation sequencing (NGS) and complementary analysis programs has provided better approaches to systematically identify CNVs at a genome-wide level. These sequence-based approaches, which are becoming more popular due to the ongoing developments and cost decreases in NGS, allow CNV reconstruction at a higher effective resolution and sensitivity. Different methods to detect CNVs using sequence data were presented in the 1000 Genomes Project pilot studies (Sudmant et al. 2010; Mills et al. 2011) and have been previously reviewed (Snyder et al. 2010). Read depth (RD) methods used to analyze the 1000 Genomes Project data contributed high-resolution CNV calls with the capability of determining exact CN values for each genetic locus in an individual (Sudmant et al. 2010). Specifically, mrFAST/mrsFAST and whole-genome shotgun sequence detection (WSSD) (Alkan et al. 2009; Hach et al. 2010; Sudmant et al. 2010) are able to construct personalized CNV maps in or near SD regions by reporting all mapping locations for sequence reads, whereas other RD methods consider only one mapping location per read. Since CNVs are often found in or near duplicated regions in the genome (Cheng et al. 2005; Marques-Bonet et al. 2009), mrFAST and mrsFAST are more appropriate for detecting CNVs in duplication- and repeat-rich regions.
Recently, interest in CNV detection has extended into domesticated animals (Chen et al. 2009b; Fontanesi et al. 2009; Nicholas et al. 2009; Bae et al. 2010; Fadista et al. 2010; Liu et al. 2010; Ramayo-Caldas et al. 2010; Fontanesi et al. 2011; Kijas et al. 2011). For example, in ridgeback dogs, duplication of FGF3, FGF4, FGF19, and ORAOV1 causes the hair ridge and a predisposition to dermoid sinus (Hillbertz et al. 2007). The “wrinkled” skin phenotype and a periodic fever syndrome in Chinese Shar-Pei dogs are caused by a duplication upstream of HAS2 (Olsson et al. 2011). White coat color in pigs and sheep is caused by duplications involving KIT and ASIP, respectively (Moller et al. 1996; Norris and Whan 2008). The chicken peacomb phenotype was linked to a duplication near the first intron of SOX5 (Wright et al. 2009). Similarly, partial deletion of the bovine gene ED1 causes anhidrotic ectodermal dysplasia in cattle (Drogemuller et al. 2001). Given the heritability of CNVs and their higher rates of mutation, CNVs may be associated with, or affect, animal health and production traits under recent selection. Bos taurus indicus breeds are better adapted to warm climates and are more resistant to tick infestation than Bos taurus taurus breeds (Porto Neto et al. 2011). Likewise, beef and dairy cattle breeds display distinct patterns in selected metabolic pathways related to muscling, marbling, and milk composition traits. CNVs may be associated with these agriculturally important traits as well. The availability of two alternative cattle reference genomes (Btau_4.0 and UMD3.0) (The Bovine Genome Sequencing and Analysis Consortium 2009; Zimin et al. 2009) has opened new avenues of cattle genome research.
Using the Btau_4.0 assembly, we previously applied an approach combining MegaBlast and WSSD to detect cattle SDs and discovered 94.4 Mbp of duplicated sequence in the reference genome (Liu et al. 2009). Our earlier array-based studies in cattle have also uncovered significant differences in CNV frequency among breeds, as well as several CNV-associated genes, such as ULBP and PGR (Liu et al. 2010; Hou et al. 2011). These studies confirm that CNVs are common, associated with SDs, and often occur in gene-rich regions in cattle. Here, we describe the first use of NGS data to detect CNVs in the cattle genome. Using mrsFAST and WSSD, we also analyzed genome-wide gene copy number estimates in order to explore their potential functional and evolutionary contributions to breed-specific traits. By providing the first individualized bovine CNV and SD maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome.
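The contrast drawn above between relative array signals and absolute read-depth copy number can be illustrated with a small worked example. This is a minimal sketch, not the mrsFAST/WSSD implementation: the function names, the 30x control depth, and the example depths are hypothetical, and it assumes read depth scales linearly with copy number and that control regions are diploid.

```python
import math


def estimate_copy_number(window_depth, control_depth, control_cn=2):
    """Estimate absolute copy number for one genomic window.

    Assumption: read depth is proportional to copy number, and
    control_depth is the mean depth over regions of known copy
    number control_cn (diploid by default).
    """
    if control_depth <= 0:
        raise ValueError("control_depth must be positive")
    return control_cn * window_depth / control_depth


def log2_ratio(test_cn, reference_cn):
    """Relative signal of the kind an array reports: test versus reference."""
    return math.log2(test_cn / reference_cn)


# Hypothetical numbers: control regions average 30x coverage.
control_depth = 30.0

# A window inside a duplicated region with 75x coverage.
print(estimate_copy_number(75.0, control_depth))  # 5.0

# The same single-copy gain yields a smaller relative signal
# when the reference already carries extra copies.
print(round(log2_ratio(3, 2), 3))  # 0.585 (gain on a diploid background)
print(round(log2_ratio(5, 4), 3))  # 0.322 (gain inside a duplicated region)
```

Under these assumptions, a one-copy gain in a region where the reference already has four copies produces a log2 ratio of about 0.32, compared with about 0.58 on a diploid background, which is why a fixed array calling threshold can miss changes in SD regions while a depth-based estimate still reports the absolute value.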