Discovering a genome-wide set of avocado (Persea americana Mill.) single nucleotide polymorphisms (SNPs) and characterizing the diversity of a germplasm collection provide a powerful tool for breeding. However, discovery is a costly process because many loci prove to be non-informative when the germplasm is genotyped. Our study used a collection of 100 accessions comprising the three race types: Guatemalan, Mexican, and West Indian. To increase the chances of discovering polymorphic loci, three pools of genomic DNA, one from each race, were sequenced and the reads were aligned to a reference transcriptome. In total, 507,917 polymorphic loci were identified in the entire collection. Of these, 345,617 were observed in all three pools, 117,692 in two pools, 44,552 in one of the pools, and only 56 (0.01%) were homozygous in the three pools but for different alleles. The predicted polymorphic loci were validated by genotyping the accessions within each pool at 192 randomly selected SNPs. The sensitivity of polymorphic locus prediction ranged from 0.77 to 0.94. The correlation between the allele frequency estimated from the pooled sequences and the actual allele frequency from genotype calling of individual accessions was r = 0.8. A subset of 109 SNPs was then used to evaluate the genetic relationships among avocado accessions and the genetic diversity of the collection. Projecting the genetic variation onto a PCA plot separated the three races into distinct clusters. As expected, kinship coefficients estimated for all the accessions showed that many of the cultivars from the California breeding program were closely related to each other, especially the Hass-like ones. The green-skin avocados, e.g., 'Bacon', 'Zutano', 'Ettinger' and 'Fuerte', were also closely related to each other. A framework for SNP discovery and genetic characterization of a breeder's accessions is described.
Sequencing pools of gDNA is a cost-effective approach to create a genome-wide stock of polymorphic loci for a breeding program. Reassessing the botanical and genetic knowledge about the germplasm accessions is valuable for future breeding. Kinship analysis may be used as a first step in finding parental candidates in a parentage analysis.
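The validation step described above compares pool-based predictions with per-accession genotypes. The following is a minimal Python sketch of that comparison, not the authors' pipeline. It assumes that pooled allele frequency is estimated as the fraction of reads carrying the alternate allele, that genotypes are coded as alternate-allele counts (0, 1 or 2), and that sensitivity means TP / (TP + FN). All function names and the toy data are hypothetical.

```python
from math import sqrt


def pooled_allele_frequency(alt_reads, total_reads):
    """Estimate the alternate-allele frequency at one locus from pooled read counts."""
    if total_reads == 0:
        raise ValueError("locus has no read coverage")
    return alt_reads / total_reads


def genotype_allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def sensitivity(predicted_polymorphic, observed_polymorphic):
    """TP / (TP + FN): share of loci polymorphic by genotyping that the pool also flagged."""
    pairs = list(zip(predicted_polymorphic, observed_polymorphic))
    tp = sum(1 for pred, obs in pairs if pred and obs)
    fn = sum(1 for pred, obs in pairs if not pred and obs)
    return tp / (tp + fn)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: (alt reads, total reads) per locus in one pool,
# and the genotypes of four accessions at the same loci.
pool_counts = [(12, 40), (30, 60), (5, 50), (45, 50)]
genotypes_per_locus = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 1, 0], [2, 2, 1, 2]]

pool_af = [pooled_allele_frequency(alt, total) for alt, total in pool_counts]
geno_af = [genotype_allele_frequency(g) for g in genotypes_per_locus]
r = pearson_r(pool_af, geno_af)
```

In practice, read depth and unequal DNA contribution of accessions to a pool would add noise to the pooled estimate, which is one reason a correlation below 1 is to be expected.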
Abstract Motivation: High density oligonucleotide arrays are usually annotated in a one-to-one fashion, with each probeset assigned to one gene. However, in reality, subsets of oligonucleotides in a probeset may match sequences within more than one gene, potentially leading to misinterpretations. Moreover, a gene is often represented by more than one probeset, and analyzing probe matches at the mRNA level can help one deduce whether these probesets are derived from the same or different splice variants. Results: The GeneAnnot system comprehensively documents the many-to-many relationship between oligonucleotide array probesets and annotated genes in GeneCards™. It performs pairwise alignments between the probe sequences and gene transcripts, and assigns sensitivity and specificity scores to each probeset/gene pair. Availability: http://genecards.weizmann.ac.il/geneannot/ Supplementary information: Program description and statistics http://genecards.weizmann.ac.il/geneannot/DOC/index.html
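The many-to-many relationship between probesets and genes can be illustrated with a short Python sketch. The scoring below is an assumption made for illustration and is not necessarily the formula used by GeneAnnot: sensitivity is taken as the fraction of a probeset's probes that match a gene, and specificity as the average, over those matching probes, of one divided by the number of genes each probe matches. The function name, probe identifiers and gene symbols are hypothetical.

```python
from collections import defaultdict


def score_probeset(probe_hits):
    """Score every gene hit by a probeset.

    probe_hits maps a probe id to the set of gene symbols the probe aligns to.
    Returns {gene: (sensitivity, specificity)} under the assumed definitions.
    """
    n_probes = len(probe_hits)
    probes_by_gene = defaultdict(list)
    for probe, genes in probe_hits.items():
        for gene in genes:
            probes_by_gene[gene].append(probe)

    scores = {}
    for gene, probes in probes_by_gene.items():
        # Assumed sensitivity: share of the probeset's probes that match this gene.
        sens = len(probes) / n_probes
        # Assumed specificity: a probe matching k genes contributes 1/k.
        spec = sum(1 / len(probe_hits[p]) for p in probes) / len(probes)
        scores[gene] = (sens, spec)
    return scores


# Hypothetical probeset with four probes; p3 matches two genes, p4 matches none.
hits = {
    "p1": {"GENE_A"},
    "p2": {"GENE_A"},
    "p3": {"GENE_A", "GENE_B"},
    "p4": set(),
}
scores = score_probeset(hits)
```

Here the probeset would be reported against both GENE_A and GENE_B, with GENE_A scoring higher on both measures, which is the kind of distinction a one-to-one annotation cannot express.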
Pomegranate is a valuable crop that is grown commercially in many parts of the world. Wild species have been reported from India, Turkmenistan and Socotra. Pomegranate fruit has a variety of health-beneficial qualities. However, despite this crop's importance, only moderate effort has been invested in studying its biochemical or physiological properties or in establishing genomic and genetic infrastructures. In this study, we reconstructed a transcriptome from two phenotypically different accessions using 454-GS-FLX Titanium technology. These data were used to explore the functional annotation of 45,187 fully annotated contigs. We further compiled a genetic-variation resource of 7,155 simple-sequence repeats (SSRs) and 6,500 single-nucleotide polymorphisms (SNPs). A subset of 480 SNPs was sampled to investigate the genetic structure of the broad pomegranate germplasm collection at the Agricultural Research Organization (ARO), which includes accessions from different geographical areas worldwide. This subset of SNPs was found to be polymorphic, with 10.7% of the loci having a minor allele frequency (MAF) below 0.05. These SNPs were successfully used to classify the ARO pomegranate collection into two major groups of accessions. The first comprised accessions from India, China and Iran together with accessions mainly of unknown country of origin, and was more admixed than the second, which was composed of accessions mainly from the Mediterranean basin, Central Asia and California. This study establishes a high-throughput transcriptome and genetic-marker infrastructure. Moreover, it sheds new light on the genetic interrelations between pomegranate species worldwide and more accurately defines their genetic nature.
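The minor allele frequency statistic quoted above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's code: genotypes are assumed to be diploid and coded as alternate-allele counts (0, 1 or 2), missing calls are represented by None, and the function names and toy data are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF at one locus.

    genotypes holds alternate-allele counts (0, 1 or 2) per accession;
    None marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotype calls at this locus")
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


def share_below(mafs, threshold=0.05):
    """Fraction of loci whose MAF is strictly below the threshold."""
    return sum(1 for m in mafs if m < threshold) / len(mafs)


# Hypothetical toy genotype data for three loci.
loci = [
    [0] * 19 + [1],        # one alternate allele among 40: MAF 0.025
    [0, 1, 2, 1],          # balanced alleles: MAF 0.5
    [2, 2, 2, 1, None],    # alternate allele is the major one: MAF 0.125
]
mafs = [minor_allele_frequency(g) for g in loci]
rare_share = share_below(mafs)
```

Applied to a panel of 480 SNPs, a share of 0.107 from `share_below` would correspond to the 10.7% figure reported in the text.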
Abstract Genetic diversity is a major determinant of the capacity of species to persist and adapt to their environments. Unraveling the factors affecting genetic differentiation is crucial to understanding how genetic diversity is shaped and how species may react to changing environments. We employed genotyping by sequencing to test the influence of climate, space, latitude, altitude and land cover on genetic differentiation in a collection of 81 wild pea samples (Pisum sativum ssp. elatius) from across its distribution range from western Europe to central Asia. We also attempted to elucidate the species' recent evolutionary history and its effect on the current distribution of genetic diversity. Associations of single SNPs with climate variables were analysed to test for signatures of local adaptation. Genetic variation was geographically structured into six distinct genetic clusters, two of which were associated with a taxonomic group (Pisum sativum ssp. humile) that according to some researchers does not qualify for sub-species rank due to its alleged lack of genetic distinctness from other conspecific groups. The effects of the tested factors on genetic differentiation were rather variable among genetic clusters. The climate predictors were most important in all clusters. Land use was more important in clusters from areas strongly influenced by human land use, especially by agriculture. We found a statistically significant association of 3,623 SNPs (2.4% of all SNPs) with one of the environmental predictors. Most of them were correlated with latitude, followed by temperature, precipitation and altitude. Estimation of SNP effects of the candidates resulted in a missense to silent ratio of 0.45, suggesting that many of the observed candidate SNPs may alter the encoded amino acid sequence. Wild peas went through a genetic bottleneck during the last glacial period followed by population recovery.
We detected a range expansion, probably associated with this population recovery, which may have carried the European cluster eastward to Turkey and from there further south and east. Overall, the interplay of several environmental factors and the recent evolutionary history shaped the distribution of genetic diversity in wild peas, with each subpopulation affected differently by those factors and processes.
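The missense to silent ratio of 0.45 quoted above can be turned into a share of candidate SNPs, as a rough worked example. It assumes that M counts missense candidates, S counts silent candidates, and that only these two classes enter the ratio (other effect classes, such as nonsense variants, are ignored). The symbols M and S are introduced here for illustration.

```latex
\[
\frac{M}{S} = 0.45
\quad\Longrightarrow\quad
\frac{M}{M+S} = \frac{M/S}{1 + M/S} = \frac{0.45}{1.45} \approx 0.31
\]
```

Under these assumptions, roughly three in ten of the coding candidate SNPs would change the encoded amino acid.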
Genotyping arrays are tools for high-throughput genotyping, which is beneficial for constructing saturated genetic maps and therefore for high-resolution mapping of complex traits. Since the report of the first cucumber genome draft, genetic maps have been constructed mainly based on simple-sequence repeats (SSRs) or on combinations of SSRs and sequence-related amplified polymorphism (SRAP). In this study, we developed the first cucumber genotyping array, consisting of 32,864 single-nucleotide polymorphisms (SNPs). These markers cover the cucumber genome with a median interval of ~2 kb, and their expected genotype calls in parents/F1 hybridizations serve as a training set. The training set was validated with Fluidigm technology and showed 96% concordance with the genotype calls in the parents/F1 hybridizations. Application of the genotyping array was illustrated by constructing a 598.7 cM genetic map comprising 11,156 SNPs, based on a '9930' × 'Gy14' recombinant inbred line (RIL) population. Marker collinearity between the genetic map and the reference genomes of the two parents was estimated at R² = 0.97. We also used the array-derived genetic map to investigate chromosomal rearrangements, regional recombination rate, and specific regions with segregation distortion. Finally, 82% of the linkage-map bins were polymorphic in other cucumber variants, suggesting that the array can be applied for genotyping in other lines. The genotyping array presented here, together with the genotype calls of the parents/F1 hybridizations as a training set, should be a powerful tool in future studies involving high-throughput cucumber genotyping. An ultrahigh-density linkage map constructed with this genotyping array on a RIL population may be invaluable for assembly improvement and for mapping important cucumber QTLs.
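Two of the map statistics mentioned above, the median interval between markers and the regional recombination rate, can be sketched in Python. This is a simplified illustration and not the authors' method: it assumes markers on a single chromosome given as (physical position in bp, genetic position in cM) pairs, and estimates the rate in a window from its outermost markers only. Function names and the toy markers are hypothetical.

```python
from statistics import median


def median_interval(positions_bp):
    """Median physical distance (bp) between adjacent markers on one chromosome."""
    ordered = sorted(positions_bp)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return median(gaps)


def recombination_rate(markers, window_start_bp, window_end_bp):
    """Approximate recombination rate (cM per Mb) inside a physical window.

    markers is a list of (physical_bp, genetic_cM) pairs. Returns None when
    the window holds fewer than two distinct physical positions.
    """
    inside = sorted(m for m in markers if window_start_bp <= m[0] <= window_end_bp)
    if len(inside) < 2:
        return None
    span_cm = inside[-1][1] - inside[0][1]
    span_mb = (inside[-1][0] - inside[0][0]) / 1e6
    if span_mb == 0:
        return None
    return span_cm / span_mb


# Hypothetical toy markers: (physical position in bp, genetic position in cM).
markers = [(1_000, 0.0), (3_000, 0.1), (5_500, 0.1), (1_001_000, 2.0)]

interval = median_interval([bp for bp, _ in markers])
rate = recombination_rate(markers, 0, 2_000_000)
```

A window-based estimate like this is only meaningful where marker order agrees between the genetic and physical maps; in rearranged regions the genetic positions are not monotonic in physical position and the rate can come out negative.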