Design and evaluation of a large sequence-capture probe set and associated SNPs for diploid and haploid samples of Norway spruce (Picea abies)

2018 
Massively parallel sequencing has revolutionized the field of genetics by providing comparatively high-resolution insights into whole genomes for a large number of species. However, whole-genome resequencing of many conspecific individuals remains cost-prohibitive for most species. This is especially true for species with very large genomes and extensive genomic redundancy, such as coniferous trees. The genome assembly for the conifer Norway spruce (Picea abies) was the first published draft genome assembly for any gymnosperm. Our goal was to develop a dense set of genome-wide SNP markers for Norway spruce to be used for assembly improvement and population studies. From 80,000 initial probe candidates, we developed two partially overlapping sets of sequence capture probes: one developed against 56 haploid megagametophytes, to aid assembly improvement; and the other developed against 6 diploid needle samples, to aid population studies. We focused probe development within genes, as delineated via the annotation of ~67,000 gene models accompanying P. abies assembly version 1.0. The 31,277 probes developed against megagametophytes covered 19,268 gene models (mean 1.62 probes/model). The 40,018 probes developed against diploid tissue covered 26,219 gene models (mean 1.53 probes/model). Analysis of read coverage and variant quality around probe sites showed that initial alignment of captured reads should be done against the whole genome sequence, rather than a subset of probe-containing scaffolds, to overcome occasional capture of sequences outside of designed regions. All three probe sets, anchored to the P. abies 1.0 genome assembly and annotation, are available for download.
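The per-model probe densities quoted in the abstract follow directly from the reported probe and gene-model counts. The minimal sketch below recomputes them; the function and dictionary names are illustrative and not taken from the paper, and the coverage fraction uses the approximate total of ~67,000 gene models, so it should be read as a rough figure only.

```python
# Probe and gene-model counts as reported in the abstract.
# Names below (PROBE_SETS, mean_probes_per_model, ...) are hypothetical.
PROBE_SETS = {
    "haploid megagametophyte": {"probes": 31_277, "models": 19_268},
    "diploid needle": {"probes": 40_018, "models": 26_219},
}

# Approximate number of annotated gene models in P. abies assembly 1.0 (~67,000).
APPROX_TOTAL_MODELS = 67_000


def mean_probes_per_model(n_probes: int, n_models: int) -> float:
    """Average number of probes per covered gene model."""
    return n_probes / n_models


def approx_fraction_covered(n_models: int, total: int = APPROX_TOTAL_MODELS) -> float:
    """Rough share of annotated gene models covered by a probe set."""
    return n_models / total


for name, counts in PROBE_SETS.items():
    density = mean_probes_per_model(counts["probes"], counts["models"])
    share = approx_fraction_covered(counts["models"])
    print(f"{name}: {density:.2f} probes/model, ~{share:.0%} of gene models")
```

Rounded to two decimals, the densities match the 1.62 and 1.53 probes/model stated above.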