Genomic exploration and molecular marker development in a large and complex conifer genome using RADseq and mRNAseq

2015 
M.-J. KARAM,* F. LEFEVRE,* M. BOU DAGHER-KHARRAT,† S. PINOSIO‡,§ and G.G. VENDRAMIN§

*INRA, UR 629 Ecologie des Forêts Méditerranéennes, URFM, Avignon, France
†Laboratoire Caractérisation Génomique des Plantes, Département Sciences de la Vie et de la Terre, Faculté des Sciences, Campus Sciences et Technologies, Université Saint-Joseph, Mar Roukos, Mkalles, Lebanon
‡Istituto di Genomica Applicata (IGA), Udine, Italy
§Institute of Biosciences and Bioresources, National Research Council, Florence, Italy

Abstract

We combined restriction site associated DNA sequencing (RADseq) using a hypomethylation-sensitive enzyme and messenger RNA sequencing (mRNAseq) to develop molecular markers for the 16 gigabase genome of Cedrus atlantica, a conifer tree species. With each method, Illumina reads from one individual were used to generate de novo assemblies. Single nucleotide polymorphisms (SNPs) from the RADseq data set were detected in a panel of one individual and three pools of three individuals each. We developed a flexible script to estimate the ascertainment bias in SNP detection, considering the pooling and sampling effects on the probability of not detecting an existing polymorphism. Gene Ontology (GO) and transposable element (TE) search analyses were applied to both data sets. The RADseq and the mRNAseq assemblies represented 0.1% and 0.6% of the genome, respectively. After genome complexity reduction, 17% of the RADseq contigs were potentially protein-coding. This proportion was twice as high in the mRNAseq data set, suggesting that RADseq also explores noncoding low-repeat regions. The two methods gave very similar GO-slim profiles. As expected, both assemblies were poor in TE-like sequences (<4% of total contig length). We identified 17,348 SNPs in the RADseq data set and 5,714 simple sequence repeats (SSRs) in the transcriptome. A subset of 282 SNPs was validated using the Fluidigm genotyping technology, giving a conversion rate of 50.4%, which falls within the expected range for conifers. Increasing sample size had the greatest effect on reducing ascertainment bias. These results validate the utility of the RADseq approach for highly complex genomes such as those of conifers.

Keywords: Cedrus atlantica, next generation sequencing, RADseq, SNP, SSR, transcriptome

Received 16 April 2014; revision received 30 August 2014; accepted 5 September 2014
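The abstract does not describe how the ascertainment-bias script works. The sketch below is a minimal, hypothetical model of the two effects it names (sampling and pooling) and is not the authors' script. It makes several assumptions. Gene copies in each sequencing unit (a diploid individual = 2 copies, a pool of three diploids = 6 copies) are drawn binomially from the population allele frequency, as under Hardy-Weinberg. Reads are drawn binomially from the copies within each unit at a fixed depth. An allele is called only if it has at least `min_reads` reads. The names `p_miss`, `unit_states`, `depth` and `min_reads` are illustrative only. A polymorphism is "missed" when the panel as a whole never shows both alleles.

```python
from math import comb


def binom_pmf(k: int, n: int, prob: float) -> float:
    """P(X = k) for X ~ Binomial(n, prob). comb() is converted to float,
    so keep n below roughly 1,000 to avoid OverflowError."""
    return comb(n, k) * prob**k * (1.0 - prob) ** (n - k)


def p_miss_sampling_only(freq: float, total_copies: int) -> float:
    """Sampling effect alone: every sampled gene copy carries the same allele."""
    return freq**total_copies + (1.0 - freq) ** total_copies


def unit_states(freq: float, copies: int, depth: int,
                min_reads: int) -> tuple[float, float, float]:
    """For one unit (individual or pool) with `copies` gene copies, return
    (P[only allele A called], P[only allele a called], P[no allele called]).
    `freq` is the population frequency of allele a (hypothetical model)."""
    only_A = only_a = neither = 0.0
    for j in range(copies + 1):                 # j = copies of allele a in the unit
        p_j = binom_pmf(j, copies, freq)
        for r in range(depth + 1):              # r = reads carrying allele a
            p_r = binom_pmf(r, depth, j / copies)
            a_called = r >= min_reads
            A_called = depth - r >= min_reads
            if A_called and not a_called:
                only_A += p_j * p_r
            elif a_called and not A_called:
                only_a += p_j * p_r
            elif not a_called and not A_called:
                neither += p_j * p_r
    return only_A, only_a, neither


def p_miss(freq: float, units: list[int], depth: int, min_reads: int) -> float:
    """Probability that an existing SNP is not detected in the whole panel.
    Missed if every unit shows only A (or nothing), or every unit shows only a
    (or nothing); inclusion-exclusion removes the shared all-nothing case."""
    prod_A = prod_a = prod_0 = 1.0
    for copies in units:                        # units are independent
        only_A, only_a, neither = unit_states(freq, copies, depth, min_reads)
        prod_A *= only_A + neither
        prod_a *= only_a + neither
        prod_0 *= neither
    return prod_A + prod_a - prod_0


if __name__ == "__main__":
    # Hypothetical panel shaped like the abstract's: 1 individual + 3 pools of 3.
    panel = [2, 6, 6, 6]
    for maf in (0.05, 0.1, 0.25):
        print(maf,
              round(p_miss_sampling_only(maf, sum(panel)), 4),
              round(p_miss(maf, panel, depth=30, min_reads=3), 4))
```

In this toy model, pooling can only add misses on top of the sampling term. A minor allele present in a pool may still fall below the read threshold. Adding units shrinks every product in `p_miss`. The depth and threshold values are placeholders, not the settings used in the study.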