Rice is an important food crop and a model plant for other cereal genomes. The Clemson University Genomics Institute framework project, begun two years ago in anticipation of the now ongoing international effort to sequence the rice genome, is nearing completion. Two bacterial artificial chromosome (BAC) libraries have been constructed from the Oryza sativa cultivar Nipponbare. Over 100 000 BAC end sequences have been generated from these libraries and, at a current total of 28 Mbp, represent 6.5% of the total rice genome sequence. This sequence information has allowed us to draw initial conclusions about unique and redundant rice genomic sequences. In addition, more than 60 000 clones (19 genome equivalents) have been successfully fingerprinted and assembled into contigs using FPC software. Many of these contigs have been anchored to the rice chromosomes using a variety of techniques. Hybridization experiments have shown these contigs to be very robust. Contig assembly and hybridization experiments have revealed some surprising insights into the organization of the rice genome, which will have significant repercussions for the sequencing effort. Integration of BAC end sequence data with anchored contig information has provided unexpected revelations on sequence organization at the chromosomal level.
Rice was chosen as a model organism for genome sequencing because of its economic importance, small genome size, and syntenic relationship with other cereal species. We have constructed a bacterial artificial chromosome fingerprint-based physical map of the rice genome to facilitate the whole-genome sequencing of rice. Most of the rice genome (approximately 90.6%) was anchored genetically by overgo hybridization, DNA gel blot hybridization, and in silico anchoring. Genome sequencing data also were integrated into the rice physical map. Comparison of the genetic and physical maps reveals that recombination is suppressed severely in centromeric regions as well as on the short arms of chromosomes 4 and 10. This integrated high-resolution physical map of the rice genome will greatly facilitate whole-genome sequencing by helping to identify a minimum tiling path of clones to sequence. Furthermore, the physical map will aid map-based cloning of agronomically important genes and will provide an important tool for the comparative analysis of grass genomes.
Modern cultivated maize (Zea mays L.) is one of the primary agronomic crops in the USA, with an estimated genome size of 2500 megabases (Mb). To develop the resources for positional cloning and structural genomics in maize, we constructed a bacterial artificial chromosome (BAC) library for the inbred line B73 using the cloning enzyme HindIII. The library contains 247 680 clones (645 384‐well plates). A random sampling of 697 clones indicated an average insert size of 136 kilobases (kb) (range = 42 to 379 kb) and 0.4% empty vectors. Screening the colony filters for chloroplast DNA content indicated an exceptionally low 0.18% contamination with chloroplast DNA. Thus, the library provides 13.5 haploid genome equivalents, allowing >99% probability of recovering any specific sequence of interest. High‐density filters were gridded robotically using a Genetix Q‐BOT (Hampshire, UK) in a 4 by 4 double‐spotted array on 22.5‐cm² filters. Partial screening (6× coverage) of the library with 20 single‐copy probes identified an average of 7.1 positive signals per probe, with a range of 3 to 15 positive signals per probe. To evaluate the utility of the library for sequence tagged connector (STC) analysis, 768 BAC clones were end sequenced in both forward and reverse directions, giving a total of 1415 successful reads. End sequences were queried against SWISS‐PROT, GenBank NR, MIPS Arabidopsis, maize genomic sequence dbGSS, and maize cDNA database dbEST. Results from these searches are publicly available in spreadsheet format at the CUGI website (www.genome.clemson.edu/projects/stc/maize/ZMMBBb/).
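The coverage and recovery figures quoted for the maize library can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the recovery probability uses the standard Clarke-Carbon style estimate P = 1 - (1 - I/G)^N (an assumption, since the abstract does not say which formula was used), and the genome size is taken to be exactly the stated 2500 Mb estimate.

```python
def genome_equivalents(n_clones, mean_insert_kb, genome_mb):
    """Haploid genome equivalents = total cloned DNA / genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000)


def recovery_probability(n_clones, mean_insert_kb, genome_mb):
    """Clarke-Carbon style estimate (assumed, not stated in the abstract):
    probability that a given single-copy sequence is in at least one clone."""
    fraction_per_clone = mean_insert_kb / (genome_mb * 1000)
    return 1 - (1 - fraction_per_clone) ** n_clones


# Values taken from the abstract: 247 680 clones, 136 kb mean insert, 2500 Mb genome.
coverage = genome_equivalents(247_680, 136, 2500)
p_recover = recovery_probability(247_680, 136, 2500)
print(f"coverage: {coverage:.2f} genome equivalents")
print(f"recovery probability: {p_recover:.6f}")
```

Under these assumptions the raw coverage comes out at about 13.5 genome equivalents and the recovery probability is well above 99%, in line with the abstract. Discounting the reported empty-vector and chloroplast clones lowers the coverage only slightly.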
Bacterial artificial chromosome (BAC) physical maps embedding a large number of BAC end sequences (BESs) were generated for Oryza sativa ssp. indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) and were compared with the genome sequences of O. sativa ssp. japonica cv. Nipponbare and O. sativa ssp. indica cv. 93-11. The comparisons revealed substantial diversity in terms of large structural variations and small substitutions and indels. Genome-wide BAC-sized and contig-sized structural variations were detected, and the shared variations were analyzed. In the expansion regions of the Nipponbare reference sequence, in comparison to the MH63 and ZS97 physical maps, as well as to the previously constructed 93-11 physical map, the amounts and types of the repeat contents, and the outputs of gene ontology analysis, were significantly different from those of the whole genome. Using the physical maps of four wild Oryza species from OMAP (http://www.omap.org) as a control, we detected many conserved and divergent regions related to the evolution process of O. sativa. Between the BESs of MH63 and ZS97 and the two reference sequences, a total of 1532 polymorphic simple sequence repeats (SSRs), 71,383 SNPs, 1767 multiple nucleotide polymorphisms, 6340 insertions, and 9137 deletions were identified. This study provides independent whole-genome resources for intra- and intersubspecies comparisons and functional genomics studies in O. sativa. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html).
Bacterial artificial chromosome (BAC) clones are effective mapping and sequencing reagents for use with a wide variety of small and large genomes. This report describes the development of a physical framework for the genome of Bradyrhizobium japonicum, the nitrogen-fixing symbiont of soybean. A BAC library for B. japonicum was constructed that provides a 77-fold genome coverage based on an estimated genome size of 8.7 Mb. The library contains 4608 clones with an average insert size of 146 kb. To generate a physical map, the entire library was fingerprinted with HindIII, and the fingerprinted clones were assembled into contigs using the Fingerprint Contig software (FPC; Sanger Centre, UK). The FPC analysis placed 3410 clones in six large contigs. The ends of 1152 BAC inserts were sequenced to generate a sequence-tagged connector (STC) framework. To join and orient the contigs, high-density BAC colony filters were probed with 41 known gene probes and 17 end sequences from contig boundaries. STC sequences were searched against the public databases using FASTA and BLASTX algorithms. Query results allowed the identification of 113 high probability matches with putative functional identities that were placed on the physical map. Combined with the hybridization data, a high-resolution physical map with 194 positioned markers represented in two large contigs was developed, providing a marker every 45 kb. Of these markers, 177 are known or putative B. japonicum genes. Additionally, 1338 significant BLASTX results (E < 10⁻⁴) were manually sorted by function to produce a functionally categorized database of relevant B. japonicum STC sequences that can also be traced to specific locations in the physical map.
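The fold coverage and marker density reported for the B. japonicum map follow from simple arithmetic, sketched below. The function names are hypothetical, and the calculation assumes the 8.7 Mb genome size estimate and an even spread of the 194 markers across the genome, which the abstract does not claim.

```python
def fold_coverage(n_clones, mean_insert_kb, genome_mb):
    """Library coverage: total insert length divided by genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000)


def mean_marker_spacing_kb(genome_mb, n_markers):
    """Average distance between markers, assuming they are evenly spread."""
    return genome_mb * 1000 / n_markers


# Values taken from the abstract: 4608 clones, 146 kb inserts, 8.7 Mb genome, 194 markers.
print(f"coverage: {fold_coverage(4608, 146, 8.7):.1f}x")
print(f"marker spacing: {mean_marker_spacing_kb(8.7, 194):.1f} kb")
```

Both results agree with the rounded figures in the abstract (about 77-fold coverage and roughly one marker every 45 kb).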
The order and orientation (arrangement) of all 91 sequenced scaffolds in the 12 pseudomolecules of the recently published tomato (Solanum lycopersicum, 2n = 2x = 24) genome sequence were positioned based on marker order in a high-density linkage map. Here, we report the arrangement of these scaffolds determined by two independent physical methods, bacterial artificial chromosome-fluorescence in situ hybridization (BAC-FISH) and optical mapping. By localizing BACs at the ends of scaffolds to spreads of tomato synaptonemal complexes (pachytene chromosomes), we showed that 45 scaffolds, representing one-third of the tomato genome, were arranged differently than predicted by the linkage map. These scaffolds occur mostly in pericentric heterochromatin where 77% of the tomato genome is located and where linkage mapping is less accurate due to reduced crossing over. Although useful for only part of the genome, optical mapping results were in complete agreement with scaffold arrangement by FISH but often disagreed with scaffold arrangement based on the linkage map. The scaffold arrangement based on FISH and optical mapping changes the positions of hundreds of markers in the linkage map, especially in heterochromatin. These results suggest that similar errors exist in pseudomolecules from other large genomes that have been assembled using only linkage maps to predict scaffold arrangement, and these errors can be corrected using FISH and/or optical mapping. Of note, BAC-FISH also permits estimates of the sizes of gaps between scaffolds, and unanchored BACs are often visualized by FISH in gaps between scaffolds and thus represent starting points for filling these gaps.
Asian cultivated rice consists of two subspecies: Oryza sativa subsp. indica and O. sativa subsp. japonica. Despite the fact that indica rice accounts for over 70% of total rice production worldwide and is genetically much more diverse, a high-quality reference genome for indica rice has yet to be published. We conducted map-based sequencing of two indica rice lines, Zhenshan 97 (ZS97) and Minghui 63 (MH63), which represent the two major varietal groups of the indica subspecies and are the parents of an elite Chinese hybrid. The genome sequences were assembled into 237 (ZS97) and 181 (MH63) contigs, with an accuracy >99.99%, and covered 90.6% and 93.2% of their estimated genome sizes. Comparative analyses of these two indica genomes uncovered surprising structural differences, especially with respect to inversions, translocations, presence/absence variations, and segmental duplications. Approximately 42% of nontransposable element related genes were identical between the two genomes. Transcriptome analysis of three tissues showed that 1,059-2,217 more genes were expressed in the hybrid than in the parents and that the expressed genes in the hybrid were much more diverse due to their divergence between the parental genomes. The public availability of two high-quality reference genomes for the indica subspecies of rice will have large-ranging implications for plant biology and crop genetic improvement.