Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays

2005 
A significant challenge to a functional understanding of the human genome is obtaining detailed knowledge of its transcriptional output. Ideally, a catalog of transcriptional activity in a genome should include all isoforms of coding and noncoding transcripts present in all tissue and cell types. Two large-scale efforts, sponsored independently by the National Cancer Institute (Strausberg et al. 1999, 2000, 2002) and RIKEN (Okazaki et al. 2002; Ota et al. 2004), have contributed significantly to the current understanding of the complexity of the human transcriptome. Several recent reports have also provided substantial evidence that the transcriptional output of the human genome is far more complex than can be explained by current collections of partial or full-length cDNAs (Chen et al. 2002; Kapranov et al. 2002; Okazaki et al. 2002; Saha et al. 2002; Rinn et al. 2003; Kampa et al. 2004; Ota et al. 2004; Cheng et al. 2005). Most of the recently detected regions of transcriptional activity in the human genome lie outside annotated areas and may therefore represent novel transcriptional units or previously undiscovered isoforms of known genes. Recently, we described the sites of transcription at 5-bp resolution for 10 human chromosomes (30% of the nonrepeat portion of the human genome) (Cheng et al. 2005). As part of that study, a total of 768 randomly selected unannotated regions of transcription were examined using a combination of RACE and high-density arrays, both to validate transcription at the selected sites and to better understand the structures of the unannotated transcripts. Of the 768 loci, 634 (82.6%) yielded 5′- and/or 3′-RACE products, and ∼61% of surveyed loci showed evidence of overlapping transcription on the positive and negative strands of the genome.
RT-PCR was performed on 250 (57%) of the genomic loci that produced 5′- and 3′-RACE products from at least one genomic strand. In this report, we explore in depth the complex patterns and transcript structures observed in this genome survey. Using the combination of RACE and high-density arrays, we studied in detail the transcript structures corresponding to unannotated array-detected regions that map both within and outside the bounds of well-characterized coding genes. Examples of complex overlapping sense/antisense transcription within the bounds of known genes that emerged from these studies are presented and discussed. The complexity of the organization and structure of the detected RNAs is consistent with the existence of a complex transcriptome whose organization and transcript structures have potentially important implications for the regulation of transcription and for the interpretation of naturally occurring genetic variation in humans.
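The locus-level statistics summarized above (the fraction of loci yielding RACE products, and the fraction showing overlapping transcription on both strands) reduce to simple interval bookkeeping once each RACE product has been mapped to the genome. The following is a minimal illustrative sketch, not the authors' analysis pipeline: it assumes each product is represented as a 0-based, half-open genomic interval with a strand, and the `Product` class, the coordinates, and the locus names are all hypothetical. A locus is flagged as showing sense/antisense overlap if any plus-strand product overlaps any minus-strand product. The last line simply checks the reported 634/768 percentage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical mapped RACE product: 0-based, half-open interval plus strand."""
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Product, b: Product) -> bool:
    # Two half-open intervals overlap when each one starts before the other ends.
    return a.start < b.end and b.start < a.end


def has_sense_antisense_overlap(products: list[Product]) -> bool:
    """True if any plus-strand product overlaps any minus-strand product."""
    plus = [p for p in products if p.strand == "+"]
    minus = [p for p in products if p.strand == "-"]
    return any(overlaps(p, m) for p in plus for m in minus)


def percent(part: int, whole: int) -> float:
    # Percentage rounded to one decimal place, as reported in the abstract.
    return round(100 * part / whole, 1)


# Invented example loci; coordinates are illustrative only.
loci = {
    "locus_A": [Product(100, 600, "+"), Product(450, 900, "-")],  # strands overlap
    "locus_B": [Product(100, 300, "+"), Product(500, 800, "-")],  # both strands, no overlap
    "locus_C": [Product(200, 400, "+")],                          # plus strand only
}

n_overlap = sum(has_sense_antisense_overlap(ps) for ps in loci.values())
print(n_overlap, percent(n_overlap, len(loci)))
print(percent(634, 768))
```

Counting products on both strands is not enough on its own: `locus_B` has products on each strand but they do not overlap, so it is not flagged. Whether the original survey required physical overlap or only bidirectional evidence at a locus is not stated in this abstract; the overlap criterion here is an assumption of the sketch.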