Identification and Characterization of the Potential Promoter Regions of 1031 Kinds of Human Genes

2001 
To understand the mechanism of transcriptional regulation, it is indispensable to identify and characterize the promoter. The promoter is usually located just proximal to or overlapping the transcription initiation site and contains several sequence motifs with which transcription factors (TFs) interact in a sequence-specific manner. When recruited, these TFs serve as molecular switches that turn the transcription of the gene on or off. The combinations of TF-binding motifs in promoters vary from gene to gene, so that an appropriate subset of genes can be expressed according to tissue type or developmental stage (Mitchell and Tjian 1989; Novina and Roy 1996). Among the many TF-binding motifs, the TATA box and the initiator (Inr) are considered especially important because only these motifs are directly recognized by the general transcription factors (Roeder 1996; Smale 1997). The GC box and the CAAT box are also thought to be important promoter elements. Whether or not the promoter is located in a CpG island is also important for transcriptional regulation. CpG islands are defined as dispersed regions of DNA with a high frequency of the CpG dinucleotide relative to the bulk genome (Gardiner-Garden and Frommer 1987; Larsen et al. 1992). When CpG islands remain unmethylated, TF-binding sites can be recognized by TFs. In contrast, when they are methylated, the presence of 5-methylcytosine in CpG islands interferes with the binding of TFs and thus suppresses transcription (Cross and Bird 1995; Costello et al. 2000). Despite the important roles of promoters, the number of genes whose promoters have been identified is limited. In the Eukaryotic Promoter Database (EPD; Rel. 62; http://www.epd.isb-sib.ch; Perier et al. 2000), which accumulates previously characterized promoter sequences, only 273 human promoters have been registered. This may be because the exact mRNA start sites have not been identified for most genes.
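To make the CpG island definition above concrete, the following minimal Python sketch computes the two quantities commonly associated with the Gardiner-Garden and Frommer (1987) criteria: the G+C fraction and the observed/expected CpG ratio. The thresholds used here (length of at least 200 bp, G+C above 0.5, observed/expected above 0.6) are the commonly quoted values and are an assumption of this sketch; this section does not state which thresholds were applied. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, G+C, and observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp
```

In practice such a test would be applied to a window sliding along the genomic sequence rather than to one fixed fragment.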
The conventional methods used to identify the mRNA start site, such as S1 mapping, primer extension, or 5′ RACE (Berk and Sharp 1977; McKnight and Kingsbury 1982; Schaefer 1995), are technically difficult and often lead to inaccurate identification of the mRNA start sites. Previously, we developed a novel method to construct a full-length enriched and 5′-end enriched cDNA library (Maruyama and Sugano 1994; Suzuki et al. 1997). This “oligo-capping” method uses the cap structure, which is characteristic of the 5′ end of eukaryotic mRNAs. By three sequential enzyme reactions, the oligo-capping method replaces the cap structure of the mRNA with a synthetic oligoribonucleotide (Fig. 1). Using this 5′ oligoribonucleotide as a sequence tag, cDNAs that originally contained the cap structure are selectively cloned. Libraries of this type (oligo-capped cDNA libraries) contained 50%–80% full-length cDNAs whose 5′ ends correspond to the mRNA start sites (Suzuki et al. 1997, 2000).
Figure 1 Schematic representation of the construction of oligo-capped cDNA libraries. The cap structure of the mRNA was replaced with the 5′ oligonucleotide by the oligo-capping method, which consists of three enzymatic reaction steps. Bacterial alkaline ...
The oligo-capped cDNA libraries have proved to be good resources for identifying the mRNA start sites of many genes. We have constructed oligo-capped cDNA libraries from 34 kinds of human tissues and cultured cells and sequenced the 5′ ends of 100,000 clones from these cDNA libraries. By clustering the sequence data, we identified the mRNA start sites for at least 2251 genes. We aligned these transcriptional start sites onto the genomic sequences and retrieved adjacent sequences as the potential promoter regions (PPRs) for 1031 genes. Here we report the identification and characterization of our first 1031 PPRs.
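The retrieval of sequences adjacent to a mapped start site can be illustrated with a short Python sketch. It is not the authors' pipeline: the function names, the 0-based coordinate convention, and the window sizes (`upstream`, `downstream`) are all assumptions made for illustration, since this section does not state how much flanking sequence was taken. The sketch returns the region in the orientation of the transcript, so genes on the minus strand are reverse-complemented.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_ppr(genome, tss, strand, upstream=1000, downstream=200):
    """Return the sequence around a transcription start site (TSS).

    genome     -- plus-strand genomic sequence as a string
    tss        -- 0-based plus-strand index of the first transcribed base
    strand     -- "+" or "-"
    upstream   -- bases taken 5' of the TSS (hypothetical default)
    downstream -- bases taken from the TSS onward (hypothetical default)
    """
    if strand == "+":
        start = max(0, tss - upstream)
        end = min(len(genome), tss + downstream)
        return genome[start:end]
    if strand == "-":
        # On the minus strand, upstream lies at higher plus-strand coordinates.
        start = max(0, tss - downstream + 1)
        end = min(len(genome), tss + upstream + 1)
        return reverse_complement(genome[start:end])
    raise ValueError("strand must be '+' or '-'")
```

The window is clipped at the ends of the sequence, so a start site close to a contig boundary yields a shorter region rather than an error.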