DNA Free Energy-Based Promoter Prediction and Comparative Analysis of Arabidopsis and Rice Genomes

2011 
The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of the free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42%, respectively, for Arabidopsis and 97% and 31% for rice. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts; of these, 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes correlated well in both prediction scores and promoter organization. Comparison with an existing program for promoter prediction in plant genomes indicates that our method has improved prediction capability.
Sequencing and annotation of a large number of eukaryotic genomes have made available an enormous amount of information on coding sequences (CDS). These data can be used effectively to study and modify gene expression if the locations and contributions of cis-regulatory regions, which control the spatial and temporal regulation of gene expression, are known. However, regulatory regions are more difficult to annotate precisely than genes, primarily because they do not code for an identifiable product.
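The abstract bases prediction on the free energy of DNA melting but gives no algorithmic detail at this point. As a minimal sketch, assuming a nearest-neighbor model in which duplex stability is the sum of dinucleotide stacking free energies, one can compute a sliding-window stability profile and flag windows that are less stable (less negative) than a cutoff. The stacking values, the 15-bp default window, and the threshold rule in the hypothetical `low_stability_windows` function are illustrative assumptions, not the authors' parameters or scoring scheme.

```python
from typing import Dict, List

# Watson-Crick pairing, used to derive the reverse complement of a dinucleotide.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# ASSUMPTION: approximate nearest-neighbor stacking free energies (kcal/mol, 37 C)
# for the 10 unique Watson-Crick stacks. Replace with a vetted parameter table.
_UNIQUE_STACK_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(_PAIR[base] for base in reversed(seq))


def _build_stack_table() -> Dict[str, float]:
    """Expand the 10 unique stacks to all 16 dinucleotides.

    A dinucleotide and its reverse complement describe the same base-pair
    stack read from opposite strands, so they share one free energy value.
    """
    table = dict(_UNIQUE_STACK_DG)
    for dinuc, dg in _UNIQUE_STACK_DG.items():
        table.setdefault(reverse_complement(dinuc), dg)
    return table


STACK_DG = _build_stack_table()


def window_free_energy(seq: str) -> float:
    """Sum the stacking free energies of all adjacent base steps in seq."""
    seq = seq.upper()
    return sum(STACK_DG[seq[i:i + 2]] for i in range(len(seq) - 1))


def free_energy_profile(seq: str, window: int = 15) -> List[float]:
    """Free energy of every `window`-bp window, sliding one base at a time.

    The 15-bp default is a hypothetical choice for illustration.
    """
    if not 2 <= window <= len(seq):
        raise ValueError("window must be between 2 and len(seq)")
    return [window_free_energy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def low_stability_windows(profile: List[float], threshold: float) -> List[int]:
    """Hypothetical rule: start indices of windows less stable than threshold."""
    return [i for i, dg in enumerate(profile) if dg > threshold]
```

For example, `free_energy_profile(seq, window=15)` returns one value per window start. AT-rich stretches give less negative values than GC-rich ones, which is the kind of low-stability signal a free energy-based predictor can look for. Because each dinucleotide and its reverse complement share one value, a window scores the same whichever strand it is read from.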
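The recall and precision figures are quoted without definitions. Assuming the standard ones, recall = TP/(TP + FN) is the fraction of true promoters that are predicted, and precision = TP/(TP + FP) is the fraction of predictions that are correct. This short sketch (the function names are hypothetical) computes both from counts and converts a precision value P into the implied number of false positives per true positive, (1 - P)/P.

```python
from typing import Tuple


def recall_precision(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (recall, precision) with recall = TP/(TP+FN), precision = TP/(TP+FP)."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("recall or precision is undefined for these counts")
    return tp / (tp + fn), tp / (tp + fp)


def false_positives_per_true_positive(precision: float) -> float:
    """FP/TP implied by a precision value P, i.e. (1 - P) / P."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return (1 - precision) / precision
```

At the reported protein-coding precisions, this gives roughly 0.58/0.42 ≈ 1.4 false positives per true positive for Arabidopsis and 0.69/0.31 ≈ 2.2 for rice.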
Instead, regulatory regions are bound by proteins such as transcription factors, which bring about transcription and its regulation. Identifying transcription factor-binding sites (TFBSs) with chromatin immunoprecipitation methods has limitations and requires extensive downstream data processing (Farnham, 2009). Moreover, the mere binding of a transcription factor at a particular site does not guarantee its involvement in the regulation of a gene. Development of computational approaches that enable accurate prediction of