A statistical analysis of the TRANSFAC database

2005 
Transcription factors are key regulatory elements that control gene expression. The TRANSFAC® database represents the largest repository of experimentally derived transcription factor binding sites (TFBS). Understanding TFBS, which are typically conserved during evolution, helps us identify genomic regions related to human health and disease, as well as regions that might be predictive of patient outcomes. Here we present a statistical analysis of all TFBS in the TRANSFAC® database. Our analysis suggests that the current definition of TFBS core regions in TRANSFAC® should be re-examined so as to capture a more precise notion of “cores.” We offer insight into more appropriate definitions of TFBS consensus sequences and core regions. These revised definitions provide a better understanding of the nature of transcription factor-DNA binding and assist with developing algorithms for de novo TFBS discovery as well as for finding novel variants of known TFBS. © 2005 Elsevier Ireland Ltd. All rights reserved.