Effect of flanking sequences on the accuracy of the recognition of transcription factor binding sites

2015 
Advances in in vitro technologies have produced new experimental data on protein binding to DNA. These data are accumulated in databases and used both in studies of the mechanisms that regulate gene expression and in the development of computational methods for recognizing binding sites in prokaryotic and eukaryotic genomes. However, it remains unclear to what extent in vitro selected sequences reflect the actual structure of real transcription factor (TF) binding sites. The Kullback–Leibler divergence was applied to compare frequency matrices of TF binding sites constructed from sets of artificially selected sequences with those constructed from real sites. The core sequences of real and artificial sites were similar for 80% of the TFs studied. For 20% of TFs, in vitro selected binding site sequences admit a broader range of significant nucleotides, which are not found in real sites. The optimal lengths of DNA sequences containing real binding sites, at which the sites are recognized most accurately, were estimated by the weight matrix method. For approximately 80% of the TFs studied, the optimal binding site length notably exceeds both the length of the core sequence and the length of the in vitro selected sites. These features of in vitro selected TF binding sites constrain their use in developing computational methods for recognizing candidate sites in genomic sequences.
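The frequency-matrix comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes two already aligned position frequency matrices of equal length, each given as a list of per-position nucleotide frequencies in A, C, G, T order. The function names, the pseudocount value, and the use of base-2 logarithms are illustrative assumptions; the abstract does not specify how the matrices were aligned, smoothed, or whether the divergence was symmetrized.

```python
import math


def column_kl(p, q, pseudocount=1e-6):
    """Kullback-Leibler divergence D(p || q), in bits, for one matrix column.

    p and q are nucleotide frequencies (or counts) in A, C, G, T order.
    The pseudocount is an assumption added here to avoid log(0).
    """
    p_s = [x + pseudocount for x in p]
    q_s = [x + pseudocount for x in q]
    p_tot, q_tot = sum(p_s), sum(q_s)
    return sum(
        (a / p_tot) * math.log2((a / p_tot) / (b / q_tot))
        for a, b in zip(p_s, q_s)
    )


def matrix_kl(pfm_a, pfm_b, pseudocount=1e-6):
    """Per-position KL divergence between two aligned frequency matrices."""
    if len(pfm_a) != len(pfm_b):
        raise ValueError("matrices must have the same number of positions")
    return [column_kl(a, b, pseudocount) for a, b in zip(pfm_a, pfm_b)]


# Hypothetical toy matrices (not data from the study).
real_sites = [[0.9, 0.05, 0.03, 0.02], [0.1, 0.1, 0.7, 0.1]]
in_vitro_sites = [[0.9, 0.05, 0.03, 0.02], [0.25, 0.25, 0.25, 0.25]]

per_position = matrix_kl(real_sites, in_vitro_sites)
```

In this toy case the first position is identical in both matrices, so its divergence is zero, while the second position diverges because the hypothetical in vitro matrix is uniform there. Positions with low divergence would correspond to the shared core described in the abstract.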