Training models employing physico-chemical properties of DNA for protein binding site detection
2021
Transcription Factors (TFs) are one of the most important agents acting on gene expression regulation, fundamentally determining the organized functional operation of cellular machinery. At a molecular level, this effect is achieved by the sequence specific physical binding of TF proteins to particular parts of the DNA. Transcription Factors regulate gene expression in complex ways and the detection of their binding sites is an important part of many experiments. Predicting Transcription Factor Binding Sites (TFBS) from DNA sequence data has been a challenging task in the field of bioinformatics. The abundance of available DNA sequences strongly encourages the use of machine learning for this problem. Until now most of these efforts were primarily based on the traditional nucleotide-based representation of DNA. To elaborate a more detailed description of this macromolecule, we have worked out a new Physico-Chemical Descriptor (PCD) based DNA representation and used it as input for training neural networks to predict TFBSs. We show that the PCD representation is a viable format for deep learning models, and our feature selection investigation highlights the importance of proper PCD subset choices. The distinct prediction efficiencies detected upon the usage of arbitrarily selected feature subsets indicates that the different DNA features affect the DNA binding process of TFs to various extent.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
10
References
0
Citations
NaN
KQI