Accurate identification of RNA modification sites is important for understanding the functions and regulatory mechanisms of RNAs. Recent advances have shown great promise in applying deep learning-based computational methods to the prediction of RNA modifications. However, those methods generally predict only a single type of RNA modification, and their predictions lack interpretability. In this work, a new Transformer-based deep learning method, referred to as TransRNAm, is proposed to predict multiple RNA modifications simultaneously. More specifically, TransRNAm employs a Transformer to extract contextual features and convolutional neural networks to further learn high-level latent feature representations of RNA sequences relevant to RNA modifications. Importantly, by integrating the self-attention mechanism of the Transformer with convolutional neural networks, TransRNAm is capable of not only capturing the critical nucleotide sites that contribute significantly to RNA modification prediction, but also revealing the underlying associations among different types of RNA modifications. Consequently, this work provides an accurate and interpretable predictor for multiple RNA modifications, which may contribute to uncovering the sequence-based formation mechanism of RNA modification sites.
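As a rough illustration of how self-attention weights can point to influential nucleotide positions, the sketch below implements plain scaled dot-product attention in pure Python. It is not the TransRNAm architecture: there are no learned query/key/value projections, no multiple heads and no convolutional layers, and the helper `position_importance` (averaging the attention each position receives) is a hypothetical readout chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    width = len(values[0])
    outputs = [[sum(w * v[j] for w, v in zip(row, values)) for j in range(width)]
               for row in weights]
    return outputs, weights


def position_importance(weights):
    """Average attention each position receives over all query positions."""
    n = len(weights)
    return [sum(row[j] for row in weights) / n for j in range(len(weights[0]))]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for three sequence positions (made-up values).
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    _, attn = self_attention(x, x, x)
    print(position_importance(attn))
```

Each row of the weight matrix sums to one, so the averaged column scores also sum to one and can be read as a relative ranking of positions.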
Nucleosome positioning plays a vital role in diverse cellular processes by regulating the accessibility of DNA sequences to DNA-binding proteins. Previous studies have shown that the intrinsic preference of nucleosomes for DNA sequences may play a dominant role in nucleosome positioning. As a consequence, developing computational methods that accurately identify nucleosome positioning from DNA sequence information alone is a nontrivial task, and one that can help verify the contribution of DNA sequences to nucleosome positioning. In this work, we propose a new deep learning-based method, named DeepNup, which improves the prediction of nucleosome positioning from DNA sequences alone. Specifically, we first use a hybrid feature encoding scheme that combines one-hot encoding and trinucleotide composition encoding to encode raw DNA sequences; next, we employ multiscale convolutional neural network modules that consist of two parallel convolution kernels of different sizes and gated recurrent units to learn local and global correlation feature representations; lastly, we use a fully connected layer and a sigmoid unit as a classifier to integrate these learned high-order feature representations and generate the final predictions. On two benchmark nucleosome positioning datasets, DeepNup achieves better performance than several state-of-the-art methods across the experimental evaluation metrics. These results demonstrate that DeepNup is a powerful deep learning-based tool for accurately identifying potential nucleosome sequences.
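The hybrid encoding step can be pictured with a small standard-library sketch. It assumes that the trinucleotide composition is the normalized frequency of overlapping 3-mers and that the two encodings are kept as separate views of the sequence; the abstract does not specify either detail, so the function names and the way the views are combined are illustrative assumptions rather than the DeepNup implementation.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 64 trinucleotides in lexicographic order (an assumed, arbitrary ordering).
TRINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]


def one_hot(seq):
    """One 4-element indicator row per base; unknown bases map to all zeros."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def trinucleotide_composition(seq):
    """Normalized frequency of each of the 64 overlapping 3-mers."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip 3-mers containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in TRINUCLEOTIDES]


def hybrid_encode(seq):
    """Return both views: positional one-hot rows and the global 3-mer composition."""
    return one_hot(seq), trinucleotide_composition(seq)


if __name__ == "__main__":
    positional, composition = hybrid_encode("ACGTAC")
    print(len(positional), len(composition))
```

The one-hot view preserves position, which suits convolution kernels, while the composition vector summarizes the whole sequence in a fixed length of 64.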
Predicting the structure of protein-peptide complexes using computational approaches is a difficult problem whose major challenges are properly handling molecular flexibility and the conformational changes of both the receptor and the ligand. Although significant improvements have been achieved in the modeling of side chains, methods for handling backbone flexibility in docking still need improvement. In this study, a new method is presented for docking a peptide into a receptor in a fully flexible manner. It is a parallel approach that combines all the processes involved in docking a folding peptide with a flexible receptor.
The coordinated expression of genes is primarily regulated by the interactions between transcription factors (TFs) and their DNA binding sites, which are an integral part of transcriptional regulatory networks. There are many computational tools focused on determining whether a TF binds or does not bind to a DNA sequence. However, tools that further determine the relative preference of such binding are also needed. Here, we propose a regression model with deep learning, called SemanticBI, to predict intensities of TF-DNA binding. SemanticBI is a convolutional neural network (CNN)-recurrent neural network (RNN) architecture model that was trained on an ensemble of protein binding microarray data sets that covered multiple TFs. Using this approach, SemanticBI exhibited superior accuracy in predicting binding intensities compared to other popular methods. Moreover, SemanticBI uncovered vectorized sequence-oriented features using its CNN-RNN architecture, which are abstract representations of the original DNA sequences. Additionally, the use of SemanticBI raises the question of whether motifs are necessary for computational models of TF binding. The online SemanticBI service can be accessed at http://qianglab.scst.suda.edu.cn/semantic/.
The focus of our study is to predict RNA-small molecule binding sites, with the overarching goal of exploring potential applications in the field of RNA drug targets. In response to this challenge, we present the MultiModRLBP method, a novel approach that integrates multi-modal features through the application of deep learning algorithms.
Seasonal influenza viruses undergo frequent mutations on their surface hemagglutinin (HA) proteins to escape the host immune response. Among these mutations, a few key amino acid sites are associated with significant antigenic cluster transitions. To recognize the cluster-transition determining sites of seasonal influenza A/H3N2 and A/H1N1 viruses systematically and quickly, we developed a computational model named RECDS (recognition of cluster-transition determining sites) to evaluate the contribution of a specific amino acid site on the HA protein over the whole history of antigenic evolution. In RECDS, we ranked all of the HA sites by contribution scores derived from a forest of gradient boosting classifiers trained on various sequence- and structure-based features. With the RECDS model, we found that the sites determining influenza antigenicity were mostly located around the receptor-binding domain for both the influenza A/H3N2 and A/H1N1 viruses. Specifically, half of the cluster-transition determining sites of the influenza A/H1N1 virus were located in the vestigial esterase domain and basic path area of the HA, which indicates that the driving force of antigenic evolution in the A/H1N1 virus differs from that in the A/H3N2 virus. Beyond that, the footprints of substitutions responsible for antigenic evolution were inferred from the phylogenetic trees for the cluster-transition determining sites. Large-scale monitoring of genetic variation at these cluster-transition determining sites in circulating influenza viruses could reduce current assay workloads in influenza surveillance and in the selection of new influenza vaccine strains.
Effectively and accurately predicting the effects of amino acid mutations on protein-protein interactions is a key issue for understanding the mechanisms of protein function and for drug design. In this study, we present a deep graph convolution (DGC) network-based framework, DGCddG, to predict the changes in protein-protein binding affinity after mutation. DGCddG incorporates multi-layer graph convolution to extract a deep, contextualized representation for each residue of the protein complex structure. The channels mined by DGC at the mutation sites are then fitted to the binding affinity with a multi-layer perceptron. Experimental results on multiple datasets show that our model achieves relatively good performance for both single- and multi-point mutations. In blind tests on datasets related to angiotensin-converting enzyme 2 (ACE2) binding with the SARS-CoV-2 virus, our method shows better results in predicting ACE2 changes, which may help in finding favorable antibodies. Code and data availability: https://github.com/lennylv/DGCddG.
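The idea of stacking graph convolutions over a residue graph and then reading out the mutation site can be sketched in a few lines of pure Python. The layer below uses mean aggregation over neighbours with a self-loop followed by a linear map and ReLU, which is one common variant and not necessarily the one used in DGCddG; the function names, the toy graph and the weights are all hypothetical, and the multi-layer perceptron that would map the site embedding to an affinity change is omitted.

```python
def graph_convolution(features, adjacency, weight):
    """One layer: average each node with its neighbours, apply a linear map, then ReLU."""
    n = len(features)
    width = len(features[0])
    out = []
    for i in range(n):
        # Neighbours of node i plus the node itself (self-loop).
        neighbours = [j for j in range(n) if adjacency[i][j] or j == i]
        mean = [sum(features[j][c] for j in neighbours) / len(neighbours)
                for c in range(width)]
        out.append([max(0.0, sum(mean[c] * weight[c][k] for c in range(width)))
                    for k in range(len(weight[0]))])
    return out


def mutation_site_embedding(features, adjacency, weights, site):
    """Stack one layer per weight matrix and return the row for the mutated residue."""
    hidden = features
    for weight in weights:
        hidden = graph_convolution(hidden, adjacency, weight)
    return hidden[site]


if __name__ == "__main__":
    # Three residues in a chain (0-1-2) with made-up 2-dimensional features.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(mutation_site_embedding(feats, adj, [identity, identity], site=1))
```

With each additional layer, a residue's representation draws on neighbours one step further away in the graph, which is what makes the per-residue representation contextual.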
Side-chains are crucial to how proteins express their biochemical characteristics. Packing protein side-chains is therefore a necessary task in protein structure prediction and is critical to important downstream applications such as protein design, docking and point mutation analysis. Given all possible candidate rotamers for each residue of a protein backbone, packing protein side-chains can be modeled as a combinatorial optimization problem, albeit one without an accurate energy function. This paper presents a parallel approach, pacoPacker, to pack protein side-chains by ant colony optimization. Each ant colony packs side-chains under the guidance of an energy function, and different colonies use different energy functions. The colonies run in parallel and cooperate by sharing a pheromone matrix, whose role is to tune the sampling of the rotamer library. In this way, the intelligence embedded in different energy functions can be brought together to find the best side-chains for the protein backbone. Experiments were conducted on two typical benchmarks, and the results show that pacoPacker is competitive with state-of-the-art systems.
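The cooperation scheme described above, with one colony per energy function and a single shared pheromone matrix, can be sketched as a toy in pure Python. This is not the pacoPacker implementation: the colonies run one after another rather than in parallel, the energy functions are arbitrary callables, and the evaporation rate, the unit deposit and the rule of judging candidates by summed energy are all assumptions made for illustration.

```python
import random


def sample_assignment(pheromone, rng):
    """Pick one rotamer index per residue with probability proportional to pheromone."""
    return [rng.choices(range(len(row)), weights=row)[0] for row in pheromone]


def pack_side_chains(n_rotamers, energy_functions, n_ants=10, n_iterations=30,
                     evaporation=0.1, seed=0):
    """Toy multi-colony ACO: one colony per energy function, one shared pheromone matrix.

    n_rotamers: number of candidate rotamers for each residue.
    energy_functions: callables mapping an assignment (list of indices) to a float.
    """
    rng = random.Random(seed)
    pheromone = [[1.0] * k for k in n_rotamers]  # shared by all colonies
    best, best_score = None, float("inf")
    for _ in range(n_iterations):
        deposits = []
        for energy in energy_functions:  # colonies take turns here, not true parallelism
            ants = [sample_assignment(pheromone, rng) for _ in range(n_ants)]
            deposits.append(min(ants, key=energy))
        # Evaporate, then let each colony's best ant reinforce the shared matrix.
        for row in pheromone:
            for j in range(len(row)):
                row[j] *= 1.0 - evaporation
        for assignment in deposits:
            for residue, rotamer in enumerate(assignment):
                pheromone[residue][rotamer] += 1.0
            # Judge candidates by the summed energy (an assumed consensus rule).
            score = sum(e(assignment) for e in energy_functions)
            if score < best_score:
                best, best_score = assignment, score
    return best, best_score


if __name__ == "__main__":
    # Two made-up additive energy tables over three residues.
    table_a = [[0.0, 1.0, 2.0], [0.0, 3.0], [0.0, 1.0, 1.0, 2.0]]
    table_b = [[0.0, 2.0, 1.0], [0.0, 1.0], [0.0, 2.0, 3.0, 1.0]]
    energy_a = lambda a: sum(table_a[i][r] for i, r in enumerate(a))
    energy_b = lambda a: sum(table_b[i][r] for i, r in enumerate(a))
    print(pack_side_chains([3, 2, 4], [energy_a, energy_b]))
```

Because every colony samples from and deposits into the same matrix, a rotamer favoured by one energy function becomes more likely to be tried by colonies guided by the others.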
The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design and ligand docking. Many existing solutions are modelled as computational optimisation problems. Beyond the design of the search algorithm, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search finds the lowest energy, there is no certainty of obtaining protein structures with correct side chains. We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which reasonably mitigated the discreteness of the rotamer library. We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4, and analysed the results from different perspectives, in terms of protein chains and individual residues. In this comprehensive benchmark, for 51.5% of proteins up to 400 amino acids in length, the pacoPacker predictions were superior to those of both CIS-RR and SCWRL4. Finally, we also showed the advantage of the subrotamer strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different inaccurate yet useful objective functions by designing parallel heuristic search algorithms.
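The dihedral-angle accuracy reported above (the first side-chain dihedral alone, and the first and second together, within 40° of the crystal structure) can be computed along the following lines. This is a sketch under assumptions: angles are given in degrees as per-residue lists, the combined measure counts a residue as correct only when both angles are within tolerance, and the special handling of symmetric side chains that benchmarks often apply is left out. The names `chi_accuracy` and `angular_difference` are illustrative.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, native, tolerance=40.0):
    """Return (chi1 accuracy, chi1+2 accuracy) over residues.

    predicted, native: per-residue lists of chi angles in degrees,
    e.g. [[-60.0, 180.0], [65.0]] for residues with two and one chi angles.
    """
    chi1_ok = chi1_total = chi12_ok = chi12_total = 0
    for pred, nat in zip(predicted, native):
        if not pred or not nat:
            continue  # residues without side-chain dihedrals (e.g. Gly, Ala)
        ok1 = angular_difference(pred[0], nat[0]) <= tolerance
        chi1_total += 1
        chi1_ok += ok1
        if len(pred) > 1 and len(nat) > 1:
            chi12_total += 1
            # Correct only when both chi1 and chi2 are within tolerance.
            chi12_ok += ok1 and angular_difference(pred[1], nat[1]) <= tolerance
    return (chi1_ok / chi1_total if chi1_total else 0.0,
            chi12_ok / chi12_total if chi12_total else 0.0)


if __name__ == "__main__":
    # Made-up angles for three residues.
    predicted = [[-60.0, 180.0], [65.0], [170.0, -70.0]]
    native = [[-65.0, -175.0], [-60.0], [-175.0, 60.0]]
    print(chi_accuracy(predicted, native))
```

The wrap-around in `angular_difference` matters because dihedral angles are periodic: 180° and -175° differ by 5°, not 355°.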