A New Method of RNA Secondary Structure Prediction Based on Convolutional Neural Network and Dynamic Programming

2019 
In recent years, RNA secondary structure information has played an important role in RNA and gene function research. Although some RNA secondary structures can be determined experimentally, in most cases efficient and accurate computational methods are still needed to predict them. Current RNA secondary structure prediction methods are mainly based on the minimum free energy algorithm, which iteratively searches for the optimal folding state of RNA in vivo subject to minimum energy or other constraints. However, because of the complexity of the biological environment, a real RNA structure settles into a state of balanced biological potential energy rather than the optimal folding state with minimum energy. For short RNA sequences, the equilibrium energy state of folding is close to the minimum free energy state, so the minimum free energy algorithm predicts their secondary structure with relatively high accuracy. For longer RNA sequences, however, the structure is more complex and repeated folding drives the balanced potential energy state far from the minimum free energy state, which causes a serious decline in prediction accuracy. In this paper we propose a novel RNA secondary structure prediction algorithm that combines a convolutional neural network model with a dynamic programming method to improve accuracy using large-scale RNA sequence and structure data. We analyze current experimental RNA sequence and structure data to construct a deep convolutional network model, and then extract implicit features that are effective for classification from large-scale data to predict the pairing probability of each base in an RNA sequence. An enhanced dynamic programming method is then applied to the predicted base pairing probabilities to obtain the optimal RNA secondary structure.
Results indicate that our proposed method outperforms common RNA secondary structure prediction algorithms on three benchmark RNA families. Given the characteristics of deep learning algorithms, it can be inferred that the prediction accuracy of the proposed method will be 30% higher than that of other algorithms as the amount of real RNA structure data increases in the future.
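
The abstract does not spell out the enhanced dynamic programming method, so the following is only a minimal sketch of the general idea: given a matrix of base pairing probabilities (such as a network might output), a Nussinov-style recursion selects the set of non-crossing pairs with the largest total probability. The Nussinov-style recursion is a stand-in, not the authors' algorithm, and the function names, the `min_loop` and `threshold` parameters, and the toy matrix are all assumptions made for illustration.

```python
from typing import List, Tuple


def fold_max_probability(prob: List[List[float]],
                         min_loop: int = 3,
                         threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return non-crossing pairs (i, j), i < j, maximizing summed probability.

    prob[i][j] is an assumed pairing probability for bases i and j.
    min_loop is the assumed minimum number of unpaired bases inside a hairpin.
    Pairs with probability <= threshold are never selected.
    """
    n = len(prob)
    if n == 0:
        return []
    score = [[0.0] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]  # traceback decisions

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            best, move = score[i][j - 1], ("skip", -1)
            # Case 2: base j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                p = prob[k][j]
                if p <= threshold:
                    continue
                left = score[i][k - 1] if k > i else 0.0
                cand = left + score[k + 1][j - 1] + p
                if cand > best:
                    best, move = cand, ("pair", k)
            score[i][j] = best
            choice[i][j] = move

    # Iterative traceback from the full interval.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or choice[i][j] is None:
            continue
        kind, k = choice[i][j]
        if kind == "skip":
            stack.append((i, j - 1))
        else:
            pairs.append((k, j))
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return sorted(pairs)


def to_dot_bracket(n: int, pairs: List[Tuple[int, int]]) -> str:
    """Render a pair list as a dot-bracket string of length n."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


if __name__ == "__main__":
    # Toy 8-base matrix (made-up values): a two-pair stem closing a 4-base loop.
    n = 8
    toy = [[0.0] * n for _ in range(n)]
    toy[0][7] = toy[7][0] = 0.9
    toy[1][6] = toy[6][1] = 0.8
    print(to_dot_bracket(n, fold_max_probability(toy)))
```

The recursion runs in cubic time in the sequence length and, like the basic Nussinov scheme, cannot represent pseudoknots. Both are limitations of this sketch and say nothing about the method proposed in the paper.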