DeepLncLoc: a deep learning framework for long non-coding RNA subcellular localization prediction based on subsequence embedding
2021
Motivation: Long non-coding RNAs (lncRNAs) are a class of RNA molecules longer than 200 nucleotides. A growing body of evidence shows that the subcellular localization of lncRNAs can provide valuable insights into their biological functions. Existing computational methods for predicting lncRNA subcellular localization use k-mer features to encode lncRNA sequences. However, sequence order information is lost when only k-mer features are used.

Results: We proposed a deep learning framework, DeepLncLoc, to predict lncRNA subcellular localization. In DeepLncLoc, we introduced a new subsequence embedding method that preserves the order information of lncRNA sequences. The method first divides a sequence into several consecutive subsequences, then extracts the patterns of each subsequence, and finally combines these patterns to obtain a complete representation of the lncRNA sequence. A text convolutional neural network is then employed to learn high-level features and perform the prediction task. DeepLncLoc achieved better performance than both traditional machine learning models with k-mer features and existing predictors, which indicates that it can effectively predict lncRNA subcellular localization. Our study not only presents a novel computational model for predicting lncRNA subcellular localization but also provides a new subsequence embedding method that is expected to be applicable to other sequence-based prediction tasks.

Availability: The DeepLncLoc web server, source code and datasets are freely available at http://bioinformatics.csu.edu.cn/DeepLncLoc/ and https://github.com/CSUBioGroup/DeepLncLoc.

Contact: limin@mail.csu.edu.cn
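The difference between whole-sequence k-mer features and the subsequence embedding described in the Results can be illustrated with a minimal Python sketch. It is not the DeepLncLoc implementation: the abstract does not say how the patterns of each subsequence are extracted, so plain k-mer counts per subsequence are used here as a stand-in, and the function names (`kmer_counts`, `split_consecutive`, `subsequence_embedding`) and parameters (`n_parts`, `k`) are hypothetical. The text convolutional network stage is not covered.

```python
from itertools import product


def kmer_counts(seq, k, alphabet="ACGU"):
    """Count every k-mer over the alphabet, returned in a fixed lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # ignore k-mers containing characters outside the alphabet
            vec[j] += 1
    return vec


def split_consecutive(seq, n_parts):
    """Split seq into n_parts consecutive, non-overlapping chunks of near-equal length."""
    base, extra = divmod(len(seq), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def subsequence_embedding(seq, n_parts=4, k=2):
    """Assumed stand-in: one k-mer count vector per chunk, kept in sequence order."""
    return [kmer_counts(chunk, k) for chunk in split_consecutive(seq, n_parts)]


# Two toy sequences with the same composition but opposite order.
a = "AAAAAAAACCCCCCCC"
b = "CCCCCCCCAAAAAAAA"

# Whole-sequence 1-mer counts cannot tell them apart ...
same_global = kmer_counts(a, 1) == kmer_counts(b, 1)
# ... while the ordered per-chunk representation can.
same_ordered = subsequence_embedding(a, 2, 1) == subsequence_embedding(b, 2, 1)
print(same_global, same_ordered)
```

The stacked per-chunk vectors form a matrix of shape (`n_parts`, number of k-mers) whose row order follows the sequence, which is the kind of ordered input a convolutional model can exploit.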