Algorithms for ab initio and large-scale prediction and classification of ncRNAs

2019 
The analysis of the very large volumes of data generated by next-generation sequencing (NGS) requires efficient bioinformatics tools. One aspect of this analysis is the identification of non-coding RNAs (ncRNAs), which play important roles in many biological processes. Identifying ncRNAs computationally raises two challenges: (i) ab initio prediction and classification of the different types of ncRNAs, and (ii) large-scale processing of the data. Most existing ncRNA prediction tools are specialized for a single type of ncRNA, and the largest number of them are dedicated to microRNAs (miRNAs). This is the case in particular for the tools that we developed previously, which are available on our software platform EvryRNA (http://EvryRNA.ibisc.univ-evry.fr). Some tools in the literature can also identify other types of ncRNAs by comparison with sequences listed in various ncRNA databases (homology-based approach). In addition, there are tools that predict several types of ncRNAs, but either without classifying them or with a homology-based classification. The very few ab initio methods, all published very recently, fall short in both prediction quality and running time. The goal of this project is to develop an ab initio algorithm for predicting and classifying several classes of ncRNAs from NGS data at a large scale, using both combinatorial optimization and machine learning methods, and considering different types of ncRNA features: sequence, secondary structure, genomic position, neighborhood, etc. One of the principal characteristics of ncRNAs is their structure, notably the secondary structure. It is therefore important to take structure into account in ncRNA prediction algorithms, and the challenge is to develop algorithms fast enough to handle huge volumes of NGS data.
The developed algorithms will be applied to the identification of non-coding RNAs involved in sex determination in plants, particularly in cucurbits (melon, cucumber, …), for which large volumes of data are available at IPS2.
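As an illustration of the sequence-level features mentioned above, the following is a minimal sketch of how a candidate sequence could be turned into a numeric feature vector for a machine learning classifier. It is not the project's method: the function names (`gc_content`, `kmer_frequencies`, `sequence_features`) and the choice of GC content and k-mer frequencies are assumptions made for this example only. Secondary-structure, genomic-position and neighborhood features are left out, since they would require a folding tool and genome annotation.

```python
from collections import Counter
from itertools import product

RNA_ALPHABET = "ACGU"


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def kmer_frequencies(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequency of every k-mer over A, C, G, U, in a fixed order.

    DNA input is accepted: T is read as U. Windows containing any other
    symbol (for example N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    all_kmers = ["".join(p) for p in product(RNA_ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


def sequence_features(seq: str, k: int = 2) -> list[float]:
    """Hypothetical feature vector: GC content followed by k-mer frequencies."""
    return [gc_content(seq)] + list(kmer_frequencies(seq, k).values())
```

With `k = 2` each sequence yields a vector of 17 values (one GC fraction and 16 dinucleotide frequencies). Both functions run in time linear in the sequence length, which matters at the data volumes discussed above.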