Evaluation of deep-learning-based lncRNA identification tools

2019 
Long non-coding RNAs (lncRNAs; longer than 200 nt) play crucial biological roles and have been implicated in cancers. A major task in characterizing newly discovered transcripts is distinguishing lncRNAs from mRNAs. Because experimental methods are time-consuming and costly, computational methods are preferred for large-scale lncRNA identification. In a recent study, Amin et al. evaluated three deep-learning-based lncRNA identification tools (i.e., lncRNAnet, LncADeep, and lncFinder) and concluded "The LncADeep PR (precision recall) curve is just above the no-skill model and LncADeep showed poor overall performance". This surprising conclusion stems from the authors' use of a non-default setting of LncADeep. LncADeep has two models: one for full-length transcripts, and one for transcripts that include partial-length sequences. Because assembling full-length transcripts from RNA-seq data is difficult, LncADeep's default model is the one for transcripts including partial-length. However, according to the results posted on Amin et al.'s website, the authors ran LncADeep with the full-length model, while claiming to use the default setting, to identify lncRNAs from the GENCODE dataset, which is composed of both full- and partial-length transcripts. Their evaluation therefore underestimated the performance of LncADeep. In this correspondence, we tested LncADeep's default setting (i.e., the model for transcripts including partial-length) on the datasets used by Amin et al., and LncADeep achieved the best overall performance compared with the other tools' results reported by Amin et al.
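
For readers unfamiliar with the "no-skill model" referred to in the quoted conclusion: on a precision-recall plot, a classifier that guesses at random has an expected precision equal to the fraction of positives in the dataset, which gives a horizontal baseline. The sketch below illustrates that comparison in plain Python. It is not the evaluation code of Amin et al. or of this correspondence; the function names and the toy labels and scores are hypothetical, and scores are assumed to be distinct (no tie handling).

```python
from typing import List, Sequence, Tuple


def no_skill_precision(labels: Sequence[int]) -> float:
    """Expected precision of random guessing: the share of positives (1 = lncRNA)."""
    return sum(labels) / len(labels)


def precision_recall_points(
    labels: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) pairs from sweeping the threshold high to low.

    Assumes distinct scores and at least one positive label.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    true_pos = 0
    points = []
    for rank, i in enumerate(order, start=1):
        true_pos += labels[i]
        points.append((true_pos / total_pos, true_pos / rank))
    return points


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the PR curve: sum of precision times recall gain."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(labels, scores):
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


if __name__ == "__main__":
    # Hypothetical toy data: 1 = lncRNA, 0 = mRNA; scores are made-up model outputs.
    toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    print("no-skill precision:", no_skill_precision(toy_labels))
    print("average precision: ", round(average_precision(toy_labels, toy_scores), 4))
```

A tool whose PR curve sits "just above" the baseline has an average precision only slightly higher than the positive-class prevalence, which is why the choice of model (full-length versus partial-length) matters so much for the comparison.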