Extracting LncRNA-protein Interactions from Literature Using a Text Feature-based Approach

2015 
Abstract Long non-coding RNAs (lncRNAs) play important roles in transcriptional and post-transcriptional regulation. Knowledge of lncRNA-protein interactions (LPIs) is crucial for biologists to explain biological mechanisms and guide experiments. Since most newly discovered LPIs are reported in the biomedical literature, extracting LPIs by text mining is highly relevant. In this study, we apply a feature-based text mining method to extract LPIs from biomedical literature. Our method consists of three steps. First, we pre-process text from three databases to convert it into structured representations. Second, we extract a set of features from the structured sentences and use them to generate feature vectors for candidate LPI pairs. Finally, we train a random forest classifier on the feature vectors. Evaluated on our dataset, the method achieves an F-score of 79.3%. The results suggest that, as the first text mining approach to this task, the proposed method can efficiently extract LPIs from biomedical literature.
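The second step, turning a sentence into feature vectors for candidate LPI pairs, can be illustrated with a minimal Python sketch. The abstract does not list the actual features, so everything here is an assumption made for illustration: the toy entity lexicons (`LNCRNAS`, `PROTEINS`), the trigger-word list (`TRIGGERS`), and the four features are hypothetical and are not the authors' feature set.

```python
import re
from itertools import product

# Hypothetical toy lexicons (assumption); a real system would rely on
# curated entity dictionaries or a named-entity recognizer.
LNCRNAS = {"HOTAIR", "MALAT1"}
PROTEINS = {"EZH2", "SRSF1"}
# Hypothetical interaction trigger words (assumption).
TRIGGERS = {"binds", "interacts", "recruits", "associates"}


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def candidate_pairs(tokens):
    """Pair every lncRNA mention with every protein mention (token indices)."""
    lnc = [i for i, t in enumerate(tokens) if t in LNCRNAS]
    prot = [i for i, t in enumerate(tokens) if t in PROTEINS]
    return list(product(lnc, prot))


def features(tokens, i, j):
    """Illustrative features for the lncRNA at index i and protein at index j."""
    lo, hi = sorted((i, j))
    between = [t.lower() for t in tokens[lo + 1:hi]]
    return {
        "distance": hi - lo - 1,            # tokens between the two mentions
        "lncrna_first": int(i < j),         # mention order in the sentence
        "trigger_between": int(any(t in TRIGGERS for t in between)),
        "negation_between": int("not" in between),
    }


def vectorize(sentence):
    """Return ((lncRNA, protein), feature_vector) for each candidate pair."""
    tokens = tokenize(sentence)
    return [
        ((tokens[i], tokens[j]), list(features(tokens, i, j).values()))
        for i, j in candidate_pairs(tokens)
    ]


print(vectorize("HOTAIR binds EZH2 in breast cancer cells."))
```

Vectors of this kind, paired with interaction labels, could then be passed to a random forest implementation for the third step; the classifier is left out here to keep the sketch dependency-free.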