Long-distance disorder-disorder relation extraction with bootstrapped noisy data.

2020 
Abstract

Objective: Artificial intelligence in healthcare increasingly relies on relations in knowledge graphs for algorithm development. However, many important relations are not well covered in existing knowledge graphs. We aim to develop a novel long-distance relation extraction algorithm that leverages the article section structure and is trained with bootstrapped noisy data to identify important relations for diagnosis, including may cause, may be caused by, and differential diagnosis.

Methods: Known relations were extracted from semistructured web pages and a relational database and were paired with sentences containing corresponding medical concepts to form training data. The sentence form was extended to allow one concept to be in the title. An attention mechanism was applied to reduce the effect of noisily labeled sentences. Section structure embedding was added to provide additional context for relation expressions. Graph information was further incorporated into the model to differentiate the target relations whose expressions were often similar and interwoven.

Results: The extended sentence form allowed 1.75 times as many relations and 2.17 times as many sentences to be found compared to the conventional form. The various components of the proposed model all added to the accuracy. Overall, the positive sample accuracy of the proposed model was 9 percentage points higher than baseline deep learning models and 13 percentage points higher than naive Bayes and support vector machines.

Conclusion: Our bootstrap data preparation method and the extended sentence form could form a large training dataset to enable algorithm development and data mining efforts. Section structure embedding and graph information significantly increased prediction accuracy.
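
The bootstrapped data preparation described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Sentence` and `label_sentences`, the plain substring matching, and the example relation and sentences are all assumptions made for illustration. The sketch pairs each known relation with sentences that mention both concepts, and also accepts the extended form in which one concept appears only in the title.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    title: str  # title of the article or section that contains the sentence
    text: str


def label_sentences(known_relations, sentences):
    """Pair known (head, relation, tail) triples with candidate sentences.

    Conventional form: both concepts occur in the sentence text.
    Extended form: one concept occurs in the title, the other in the text.
    Matching is naive case-insensitive substring search (an assumption);
    the resulting labels are noisy because a matching sentence need not
    actually express the relation.
    """
    examples = []
    for head, relation, tail in known_relations:
        h, t = head.lower(), tail.lower()
        for s in sentences:
            text, title = s.text.lower(), s.title.lower()
            if h in text and t in text:
                form = "conventional"
            elif (h in title and t in text) or (t in title and h in text):
                form = "extended"
            else:
                continue  # the sentence does not cover both concepts
            examples.append((head, relation, tail, s.text, form))
    return examples


# Illustrative inputs only; not data from the paper.
known_relations = [("pneumonia", "may_cause", "pleural effusion")]
sentences = [
    Sentence("Pneumonia", "Complications include pleural effusion and sepsis."),
    Sentence("Heart failure", "Pneumonia can lead to pleural effusion in some patients."),
    Sentence("Asthma", "Wheezing is a common symptom."),
]
examples = label_sentences(known_relations, sentences)
```

In this toy input the first sentence is only found through the extended form, since the head concept appears in the title rather than the text. That is the kind of additional coverage the extended sentence form is meant to provide, while the noisy labels it produces are what the attention mechanism is meant to down-weight.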