Clinical Phrase Mining with Language Models

2020 
A vast amount of vital clinical data is available within unstructured texts such as discharge summaries and procedure notes in Electronic Medical Records (EMRs). Automatically transforming such unstructured data into structured units is crucial for effective data analysis in the field of clinical informatics. Recognizing phrases that reveal important medical information in a concise and thorough manner is a fundamental step in this process. Existing systems that are built for opendomain texts are designed to detect mostly non-medical phrases, while tools designed specifically for extracting concepts from clinical texts are not scalable to large corpora and often leave out essential context surrounding those detected clinical concepts. We address these issues by proposing a framework, CliniPhrase, which adapts domain-specific deep neural network based language models (such as ClinicalBERT) to effectively and efficiently extract high-quality phrases from clinical documents with a limited amount of training data. Experimental results on the MIMIC-III dataset show that our method can outperform the current state-of-the-art techniques by up to 18% in terms of F 1 measure while being very efficient (up to 48 times faster).11Our source code, pre-trained models and documentations are available online at: https://github.com/kaushikmani/PhraseMiningLM
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    41
    References
    0
    Citations
    NaN
    KQI
    []