A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning

2021 
Pathology synopses consist of semi-structured or unstructured text summarizing visual information obtained by observing human tissue. Experts write and interpret these synopses with high domain-specific knowledge to extract tissue semantics and formulate a diagnosis in the context of ancillary testing and clinical information. The limited number of specialists available to interpret pathology synopses restricts the utility of the inherent information. Deep learning offers a tool for information extraction and automatic feature generation from complex datasets. Using an active learning approach, we developed a set of semantic labels for bone marrow aspirate pathology synopses. We then trained a transformer-based deep-learning model to map these synopses to one or more semantic labels, and extracted learned embeddings (i.e., meaningful attributes) from the model’s hidden layer. Here we demonstrate that with a small amount of training data, a transformer-based natural language model can extract embeddings from pathology synopses that capture diagnostically relevant information. On average, these embeddings can be used to generate semantic labels mapping patients to probable diagnostic groups with a micro-average F1 score of 0.779 ± 0.025. We provide a generalizable deep learning model and approach to unlock the semantic information inherent in pathology synopses toward improved diagnostics, biodiscovery and AI-assisted computational pathology.

Pathology synopses are short texts describing microscopic features of human tissue. Medical experts use their knowledge to understand these synopses and formulate a diagnosis in the context of other clinical information. However, this takes time, and only a limited number of specialists are available to interpret pathology synopses. A type of artificial intelligence (AI) called deep learning provides a possible means of extracting information from unstructured or semi-structured data such as pathology synopses.
Here we use deep learning to extract diagnostically relevant textual information from pathology synopses. We show our approach can then map this textual information to one or more diagnostic keywords. We provide a generally applicable and scalable method to unlock the knowledge in pathology synopses as a step toward exploiting computer-aided pathology in the clinic.

Mu et al. utilize a deep learning natural language processing model as part of an active learning approach to extract diagnostically relevant semantic information from bone marrow pathology synopses. Their findings demonstrate the potential for artificial intelligence in assisting clinicians in assessing, cataloging and triaging medical text datasets such as pathology synopses.
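
The abstract reports a micro-average F1 score for a task in which each synopsis may carry one or more semantic labels. As a minimal sketch of how such a score is commonly computed, the snippet below pools true positives, false positives and false negatives over all synopses and labels before taking F1. The function name `micro_f1` and the example label strings are hypothetical illustrations and are not taken from the paper; the authors' actual evaluation code may differ.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 for multi-label data.

    Each argument is a sequence of label sets, one set per synopsis.
    Counts are pooled across all synopses and labels before computing F1.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not assigned by experts
        fn += len(truth - pred)   # expert labels the model missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels for three synopses (illustrative only).
truth = [{"normal"}, {"iron deficiency", "hypocellular"}, {"acute leukemia"}]
pred = [{"normal"}, {"iron deficiency"}, {"acute leukemia", "hypocellular"}]

score = micro_f1(truth, pred)
print(f"micro-average F1 = {score:.3f}")
```

Because counts are pooled before averaging, frequent labels weigh more heavily in the micro-average than rare ones, which is one reason it is often reported alongside per-label scores.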