Highly accurate classification of chest radiographic reports using a deep learning natural language model pretrained on 3.8 million text reports.

2020 
MOTIVATION
The development of deep, bidirectional transformers such as BERT (Bidirectional Encoder Representations from Transformers) has led to state-of-the-art results on several Natural Language Processing (NLP) benchmarks. In radiology in particular, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for the generation of labels in machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed for text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results.

RESULTS
Using BERT to identify the most important findings in intensive care chest x-ray reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could help to improve information extraction from free-text medical reports.

AVAILABILITY
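The per-finding scores above are areas under the receiver operating characteristic curve (AUROC), computed from the classifier's probability outputs against binary reference labels. As a minimal illustration (not the authors' evaluation code), the AUROC for one finding can be computed in pure Python via the rank-sum (Mann–Whitney U) identity; the finding names and dummy scores below are only for demonstration:

```python
def auroc(scores, labels):
    """AUROC via the rank-sum formula: the probability that a randomly
    chosen positive example is scored above a randomly chosen negative one."""
    # Sort indices by score ascending; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend the tie group
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical probabilities and labels for one finding (e.g. pneumothorax):
scores = [0.05, 0.20, 0.35, 0.80, 0.90]
labels = [0, 0, 1, 1, 1]
print(auroc(scores, labels))  # 1.0 here, since positives outrank all negatives
```

In a multi-label setting such as the four findings above, this is simply evaluated once per finding, each against its own binary label column.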