Application of natural language processing and machine learning to radiology reports

Seoungdeok Jeon, Zachary T. Colburn, Joshua Sakai, Ling-Hong Hung, Ka Yee Yeung

2021

After radiologists perform a set of chest X-rays (CXRs), they write a short report describing their observations and interpretations. Because these reports are free-text documents, there is a risk of miscommunication, which can lead to worse patient outcomes. We applied text mining methods to radiology reports in the MIMIC Chest X-ray (MIMIC-CXR) database [5], which consists of 227,835 de-identified free-text radiology reports. We selected relevant terms (features) and developed predictive models that take a radiology report as input and return the probability that the report describes a positive diagnosis of pneumonia, a common respiratory condition characterized by the accumulation of fluid in the lungs. We then evaluated the performance of the predictive models using the area under the curve (AUC) and the Brier Score. Because of the large number of reports in the MIMIC-CXR database, we generated and evaluated predictive models on random samples of 500, 1000, 2000, and 3000 reports. Specifically, we randomly selected reports and assigned 70% to the training set and 30% to the test set; created a term-document matrix giving the frequencies of sequences of 1 or 2 consecutive words (1-grams or 2-grams) using the R package tm [2]; performed feature selection to identify terms that differentiate between cases; and trained models using several classification methods: k-nearest neighbors (KNN), random forest [4], gradient boosting machine [3], xgboost [1], and adaboost [6]. We repeated the process six times and computed the average assessment statistics. Our results indicate that all of the models except KNN perform similarly on the test set. KNN had the worst performance, with an average Brier Score (ABS) of 0.313 and an average AUC of 0.645. The other algorithms performed well: random forest (ABS = 0.174, AUC = 0.836), gradient boosting (ABS = 0.175, AUC = 0.820), xgboost (ABS = 0.177, AUC = 0.814), and adaboost (ABS = 0.163, AUC = 0.815). This performance suggests that machine learning models have the potential to impact patient care in radiology.
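
To make the two assessment statistics concrete, the following is a minimal Python sketch of how a Brier Score and an AUC can be computed from predicted probabilities. It is not the authors' code (the study used R), and the function names, labels, and probabilities are made up for illustration. The Brier Score is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better and a model that always predicts 0.5 scores 0.25. The AUC is computed here as the fraction of positive-negative pairs in which the positive report receives the higher probability, with ties counted as one half.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = report describes pneumonia) and model probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]

print(brier_score(labels, probs))
print(auc(labels, probs))
```

The pairwise AUC shown here is quadratic in the number of reports, which is acceptable for test sets of the sizes described above; a rank-based formulation would be preferable for larger samples.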

