Retraining an open-source pneumothorax detecting machine learning algorithm for improved performance to medical images

2020 
Abstract

Purpose: To validate a machine learning model trained on an open-source dataset and subsequently optimize it for chest X-rays with large pneumothoraces from our institution.

Methods: The study was retrospective. The open-source chest X-ray (CXR8) dataset was dichotomized into cases with pneumothorax (PTX) and all other cases (non-PTX), resulting in 41,946 non-PTX and 4,696 PTX cases for the training set and 11,120 non-PTX and 541 PTX cases for the validation set. A limited-supervision machine learning model was constructed to incorporate both localized and unlocalized pathology. Cases from 2013 to 2017 were then queried from our health system. A total of 159 pneumothorax and 682 non-pneumothorax cases were available for the training set. For the validation set, 48 pneumothorax and 1,287 non-pneumothorax cases were available. The model was trained, a receiver operating characteristic (ROC) curve was created, and output metrics, including area under the curve (AUC), sensitivity, and specificity, were calculated.

Results: Initial training of the model on the CXR8 dataset resulted in an AUC of 0.90 for pneumothorax detection. Running inference on our own validation dataset with the CXR8-trained model, without any retraining, yielded an AUC of 0.59. After the model was retrained on our own training dataset, inference on the same validation dataset yielded an AUC of 0.90.

Conclusion: Our study showed that although a model may perform well on open-source datasets, it may not translate well to real-world data without an intervening retraining process.
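The output metrics named in the Methods (AUC, sensitivity, specificity) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `roc_auc` and `sensitivity_specificity`, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration, with label 1 standing for PTX and 0 for non-PTX.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (PTX, non-PTX) pairs in which the PTX case
    receives the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one PTX and one non-PTX case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called PTX.
    Assumes both classes are present in `labels`."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy, made-up data (not from the study): 3 PTX and 5 non-PTX cases.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

The pairwise comparison is quadratic in the number of cases, which is fine for a validation set of roughly the size described here; a rank-based formulation or a library routine would be the usual choice for larger sets.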