Predicting benign, pre-invasive and invasive lung nodules on Computed Tomography scans using machine learning

2021 
Abstract

Objective: To investigate whether machine learning algorithms can predict, from Computed Tomography (CT) images alone, whether a lung nodule is benign, an adenocarcinoma, or one of its preinvasive subtypes.

Methods: A dataset of chest CT scans containing lung nodules was collected from several sources, together with the pathologic diagnosis of each nodule. The dataset was split randomly at the patient level into training (70%), internal validation (15%), and independent test (15%) sets. Two machine learning algorithms were developed, trained, and validated: the first used a support vector machine (SVM) model; the second used deep learning, namely a convolutional neural network (CNN). Receiver-operating characteristic (ROC) analysis was used to evaluate classification performance on the test set.

Results: The SVM / CNN-based models were evaluated on six classification tasks, with the following areas under the curve (AUC): 0.59 / 0.65 for Atypical Adenomatous Hyperplasia (AAH) vs Adenocarcinoma in Situ (AIS), 0.87 / 0.86 for Minimally Invasive Adenocarcinoma (MIA) vs Invasive Adenocarcinoma (IA), 0.76 / 0.72 for AAH+AIS vs MIA, 0.89 / 0.87 for AAH+AIS vs MIA+IA, and 0.93 / 0.92 for AAH+AIS+MIA vs IA. Classifying Benign vs AAH+AIS+MIA vs IA resulted in a micro-average AUC of 0.93 / 0.94 for the SVM / CNN models, respectively. The CNN-based methods had higher sensitivities than the SVM-based methods but lower specificities and accuracies.

Conclusion: The machine learning algorithms demonstrated reasonable performance in differentiating benign vs preinvasive vs invasive adenocarcinoma from CT images alone. However, prediction accuracy varies across subtypes. The approach holds potential for improved diagnostic capability by less invasive means.
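
The patient-level split described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_by_patient`, the input format (one patient ID per nodule), the seed, and the rounding rule are all assumptions made for illustration. The point it shows is that patients, not individual nodules, are shuffled and partitioned, so nodules from one patient never appear in more than one set.

```python
import random
from collections import defaultdict


def split_by_patient(nodule_patient_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign nodules to train/val/test so that no patient spans two sets.

    nodule_patient_ids[i] is the patient ID of nodule i. This input layout
    is hypothetical; the abstract does not describe the data format.
    """
    # Group nodule indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(nodule_patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not nodules) reproducibly.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # The fractions apply to the number of patients; the test set
    # takes whatever remains after train and validation.
    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: sorted(i for pid in pids for i in by_patient[pid])
        for name, pids in groups.items()
    }


# Toy example: 100 hypothetical patients with two nodules each.
ids = [f"P{i // 2:03d}" for i in range(200)]
splits = split_by_patient(ids)
train_patients = {ids[i] for i in splits["train"]}
test_patients = {ids[i] for i in splits["test"]}
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))
print(train_patients.isdisjoint(test_patients))
```

Because the fractions are applied to patients, the proportion of nodules in each set only approximates 70/15/15 when patients contribute different numbers of nodules.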