Deep learning (DL) shows promise for detecting glaucoma. However, patient privacy and data security are major concerns when all data are pooled for model development. We developed a privacy-preserving DL model using the federated learning (FL) paradigm to detect glaucoma from optical coherence tomography (OCT) images.
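To make the FL idea concrete, the sketch below shows one common aggregation rule, federated averaging (FedAvg), in which each site trains locally and only model parameters are shared and averaged. The abstract does not state which aggregation rule was used, so FedAvg is an assumption here, and the `Weights` type, the parameter names, and the site sizes are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(site_weights: Sequence[Weights],
                      site_sizes: Sequence[int]) -> Weights:
    """Average per-site parameters, weighting each site by its sample count."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need at least one site and one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in site_weights[0]:
        length = len(site_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example: two sites with made-up parameters and sample counts.
site_a = {"conv.weight": [1.0, 2.0], "fc.bias": [0.0]}
site_b = {"conv.weight": [3.0, 6.0], "fc.bias": [4.0]}
global_weights = federated_average([site_a, site_b], site_sizes=[100, 300])
```

Raw OCT images never leave a site in this scheme; only the locally trained parameters are sent to the aggregator.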
Purpose: To develop a three-dimensional (3D) deep learning algorithm to detect glaucoma using spectral-domain optical coherence tomography (SD-OCT) optic nerve head (ONH) cube scans and validate its performance on ethnically diverse real-world datasets and on cropped ONH scans. Methods: In total, 2461 Cirrus SD-OCT ONH scans of 1012 eyes were obtained from the Glaucoma Clinic Imaging Database at the Byers Eye Institute, Stanford University, from March 2010 to December 2017. A 3D deep neural network was trained and tested on this unique raw OCT cube dataset to identify a multimodal definition of glaucoma excluding other concomitant retinal disease and optic neuropathies. A total of 1022 scans of 363 glaucomatous eyes (207 patients) and 542 scans of 291 normal eyes (167 patients) from Stanford were included in training, and 142 scans of 48 glaucomatous eyes (27 patients) and 61 scans of 39 normal eyes (23 patients) were included in the validation set. A total of 3371 scans (Cirrus SD-OCT) from four different countries were used to evaluate the model. The non-overlapping test dataset from Stanford (USA) consisted of 694 scans: 241 scans from 113 normal eyes of 66 patients and 453 scans of 157 glaucomatous eyes of 89 patients. The datasets from Hong Kong (total of 1625 scans; 666 OCT scans from 196 normal eyes of 99 patients and 959 scans of 277 glaucomatous eyes of 155 patients), India (total of 672 scans; 211 scans from 147 normal eyes of 98 patients and 461 scans from 171 glaucomatous eyes of 101 patients), and Nepal (total of 380 scans; 158 scans from 143 normal eyes of 89 patients and 222 scans from 174 glaucomatous eyes of 109 patients) were used for external evaluation. The performance of the model was then evaluated on manually cropped scans from Stanford using a new algorithm called DiagFind.
The ONH region was cropped by identifying the appropriate zone of the image in the expected location relative to Bruch's Membrane Opening (BMO) using commercially available imaging software. Subgroup analyses were performed in groups stratified by eyes, myopia severity of glaucoma, and on a set of glaucoma cases without field defects. Saliency maps were generated to highlight the areas the model used to make a prediction. The model’s performance was compared to that of a glaucoma specialist using all available information on a subset of cases. Results: The 3D deep learning system achieved area under the curve (AUC) values of 0.91 (95% CI, 0.90–0.92), 0.80 (95% CI, 0.78–0.82), 0.94 (95% CI, 0.93–0.96), and 0.87 (95% CI, 0.85–0.90) on the Stanford, Hong Kong, India, and Nepal datasets, respectively, for detecting perimetric glaucoma. AUC values were 0.99 (95% CI, 0.97–1.00), 0.96 (95% CI, 0.93–1.00), and 0.92 (95% CI, 0.89–0.95) on severe, moderate, and mild myopia cases, respectively, and the AUC on cropped scans was 0.77. The model achieved an AUC of 0.92 (95% CI, 0.90–0.93), compared with 0.91 for the human grader on the same subset of scans (P = 0.99). The model’s recall on glaucoma cases without field defects was 0.76 (0.68–0.85). Saliency maps highlighted the lamina cribrosa in glaucomatous eyes versus the superficial retina in normal eyes as the regions associated with classification. Conclusions: A 3D convolutional neural network (CNN) trained on SD-OCT ONH cubes can distinguish glaucoma from normal cases in diverse datasets obtained from four different countries. The model trained with additional random-cropping data augmentation performed reasonably on manually cropped scans, indicating the importance of the lamina cribrosa in glaucoma detection.
Translational Relevance: A 3D CNN trained on SD-OCT ONH cubes was developed to detect glaucoma in diverse datasets obtained from four different countries and on cropped scans. The model identified lamina cribrosa as the region associated with glaucoma detection.
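The AUC values and 95% confidence intervals reported above can be illustrated with a short sketch. It computes AUC as the probability that a randomly chosen glaucomatous scan scores higher than a randomly chosen normal scan, and estimates an interval with a percentile bootstrap. The abstract does not say how its intervals were obtained, so the bootstrap is an assumption, and the labels and scores are made-up toy values. Because several scans can come from the same eye, resampling by eye or patient rather than by scan may be more appropriate in practice.

```python
import random
from typing import List, Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUC (labels must be 0 or 1)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data: 1 = glaucoma, 0 = normal; scores are hypothetical model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```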
Purpose: The purpose of this study was to develop a 3D deep learning system that uses spectral-domain optical coherence tomography (SD-OCT) macular cubes to differentiate between referable and nonreferable glaucoma cases, and to apply it to real-world datasets to understand how this affects performance. Methods: A total of 2805 Cirrus optical coherence tomography (OCT) macula volumes (Macula protocol 512 × 128) of 1095 eyes from 586 patients at a single site were used to train a fully 3D convolutional neural network (CNN). Referable glaucoma included true glaucoma, pre-perimetric glaucoma, and high-risk suspects, based on qualitative fundus photographs, visual fields, OCT reports, and clinical examinations, including intraocular pressure (IOP) and treatment history, and served as the binary (two-class) ground truth. The curated real-world dataset did not include eyes with retinal disease or nonglaucomatous optic neuropathies. The cubes were first homogenized using layer segmentation with the Orion Software (Voxeleron) to achieve standardization. The algorithm was tested on two separate external validation sets from different glaucoma studies, comprising Cirrus macular cube scans of 505 and 336 eyes, respectively. Results: The area under the receiver operating characteristic curve (AUROC) for the development dataset for distinguishing referable glaucoma was 0.88 for our CNN using homogenization, 0.82 without homogenization, and 0.81 for a CNN architecture from the existing literature. For the external validation datasets, which had different glaucoma definitions, the AUCs were 0.78 and 0.95, respectively. The performance of the model across the myopia severity distribution was assessed in the dataset from the United States, with AUCs of 0.85, 0.92, and 0.95 in severe, moderate, and mild myopia, respectively.
Conclusions: A 3D deep learning algorithm trained on macular OCT volumes without retinal disease to detect referable glaucoma performs better with retinal segmentation preprocessing and performs reasonably well across all levels of myopia. Translational Relevance: Interpretation of OCT macula volumes based on normative data color distributions is highly influenced by population demographics and characteristics, such as refractive error, as well as the size of the normative database. Referable glaucoma, in this study, was chosen to include cases that should be seen by a specialist. This study is unique because it uses multimodal patient data for the glaucoma definition, includes all severities of myopia, and validates the algorithm with international data to assess its potential generalizability.
We aim to develop a multi-task three-dimensional (3D) deep learning (DL) model to detect glaucomatous optic neuropathy (GON) and myopic features (MF) simultaneously from spectral-domain optical coherence tomography (SDOCT) volumetric scans. Each volumetric scan was labelled as GON according to the criteria of retinal nerve fibre layer (RNFL) thinning, with a structural defect that correlated in position with the visual field defect (i.e., reference standard). MF were graded from the SDOCT en face images and defined as the presence of peripapillary atrophy (PPA), optic disc tilting, or fundus tessellation. The multi-task DL model was based on ResNet, with outputs of Yes/No GON and Yes/No MF. SDOCT scans were collected in a tertiary eye hospital (Hong Kong SAR, China) for training (80%), tuning (10%), and internal validation (10%). External testing was performed on five independent datasets from eye centres in Hong Kong, the United States, and Singapore. For GON detection, we compared the model to the average RNFL thickness measurement generated from the SDOCT device. To investigate whether MF can affect the model's performance on GON detection, we conducted subgroup analyses in groups stratified by Yes/No MF. The area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy were reported. A total of 8,151 SDOCT volumetric scans from 3,609 eyes were collected. For detecting GON, in the internal validation, the proposed 3D model had a significantly higher AUROC (0.949 vs. 0.913, p < 0.001) than average RNFL thickness in discriminating GON from normal. In the external testing, the two approaches had comparable performance. In the subgroup analysis, the multi-task DL model performed significantly better in the "no MF" group (0.883 vs. 0.965, p < 0.001) in one external testing dataset, but showed no significant difference in the internal validation or the other external testing datasets.
The multi-task DL model's performance in detecting MF also generalized across all datasets, with AUROC values ranging from 0.855 to 0.896. The proposed multi-task 3D DL model demonstrated high generalizability in all the datasets, and the presence of MF did not generally affect the accuracy of GON detection.
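Sensitivity, specificity, and accuracy, reported above alongside AUROC, all follow from the confusion counts at a chosen decision threshold. The sketch below is a minimal illustration under assumptions: the 0.5 threshold, the function name, and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(labels: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy at a fixed score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "sensitivity": tp / (tp + fn),  # recall on diseased (e.g. GON) cases
        "specificity": tn / (tn + fp),  # recall on normal cases
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy data: 1 = GON, 0 = normal; scores are hypothetical model outputs.
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.6, 0.3, 0.2, 0.7]
example_metrics = binary_metrics(example_labels, example_scores)
```

Unlike AUROC, these three values change with the threshold, so comparisons between models are only meaningful at matched operating points.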