Convolutional neural networks (CNNs) for computer-aided diagnosis of polyps are typically trained on high-quality still images in a single chromoendoscopy imaging modality, often with sessile serrated lesions (SSLs) excluded. This study developed a CNN from videos to classify polyps as adenomatous or nonadenomatous using standard narrow-band imaging (NBI) and NBI-near focus (NBI-NF) and created a publicly accessible polyp video database.
Colonoscopy remains the gold standard investigation for colorectal cancer screening as it offers the opportunity to both detect and resect pre-cancerous polyps. Computer-aided polyp characterisation can determine which polyps need polypectomy, and recent deep learning-based approaches have shown promising results as clinical decision support tools. Yet polyp appearance during a procedure can vary, making automatic predictions unstable. In this paper, we investigate the use of spatio-temporal information to improve the performance of lesion classification as adenoma or non-adenoma. Two methods are implemented and show an increase in performance and robustness in extensive experiments on both internal and openly available benchmark datasets.
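The two spatio-temporal methods are not specified in the abstract; as a minimal sketch of how temporal information can stabilise frame-level outputs, a simple moving-average smoother over per-frame adenoma probabilities (window size and threshold are illustrative assumptions) could look like this:

```python
import numpy as np

def smooth_probabilities(frame_probs, window=15):
    """Centred moving average over per-frame adenoma probabilities.
    The window size is an illustrative choice, not taken from the paper."""
    probs = np.asarray(frame_probs, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(probs, kernel, mode="same")

# Noisy frame-level outputs -> one stable sequence-level call
raw = np.random.rand(300)                      # stand-in for CNN outputs
smoothed = smooth_probabilities(raw)
sequence_call = "adenoma" if smoothed.mean() > 0.5 else "non-adenoma"
```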
Withdrawal time (WT) is the time from reaching the caecum to exiting the anal canal minus the time spent during phases of cleaning and intervention. Deducting these phases from the WT is not feasible in clinical practice as it requires manual measurement in real time. This results in inaccuracies in the current measurement of endoscopists' WT. Recent years have demonstrated the ability of artificial intelligence (AI) to detect caecal landmarks; however, its potential to detect phases of withdrawal is unexplored. Our aim was to develop convolutional neural networks (CNNs) to detect phases of cleaning and intervention during colonoscopy withdrawal.
Methodology
Lower gastrointestinal endoscopy videos were prospectively collected at a single centre. Individual frames were annotated from the point at which the appendicular orifice or ileocaecal valve was first detected. Frames from the first visualisation of an instrument during polypectomy to the end of inspection of post-resection margins and biopsies were labelled as 'intervention'. Frames during suctioning of colonic content or washing were labelled as 'cleaning'. The remaining frames contributed to the procedural WT ('withdrawal' frames). The annotations were referenced as the gold standard. Two ResNet-101 CNNs pre-trained on ImageNet were developed to detect the phases of cleaning and intervention.
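A minimal sketch of the classifier set-up described above, assuming a standard PyTorch/torchvision workflow; the hyper-parameters shown are illustrative and not taken from the study:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_phase_classifier():
    """ResNet-101 pre-trained on ImageNet with the final layer replaced
    for a binary phase decision (e.g. 'cleaning' vs 'not cleaning')."""
    net = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
    net.fc = nn.Linear(net.fc.in_features, 2)
    return net

cleaning_net = build_phase_classifier()        # one CNN per phase, as described
intervention_net = build_phase_classifier()
criterion = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(cleaning_net.parameters(), lr=1e-4)  # assumed lr
```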
Results
87 endoscopy videos and 1,288,319 frames during withdrawal were annotated. This consisted of 437,359 withdrawal, 232,384 cleaning and 618,576 intervention frames. The procedures were split into training (70%), validation (10%) and testing (~20%) sets with no overlap of patients. Evaluated against a test set of 17 videos, totalling 306 minutes of withdrawal (including cleaning and intervention phases), the CNNs identified the intervention frames with 92.4% sensitivity and 95.8% specificity. For cleaning, the sensitivity was 83.0% and specificity 89.5%. Across the 17 videos, the ground truth mean WT was 8 minutes and 51 seconds. The mean absolute error of the AI-predicted WT was 39 seconds per procedure. The AI system correctly categorised a procedure as less than or more than 6 minutes in 16 of the 17 procedures (94%). One procedure of more than 6 minutes was incorrectly categorised as under 6 minutes.
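Assuming per-frame phase predictions and a known frame rate (the study's frame rate is not stated), the WT estimate and the 6-minute benchmark check could be computed along these lines:

```python
def predicted_withdrawal_time(phase_per_frame, fps=25.0):
    """Estimate withdrawal time (seconds) from per-frame phase labels by
    excluding frames predicted as cleaning or intervention. fps is an
    assumed frame rate."""
    withdrawal_frames = sum(1 for p in phase_per_frame
                            if p not in ("cleaning", "intervention"))
    return withdrawal_frames / fps

# Example: categorise a procedure against the 6-minute benchmark
labels = ["withdrawal"] * 9000 + ["cleaning"] * 2000 + ["intervention"] * 4000
wt_seconds = predicted_withdrawal_time(labels)
meets_benchmark = wt_seconds >= 6 * 60
```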
Conclusions
This pilot study demonstrated the feasibility of CNNs to differentiate the phases of withdrawal and to automate the measurement of WT.
Colonoscopy is the gold standard for early diagnosis and pre-emptive treatment of colorectal cancer by detecting and removing colonic polyps. Deep learning approaches to polyp detection have shown potential for enhancing polyp detection rates. However, the majority of these systems are developed and evaluated on static images from colonoscopies, whilst in clinical practice the treatment is performed on a real-time video feed. Non-curated video data remains a challenge, as it contains low-quality frames when compared to still, selected images often obtained from diagnostic records. Nevertheless, it also embeds temporal information that can be exploited to increase prediction stability. A hybrid 2D/3D convolutional neural network architecture for polyp segmentation is presented in this paper. The network is used to improve polyp detection by encompassing spatial and temporal correlation of the predictions while preserving real-time detections. Extensive experiments show that the hybrid method outperforms a 2D baseline. The proposed architecture is validated on videos from 46 patients and on the publicly available SUN polyp database. A higher performance and increased generalisability indicate that real-world clinical implementations of automated polyp detection can benefit from the hybrid algorithm and the inclusion of temporal information.
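The exact hybrid architecture is not detailed in the abstract; the following sketch only illustrates the general idea of combining per-frame 2D convolutions with a 3D convolution over the temporal axis (channel sizes and clip length are assumptions):

```python
import torch
import torch.nn as nn

class Hybrid2D3DBlock(nn.Module):
    """Illustrative hybrid block: per-frame 2D convolutions followed by a
    3D convolution across the temporal axis."""

    def __init__(self, in_ch=3, mid_ch=16, out_ch=16):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 3, 3),
                                  padding=(1, 1, 1))

    def forward(self, clip):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        feats = self.spatial(clip.reshape(b * t, c, h, w))       # 2D per frame
        feats = feats.reshape(b, t, -1, h, w).permute(0, 2, 1, 3, 4)
        return self.temporal(feats)        # (batch, out_ch, time, height, width)

clip = torch.randn(1, 8, 3, 64, 64)        # an 8-frame clip
out = Hybrid2D3DBlock()(clip)
```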
Optical diagnosis is the in-vivo prediction of colorectal polyp histopathology. Inter-observer variability amongst endoscopists has limited its application in clinical practice. Artificial intelligence may lead to a new generation of clinical support tools capable of characterising polyps. Research in this field has often relied upon retrospective datasets, which are subject to sample selection bias and consist of a limited number of images of each polyp. Our aim was to develop a convolutional neural network (CNN) to characterise colorectal polyps as adenomatous or non-adenomatous using data collected prospectively.
Methods
Video data was collected prospectively from colonoscopy procedures at a single centre using Olympus 260 and 290 series scopes. Histopathological classification, location and morphology were recorded for each polyp. Video sequences of polyps in Narrow Band Imaging (NBI) and NBI-Near Focus (NBI-NF) were extracted. Both imaging modalities were used to increase the generalisability of the CNN. Frames with poor visualisation of the polyp surface texture due to mucus, stool, halation or motion artefact were excluded. The ground truth for each frame was the polyp annotated with a bounding box and labelled with the histopathology. A ResNet-101 CNN pre-trained on ImageNet was developed to classify the visual appearance of colorectal polyps as adenomatous or non-adenomatous.
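A sketch of the frame preprocessing implied by a ResNet-101 backbone pre-trained on ImageNet; cropping each frame to the annotated bounding box before classification is an assumption made for this illustration, as the abstract states only that polyps were annotated with bounding boxes:

```python
import torch
from torchvision import transforms
from PIL import Image

# ImageNet normalisation consistent with an ImageNet pre-trained ResNet-101
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def frame_to_input(frame: Image.Image, box) -> torch.Tensor:
    """Crop the annotated polyp region (left, upper, right, lower) from a
    video frame and return a batch of one tensor for the classifier."""
    return preprocess(frame.crop(box)).unsqueeze(0)
```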
Results
The final dataset consisted of 371 histologically confirmed polyps (235 adenomas, 77 sessile serrated lesions, 58 hyperplastic, 1 traditional serrated adenoma) from 199 patients with a total of 31,110 video frames annotated. Data was split, as shown in Figure 1, into a training (~50%), validation (~10%), and testing dataset (~40%) with no overlap of polyps or patients. On a per-frame analysis, the accuracy of the CNN optical characterisation was 91%, with a sensitivity of 91% to diagnose adenomas and a specificity of 90%. The CNN achieved an area under the curve (AUC) of 97%. On a per-polyp analysis, the accuracy of the CNN characterisation was 92%, with a sensitivity of 92% and a specificity of 93%.
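The abstract does not state how frame-level outputs were combined for the per-polyp analysis; one common approach, shown below as an assumption, is to average the adenoma probability over all frames of a polyp:

```python
from collections import defaultdict
import numpy as np

def per_polyp_calls(frame_records, threshold=0.5):
    """Aggregate frame-level adenoma probabilities into one call per polyp
    by averaging (the aggregation rule and threshold are assumptions).
    frame_records: iterable of (polyp_id, adenoma_probability) pairs."""
    by_polyp = defaultdict(list)
    for polyp_id, prob in frame_records:
        by_polyp[polyp_id].append(prob)
    return {pid: ("adenomatous" if np.mean(probs) > threshold
                  else "non-adenomatous")
            for pid, probs in by_polyp.items()}
```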
Conclusion
The largest annotated dataset of NBI polyp images has been collated for the training and evaluation of artificial intelligence to support optical diagnosis. This work demonstrated the capability of AI to differentiate adenomatous from non-adenomatous polyps in-vitro, with a high level of accuracy.
Intrauterine foetal surgery is the treatment option for several congenital malformations. For twin-to-twin transfusion syndrome (TTTS), interventions involve the use of a laser fibre to ablate vessels in a shared placenta. The procedure presents a number of challenges for the surgeon, and computer-assisted technologies can potentially be a significant support. Vision-based sensing is the primary source of information from the intrauterine environment, and hence vision-based methods present an appealing approach for extracting higher-level information from the surgical site. In this paper, we propose a framework to detect one of the key steps during TTTS interventions: ablation. We adopt a deep learning approach, specifically the ResNet101 architecture, for classification of the different surgical actions performed during laser ablation therapy. We perform a two-fold cross-validation using almost 50,000 frames from five different TTTS ablation procedures. Our results show that deep learning methods are a promising approach for ablation detection. To our knowledge, this is the first attempt at automating photocoagulation detection using video, and our technique can be an important component of a larger assistive framework for enhanced foetal therapies. The current implementation does not include semantic segmentation or localisation of the ablation site, and this would be a natural extension in future work.
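A minimal sketch of a per-procedure two-fold split consistent with the cross-validation described above, using scikit-learn's GroupKFold; toy data stands in for the ~50,000 frames:

```python
from sklearn.model_selection import GroupKFold

# Toy stand-ins: in practice each entry is a video frame, its action label and
# the procedure it came from. GroupKFold keeps every frame of a procedure in
# one fold, so no procedure appears in both training and test sets.
frame_ids = list(range(10))
labels = ["ablation", "other"] * 5
procedures = [f"proc{i % 5}" for i in range(10)]   # five procedures

for train_idx, test_idx in GroupKFold(n_splits=2).split(frame_ids, labels,
                                                        groups=procedures):
    train_procs = {procedures[i] for i in train_idx}
    test_procs = {procedures[i] for i in test_idx}
    assert train_procs.isdisjoint(test_procs)      # no procedure leaks folds
    # train the ResNet101 classifier on train_idx, evaluate on test_idx
```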
Rectal retroflexion rate (RR) is a key performance indicator (KPI) for colonoscopy. However, its measurement is often imprecise and cumbersome to audit as it relies on manual photo documentation and manual data entry in endoscopy reporting systems. Furthermore, it is not possible to quantify inspection time in the RR position. We aimed to develop a convolutional neural network (CNN) to automate detection of RR and quantify inspection time in the RR position ('rectal retroflexion time').
Methods
Endoscopy videos were prospectively collected from a single centre for training data (Site 1). Each video frame with visualisation of the endoscope in the RR position was annotated with a label of 'RR', and the remaining frames were labelled as 'negative' for RR. The RR CNN was then evaluated with colonoscopy videos recorded from nine sites enrolled in a randomised controlled trial ('CADDIE Trial') evaluating a polyp detection CNN. We randomly selected two endoscopists from each site (n=18) enrolled in the CADDIE trial and ten procedures from each endoscopist (n=180). These nine sites comprise Site 1 (internal test-set) (n=20) and Sites 2–9 (external test-set) (n=160). Videos were annotated as outlined above, and these annotations were referenced as the ground truth.
Results
A weakly-supervised ResNet-101 CNN was trained with 185 video procedures collected from Site 1 (71,121 RR frames and 142,242 randomly sampled negative frames). RR was performed in every procedure in the CADDIE Trial test set (180/180). The CNN detected RR in 98.3% of procedures in the test-set (177/180). The three procedures in which it failed to detect RR were each from a different site. In the per-frame analysis (51,134 RR frames, 102,268 negative frames), the accuracy was 97.6%, sensitivity 94.7%, specificity 99.0% and area under the curve 0.98. The ground truth median RR time was 7.6 seconds (IQR 4.5–12.3), and the artificial intelligence (AI) predicted median RR time was 7.4 seconds (IQR 4.2–12.2) (figure 1).
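Assuming per-frame RR predictions and a known frame rate (not stated in the abstract), the RR time per procedure and the median across procedures could be derived along these lines:

```python
import numpy as np

def rr_time_seconds(frame_preds, fps=25.0):
    """Retroflexion time for one procedure: number of frames predicted as
    'RR' divided by an assumed frame rate."""
    return sum(1 for p in frame_preds if p == "RR") / fps

# Median RR time across procedures (the sequences here are placeholders)
procedures = [["RR"] * 190 + ["negative"] * 5000,
              ["RR"] * 300 + ["negative"] * 4000]
median_rr = np.median([rr_time_seconds(p) for p in procedures])
```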
Conclusions
We demonstrated robust results in this novel application of AI to automate RR 'detection' and 'time measurement'. The considerable variation in RR time amongst endoscopists warrants further exploration as a possible new KPI.
Computational stereo is one of the classical problems in computer vision. Numerous algorithms and solutions have been reported in recent years, focusing on developing methods for computing similarity, aggregating it to obtain spatial support and finally optimising an energy function to find the final disparity. In this paper, we focus on the feature extraction component of the stereo matching architecture and show that standard CNN operations can be used to improve the quality of the features used to find point correspondences. Furthermore, we propose a simple spatial aggregation that greatly simplifies the correlation learning problem. Our results on benchmark data are compelling and show promising potential even without refining the solution.
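As a rough illustration of correlating CNN features from the two views (not the paper's specific aggregation scheme), a plain correlation cost volume over candidate disparities can be written as:

```python
import torch

def correlation_cost_volume(left_feats, right_feats, max_disp=32):
    """Correlation cost volume from left/right CNN feature maps (B, C, H, W):
    for each candidate disparity d, correlate left features with right
    features shifted d pixels. A minimal sketch of the idea."""
    b, c, h, w = left_feats.shape
    volume = left_feats.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (left_feats * right_feats).mean(dim=1)
        else:
            volume[:, d, :, d:] = (left_feats[:, :, :, d:] *
                                   right_feats[:, :, :, :-d]).mean(dim=1)
    return volume

# Winner-takes-all disparity from random stand-in feature maps
disparity = correlation_cost_volume(torch.randn(1, 64, 48, 64),
                                    torch.randn(1, 64, 48, 64)).argmax(dim=1)
```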
Colonoscopic polypectomy can prevent colorectal cancer. Polyp detection rates vary considerably due to human error and missed adenomas may contribute to interval colorectal cancers. Automated polyp detection using deep learning may avoid these problems. Previous work focused on detecting the presence of polyps in individual frames captured from videos. Our aims in this pilot study were to extend this to video sequences and to explore future-proofing by using algorithms trained on old image processors to locate polyps found using newer endoscopic technologies.
Methods
We trained and validated a Convolutional Neural Network (CNN) on 18,517 frames created by merging research colonoscopy datasets (CVCClinic, ASUMayo, ETIS, CVCVideoDB and CVCColon) from the Medical Image Computing and Computer Assisted Intervention Society challenges. 75% of frames contained polyps, in both standard and high definition (HD), from older processors including Olympus Exera II (160/165 series) and Pentax EPKi 7000 (90i series). Our test set consisted of 11 HD videos featuring polyps in white light collected using the latest Olympus 290 endoscopes at a UK tertiary centre. Estimated median polyp size was 4 mm (range 2–15) and morphology (Paris classification) included IIa=4, Is=6 and IIa+IIs LST-G=1. Images were manually annotated by drawing bounding boxes around polyps and quality controlled by removing uninformative frames (e.g. blurred). A total of 2,611 polyp-containing frames were analysed in the test set. A true positive was scored if the computer-generated segmentation mask prediction overlapped with the bounding box. A false positive indicated a non-overlapping location (more than one can occur per frame).
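A simplified sketch of the overlap-based scoring rule described above; separating predicted 'locations' via connected components is an assumption about how multiple false positives per frame are counted:

```python
import numpy as np
from scipy import ndimage

def score_frame(pred_mask, gt_boxes):
    """Score one frame: a predicted region overlapping an annotated box is a
    true positive; each connected predicted region overlapping no box is a
    false positive (so several can occur per frame).
    pred_mask: boolean (H, W) segmentation output.
    gt_boxes: list of (x1, y1, x2, y2) annotated polyp boxes."""
    tp = sum(pred_mask[y1:y2, x1:x2].any() for (x1, y1, x2, y2) in gt_boxes)
    regions, n = ndimage.label(pred_mask)          # connected predictions
    fp = 0
    for r in range(1, n + 1):
        region = regions == r
        if not any(region[y1:y2, x1:x2].any() for (x1, y1, x2, y2) in gt_boxes):
            fp += 1
    return int(tp), fp
```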
Results
Our network operated at real-time video rate. The F1 score was 92.5%. Sensitivity for polyp localisation was 98.5% and per-frame specificity was 75.4%. Positive predictive value was 90.1%. Incorrect segmentation mask locations were predominantly limited to 3 videos and were generated by artefacts not represented during training.
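For reference, the reported per-frame figures follow from the true/false positive and negative counts (such as those produced by the overlap scoring above) in the usual way:

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard detection metrics from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                 # positive predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f1": f1}
```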
Conclusion
We demonstrate through analysis of video frames that a CNN can locate polyps with high accuracy in real-time. The algorithm was trained using multiple endoscopy processors and worked with HD images from a new processor. This suggests that the CNN could remain useful as new endoscopic technologies are introduced. Further work will train our model on larger datasets including complete colonoscopy procedures. This should improve accuracy further. Such a system could be used as a red-flag technique to reduce missed adenomas during colonoscopy.