Background: Atelectasis and attic retraction pocket are two common tympanic membrane changes. However, general practitioners, pediatricians, and otolaryngologists have shown low diagnostic accuracy for these ear diseases. There is therefore a need for a deep learning model that detects atelectasis and attic retraction pocket automatically. Method: 6393 otoscopic images of otitis media with effusion (OME) from 3 centers were used to develop and validate a deep learning model for detecting atelectasis and attic retraction pocket. 3-fold random cross-validation was adopted to divide the dataset into training and validation sets. A team of otologists was assigned to diagnose and label the images. The receiver operating characteristic (ROC) curve, 3-fold average classification accuracy, sensitivity, and specificity were used to assess the performance of the deep learning model. Class Activation Mapping (CAM) was applied to visualize the discriminative regions in the otoscopic images. Result: Among all otoscopic images, 3564 (55.74%) were identified with attic retraction pocket, and 2460 (38.48%) with atelectasis. The diagnostic models for attic retraction pocket and atelectasis achieved 3-fold cross-validation accuracies of 89% and 79%, AUCs of 0.89 and 0.87, sensitivities of 0.93 and 0.71, and specificities of 0.62 and 0.84, respectively. Larger and deeper atelectasis and attic retraction pockets received greater weight (shown in red) in the CAM heat maps. Conclusion: A deep learning algorithm can identify atelectasis and attic retraction pocket and could serve as a tool to assist general practitioners, pediatricians, and otolaryngologists. Key words: deep learning, otoscopic images, atelectasis, attic retraction pocket
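The abstract names CAM but does not detail its computation. As a hedged illustration (the standard CAM recipe, not necessarily this paper's exact implementation), the heat map is a class-weighted sum of the last convolutional feature maps, assuming a network that ends in global average pooling followed by a linear classifier:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Standard Class Activation Mapping (Zhou et al., 2016 style).

    feature_maps: (C, H, W) activations from the last conv layer.
    fc_weights:   (num_classes, C) weights of the linear layer that
                  follows global average pooling.
    class_idx:    index of the class to visualize.
    """
    w = fc_weights[class_idx]                    # (C,) class weights
    cam = np.tensordot(w, feature_maps, axes=1)  # (H, W) weighted sum over channels
    cam = np.maximum(cam, 0.0)                   # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                    # normalize to [0, 1] for display
    return cam

# Toy example: 3 channels on a 4x4 grid, 2 classes (shapes are placeholders).
rng = np.random.default_rng(0)
fmaps = rng.random((3, 4, 4))
weights = rng.random((2, 3))
heat = class_activation_map(fmaps, weights, class_idx=1)
```

Upsampling `heat` to the otoscopic image size and overlaying it as a red-to-blue colormap yields the kind of heat map the Result section describes.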
Tracking by detection has become an attractive tracking technique: it treats tracking as an object detection problem and trains a detector to separate the target object from the background in each frame. While this strategy is effective to some extent, we argue that the task in tracking should be to search for a specific object instance rather than an object category. Based on this viewpoint, a novel framework based on object exemplar detectors is proposed for visual tracking. To build a specific and discriminative model that separates the object instance from the background, the proposed method trains an exemplar-based linear discriminant analysis (ELDA) classifier for the object exemplar, using the currently tracked instance as the positive sample and massive negative samples obtained both offline and online. To improve the tracker's adaptivity, we use an ensemble of these ELDA detectors and update them during tracking to cover variation in object appearance. Extensive experimental results on a large benchmark dataset show that the proposed method outperforms many state-of-the-art trackers, demonstrating the effectiveness and robustness of the ELDA tracker.
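The abstract does not give the ELDA training equations. A minimal sketch of the usual exemplar-LDA construction (in the style of Hariharan et al.'s discriminative decorrelation; the paper's details may differ) is shown below, assuming features are real vectors and the negative pool supplies the shared background statistics:

```python
import numpy as np

def train_elda(positive, negatives, reg=1e-3):
    """Exemplar-LDA: one positive sample, shared negative statistics.

    The negative mean and covariance summarize the background once;
    each exemplar's weight vector is then just Sigma^{-1}(x_pos - mu_neg),
    so no per-exemplar iterative optimization is needed.
    """
    mu_neg = negatives.mean(axis=0)
    cov = np.cov(negatives, rowvar=False) + reg * np.eye(negatives.shape[1])
    w = np.linalg.solve(cov, positive - mu_neg)
    b = -0.5 * w @ (positive + mu_neg)   # threshold midway between the means
    return w, b

# Toy example with placeholder 5-D features.
rng = np.random.default_rng(1)
negs = rng.normal(0.0, 1.0, size=(500, 5))  # massive background negatives
pos = np.full(5, 2.0)                       # the currently tracked exemplar
w, b = train_elda(pos, negs)
```

Because the expensive part (the covariance of the negatives) is shared, an ensemble of such detectors is cheap to maintain and update online, which matches the ensemble-and-update strategy the abstract describes.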
Weakly supervised instance segmentation (WSIS) provides a promising way to address instance segmentation in the absence of sufficient labeled data for training. Previous attempts at WSIS usually follow a proposal-based paradigm, critical to which is the proposal scoring strategy. These works mostly rely on heuristic strategies for proposal scoring, which largely hampers sustainable progress on WSIS. To address this, this paper introduces a novel framework for weakly supervised instance segmentation, called Weakly Supervised R-CNN (WS-RCNN). The basic idea is to deploy a deep network that learns to score proposals under the special setting of weak supervision. To tackle the key issue of acquiring proposal-level pseudo labels for model training, we propose an Attention-Guided Pseudo Labeling (AGPL) strategy, which leverages the local maxima (peaks) in image-level attention maps and the spatial relationship between peaks and proposals to infer pseudo labels. We also suggest a novel training loss, called Entropic Open-Set Loss, to handle background proposals more effectively and thereby further improve robustness. Comprehensive experiments on two standard benchmark datasets demonstrate that the proposed WS-RCNN outperforms the state-of-the-art by a large margin, with an improvement of 11.6% on PASCAL VOC 2012 and 10.7% on MS COCO 2014 in terms of mAP50, which indicates that learning-based proposal scoring and the proposed WS-RCNN framework might be a promising way towards WSIS.
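The Entropic Open-Set Loss is only named here. A minimal sketch consistent with the usual open-set formulation, which the name suggests (standard cross-entropy for labeled proposals, a uniform-distribution target for background proposals), might look like the following; the paper's exact definition may differ:

```python
import numpy as np

def entropic_openset_loss(logits, label):
    """label >= 0: ordinary cross-entropy on that foreground class.
    label == -1 (background proposal): push the softmax toward the
    uniform distribution by averaging -log p over all classes, so the
    network is maximally uncertain on background instead of being
    forced into an explicit background class."""
    z = logits - logits.max()            # numerically stable log-softmax
    log_p = z - np.log(np.exp(z).sum())
    if label >= 0:
        return -log_p[label]
    return -log_p.mean()

# A perfectly uniform prediction on 4 classes gives the background loss log(4).
bg_loss = entropic_openset_loss(np.zeros(4), -1)
fg_loss = entropic_openset_loss(np.array([10.0, 0.0, 0.0, 0.0]), 0)
```

The background branch is minimized exactly when the softmax is uniform, which is why such losses tend to be more forgiving on ambiguous background proposals than a hard background label.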
Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. The exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Building on that, we improve the ELDA tracking algorithm using deep convolutional neural network (CNN) features and adaptive model updates. Deep CNN features have been used successfully in various computer vision tasks, but extracting CNN features on every candidate window is time-consuming. To address this problem, a two-step CNN feature extraction method is proposed that computes the convolutional layers and the fully-connected layers separately. Owing to the strong discriminative ability of CNN features and the exemplar-based model, we update both the object and background models to improve their adaptivity and to manage the trade-off between discriminative ability and adaptivity. An object updating method is proposed to select the "good" models (detectors): those that are highly discriminative and uncorrelated with the other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes; it is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm.
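The two-step idea can be illustrated as follows: the convolutional stage is run once per frame to produce a shared feature map, so the per-candidate cost is reduced to a crop, a pooling step, and the fully-connected layers. This is a hedged numpy sketch with placeholder shapes, not the paper's network:

```python
import numpy as np

def extract_window_features(feat_map, boxes, fc_weight):
    """Step 2 of two-step extraction: `feat_map` (C, H, W) is the conv
    output computed ONCE for the whole frame (step 1). Each candidate
    window then needs only a crop, an average pool to a fixed-size
    vector, and one fully-connected layer with ReLU."""
    feats = []
    for (x0, y0, x1, y1) in boxes:
        crop = feat_map[:, y0:y1, x0:x1]      # share the conv computation
        pooled = crop.mean(axis=(1, 2))       # (C,) fixed-size descriptor
        feats.append(np.maximum(fc_weight @ pooled, 0.0))  # FC + ReLU
    return np.stack(feats)

# Placeholder frame-level conv map and candidate windows.
rng = np.random.default_rng(2)
conv_map = rng.random((8, 32, 32))            # computed once per frame
fc_w = rng.random((16, 8))
boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (12, 3, 30, 28)]
window_feats = extract_window_features(conv_map, boxes, fc_w)
```

With hundreds of candidate windows per frame, amortizing the convolutional stage this way is what makes CNN features affordable inside a tracking loop.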
This Letter presents a computational model for saliency detection in natural images. While existing approaches usually make use of low-level or high-level visual features to establish the saliency model, our method relies on mid-level visual cues, i.e., the superpixel representation of the image. In the proposed approach, the given image is first partitioned into superpixels. A fully connected superpixel graph is then constructed, and a random walk on the graph is adopted to measure saliency. In addition, a scheme based on multiple segmentations is used for multiscale processing. Our model has the advantage of generating high-resolution saliency maps with well-defined object borders. Experimental results on publicly available datasets demonstrate that the proposed model can outperform the compared state-of-the-art saliency models.
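The abstract leaves the walk's exact saliency mapping unspecified. One common instantiation, sketched here under the assumption that node importance is read from the walk's stationary distribution over a fully connected feature-affinity graph, is:

```python
import numpy as np

def random_walk_saliency(features, sigma=0.5, iters=200):
    """Random walk on a fully connected superpixel graph.

    features: (N, d) one descriptor per superpixel (e.g., mean color).
    Edge weights are Gaussian affinities; the transition matrix is the
    row-normalized affinity matrix; importance is the stationary
    distribution reached by power iteration.
    """
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))       # pairwise feature affinity
    np.fill_diagonal(W, 0.0)                 # no self-loops
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transitions
    pi = np.full(len(features), 1.0 / len(features))
    for _ in range(iters):
        pi = pi @ P                          # power iteration to stationarity
    return pi

# Placeholder descriptors: 12 superpixels with 3-D mean-color features.
rng = np.random.default_rng(3)
pi = random_walk_saliency(rng.random((12, 3)))
```

Assigning each superpixel's score to all of its pixels is what yields the high-resolution maps with object-aligned borders that the Letter highlights; the multiscale scheme would average such maps over multiple segmentations.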
Background: Laryngeal cancer (LCA) is a common malignancy of the head and neck region. Early diagnosis of LCA is very difficult because of its subtle abnormalities in the early stage, especially for inexperienced endoscopists using conventional white-light endoscopy. Computer-aided diagnosis has been applied to several diseases in recent years. This study aimed to develop a deep convolutional neural network (DCNN) that can automatically detect LCA in laryngoscopic images. Methods: A DCNN-based diagnostic system was constructed and trained using 15,239 laryngoscopic images of LCA, precancerous laryngeal lesions (PRELCA), benign laryngeal tumors (BLT), and normal tissues (NORM). An independent test set of 1,200 laryngoscopic images was applied to the constructed DCNN to evaluate its performance against experienced endoscopists. Findings: In the training set, the DCNN achieved a sensitivity of 0.920, a specificity of 0.716, an AUC of 0.922, and an overall accuracy of 0.858 for detecting LCA and PRELCA among all lesions and normal tissues. When compared with human experts on the independent test set, the DCNN's performance on detection of LCA and PRELCA reached a sensitivity of 0.933, a specificity of 0.797, an AUC of 0.952, and an overall accuracy of 0.901, comparable to that of experienced human experts with 10-20 years of work experience. Moreover, the overall accuracy of the DCNN for detection of LCA alone was 0.786, which is also comparable to that of experienced experts with 10-20 years of work experience and exceeds that of experts with less than 10 years of work experience. Interpretation: The DCNN had high sensitivity and specificity for automated detection of LCA and PRELCA from BLT and NORM in laryngoscopic images.
This novel and effective approach facilitates earlier diagnosis of LCA, which could improve clinical outcomes and reduce the burden on endoscopists. Funding: None. Declaration of Interest: The authors have no conflict of interest. Ethical Approval: The study was approved by the ethical review board of Sun Yat-sen Memorial Hospital, Sun Yat-sen University.
The smooth operation of autonomous underwater vehicles (AUVs) relies heavily on the accurate detection of surrounding objects. Toward this end, this letter presents a novel method for underwater object detection based on the gravity gradient differential and the gravity gradient differential ratio caused by the relative motion between the AUV and the object. Unlike existing techniques, the proposed method works in a passive manner and keeps the AUV invisible, since no energy is emitted. In addition, the proposed method requires no gravity map or gravity gradient map, which improves its practicality. Experimental results demonstrate that the proposed method performs better than the existing methods.
Weakly supervised instance segmentation (WSIS) using only image-level labels is a challenging task because of the difficulty of aligning coarse annotations with the finer task. Nevertheless, with the advancement of deep neural networks (DNNs), WSIS has garnered significant attention. Under the proposal-based paradigm, we encounter a redundant-segmentation problem, in which a single instance is represented by multiple proposals. For example, when we feed a picture of a dog and a set of proposals into the network, we expect it to output only the one proposal containing the dog, but the network outputs several. To address this problem, we propose a novel approach for WSIS that focuses on the online refinement of complete instances: MaskIoU heads predict the integrity scores of proposals, and a Complete Instances Mining (CIM) strategy explicitly models the redundant-segmentation problem and generates refined pseudo labels. Our approach makes the network aware of multiple, complete instances, and we further improve its robustness through an anti-noise strategy. Empirical evaluations on the PASCAL VOC 2012 and MS COCO datasets demonstrate that our method achieves state-of-the-art performance by a notable margin. Our implementation will be made available at https://github.com/ZechengLi19/CIM.
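The abstract does not spell out how redundant proposals are pruned. As a hedged sketch of the general idea (rank proposals by an integrity score and greedily suppress overlapping masks, i.e., a mask-level NMS; the paper's actual CIM strategy may differ), one instantiation is:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def select_complete_instances(masks, scores, iou_thresh=0.5):
    """Greedy suppression: visit proposals in decreasing integrity-score
    order and keep one only if it overlaps no already-kept mask, so each
    instance ends up represented by a single proposal."""
    order = sorted(range(len(masks)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(mask_iou(masks[i], masks[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep

# Three proposals on an 8x8 grid: two redundant covers of instance A,
# one of instance B (toy masks and scores, not real network output).
m = np.zeros((3, 8, 8), dtype=bool)
m[0, 0:4, 0:4] = True   # instance A, high integrity score
m[1, 0:4, 0:5] = True   # redundant cover of A, lower score
m[2, 5:8, 5:8] = True   # instance B
kept = select_complete_instances(m, scores=[0.9, 0.6, 0.8])
```

In the toy example, the redundant cover of instance A is suppressed and one proposal survives per instance; the surviving proposals would then serve as refined pseudo labels for the next training round.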