Crafting GBD-Net for Object Detection
Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang
Citations: 137 | References: 52 | Related papers: 10
Abstract:
The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. Effective integration of local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution between neighboring support regions in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate the existence of each other by learning their nonlinear relationships and their close interactions are modeled in a more complex way. It is also shown that message passing is not always helpful but dependent on individual samples. Gated functions are therefore needed to control message transmission, whose on-or-offs are controlled by extra visual evidence from the input sample. The effectiveness of GBD-Net is shown through experiments on three object detection datasets, ImageNet, Pascal VOC2007 and Microsoft COCO. Besides the GBD-Net, this paper also shows the details of our approach in winning the ImageNet object detection challenge of 2016, with source code provided on https://github.com/craftGBD/craftGBD. In this winning system, the modified GBD-Net, new pretraining scheme and better region proposal designs are provided. We also show the effectiveness of different network structures and existing techniques for object detection, such as multi-scale testing, left-right flip, bounding box voting, NMS, and context.
Keywords:
Pascal (unit)
Feature (linguistics)
Code (set theory)
Citations (11)
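As a rough illustration of the gated bi-directional idea described in the GBD-Net abstract, the NumPy sketch below passes messages between features of neighboring support regions in both directions and scales each message with a sigmoid gate computed from the input features. It is a simplification under stated assumptions, not the authors' implementation: plain matrix products stand in for convolutions, and the function name, the weight names (w_self, w_msg, w_gate, b_gate) and the averaging of the two directions are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_bidirectional_pass(feats, w_self, w_msg, w_gate, b_gate=0.0):
    """Toy gated message passing over support regions.

    feats: list of (d,) vectors, ordered from smallest to largest region.
    All weights are hypothetical (d, d) matrices; a real network would
    use convolutions and separate weights per direction and per layer.
    """
    n = len(feats)
    own = [w_self @ f for f in feats]  # each region's own evidence

    def sweep(order):
        out = [None] * n
        prev = None  # index of the neighbor visited just before
        for i in order:
            h = own[i]
            if prev is not None:
                # Gate depends on the neighbor's *input* feature, so the
                # sample itself decides whether the message gets through.
                gate = sigmoid(w_gate @ feats[prev] + b_gate)
                h = h + gate * (w_msg @ out[prev])
            out[i] = np.maximum(h, 0.0)  # ReLU
            prev = i
        return out

    forward = sweep(range(n))               # small -> large regions
    backward = sweep(range(n - 1, -1, -1))  # large -> small regions
    # Assumption: fuse the two directions by averaging.
    return [0.5 * (f + b) for f, b in zip(forward, backward)]
```

With a strongly negative b_gate the gates close and every region falls back to its own evidence, which mirrors the abstract's point that message passing should be switched off for samples where it does not help.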
We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. We cast object detection as an iterative classification problem, which is the form best suited to a CNN. AttentionNet provides quantized weak directions pointing toward a target object, and the ensemble of iterative predictions from AttentionNet converges to an accurate object boundary box. Since AttentionNet is a unified network for object detection, it detects objects without any separate models, from object proposal to post bounding-box regression. We evaluate AttentionNet on a human detection task and achieve state-of-the-art performance of 65% (AP) on PASCAL VOC 2007/2012 with only an 8-layer architecture.
Pascal (unit)
Minimum bounding box
Bounding overwatch
Citations (171)
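The iterative scheme in the AttentionNet abstract, where quantized directions are predicted repeatedly until the box settles, can be sketched as a small loop. This is only an illustration under stated assumptions: the CNN is replaced by a hypothetical oracle_directions function that peeks at a known target box, and the step size, the direction encoding and all names are invented for this example rather than taken from the paper.

```python
def oracle_directions(box, target, step):
    """Hypothetical stand-in for the network's direction prediction.

    Returns unit moves for the top-left corner (right/down) and the
    bottom-right corner (left/up); (0, 0) means "stop" for that corner.
    """
    x1, y1, x2, y2 = box
    tx1, ty1, tx2, ty2 = target
    top_left = (1 if tx1 - x1 >= step else 0, 1 if ty1 - y1 >= step else 0)
    bottom_right = (-1 if x2 - tx2 >= step else 0, -1 if y2 - ty2 >= step else 0)
    return top_left, bottom_right


def iterative_localize(box, predict, step=4, max_iters=100):
    """Shrink a box by fixed steps until both corners predict "stop"."""
    x1, y1, x2, y2 = box
    for _ in range(max_iters):
        (dx1, dy1), (dx2, dy2) = predict((x1, y1, x2, y2))
        if (dx1, dy1, dx2, dy2) == (0, 0, 0, 0):
            break  # both corners report that they are on the object
        x1 += step * dx1
        y1 += step * dy1
        x2 += step * dx2
        y2 += step * dy2
    return (x1, y1, x2, y2)


# Usage: start from the whole image and converge toward the target box.
target = (20, 12, 80, 60)
final_box = iterative_localize(
    (0, 0, 100, 100), lambda b: oracle_directions(b, target, 4), step=4
)
```

Each prediction is a coarse classification rather than a regression, so accuracy comes from accumulating many small moves; the final box is within one step of the target on every side.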
Current research on object detection focuses mainly on algorithms with higher accuracy and faster detection speed. To improve detection performance, an improved object detection algorithm based on YOLOv3-tiny with pyramid pooling is proposed. First, an improved pyramid pooling module using adaptive average pooling is designed to efficiently extract global feature information; the module is then combined with YOLOv3-tiny to explore the impact of different combinations on the detection results. The experiments used PASCAL VOC2007 trainval and all of PASCAL VOC2012 for training and validation, and PASCAL VOC2007 test for testing. Experimental results show that the proposed network improves mAP by 1.8% over YOLOv3-tiny at almost the same detection speed, achieving a better balance between detection speed and accuracy.
Pascal (unit)
Pooling
Pyramid (geometry)
Citations (1)
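To make the pyramid pooling idea in the YOLOv3-tiny abstract more concrete, here is a minimal NumPy sketch of a pyramid pooling step built on adaptive average pooling. It is an assumption-laden illustration, not the module from that paper: the bin sizes (1, 2, 3, 6), the nearest-neighbor upsampling and the function names are choices made for this example, and the convolutions that a real module would apply to each branch are omitted.

```python
import numpy as np


def adaptive_avg_pool2d(x, out_size):
    """Average-pool a (C, H, W) array to (C, out_size, out_size).

    Bin i covers rows floor(i*H/out) up to ceil((i+1)*H/out), so any
    input size maps to the requested output size.
    """
    c, h, w = x.shape
    out = np.empty((c, out_size, out_size), dtype=float)
    for i in range(out_size):
        r0 = i * h // out_size
        r1 = -(-(i + 1) * h // out_size)  # ceiling division
        for j in range(out_size):
            c0 = j * w // out_size
            c1 = -(-(j + 1) * w // out_size)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def pyramid_pool(x, bins=(1, 2, 3, 6)):
    """Concatenate x with pooled-and-upsampled copies at several scales."""
    c, h, w = x.shape
    branches = [x.astype(float)]
    for b in bins:
        pooled = adaptive_avg_pool2d(x, b)
        # Nearest-neighbor upsampling back to (H, W).
        rows = np.arange(h) * b // h
        cols = np.arange(w) * b // w
        branches.append(pooled[:, rows][:, :, cols])
    return np.concatenate(branches, axis=0)  # (C * (1 + len(bins)), H, W)
```

The coarsest branch (one bin) carries a global average of each channel to every spatial position, which is how such a module injects global context into local features.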
Pascal (unit)
Citations (2)
The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First, where are we currently as a field: what have we done right, what still needs to be improved? Second, where should we be going in designing the next generation of object detectors? Inspired by the recent work of Hoiem et al. on the standard PASCAL VOC detection dataset, we perform a large-scale study on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data. First, we quantitatively demonstrate that this dataset provides many of the same detection challenges as PASCAL VOC. Due to its scale of 1000 object categories, ILSVRC also provides an excellent test bed for understanding the performance of detectors as a function of several key properties of the object classes. We conduct a series of analyses looking at how different detection methods perform on a number of image-level and object-class-level properties such as texture, color, deformation, and clutter. We learn important lessons about current object detection methods and propose a number of insights for designing the next generation of object detectors.
Pascal (unit)
Object-class detection
Categorical variable
Citations (88)
Numerous studies in the field of object detection have been conducted over the past few decades, and several effective methods have been developed. Among the various object detection algorithms, Faster RCNN offers excellent results in both detection speed and accuracy. It combines Fast RCNN with a Region Proposal Network (RPN). This paper conducts a comparative study of object detection using Faster RCNN. The study shows that the use of a smaller convolutional network, the Region Proposal Network, improves the performance of the system, and that object detection using Faster RCNN can give higher accuracy and faster performance than other methods and algorithms. It takes only 0.2 seconds to run prediction on a single image, and it reaches 70% mean Average Precision (mAP) on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets.
Pascal (unit)
Citations (4)
Object detection performance, as measured on the PASCAL VOC dataset, has improved markedly since systems based on deep convolutional neural networks (CNNs) were proposed. However, inaccurate localization remains a major source of detection error. Building upon high-capacity CNN architectures, we address the problem by 1) combining a high-recall algorithm that proposes candidate regions for an object bounding box with an algorithm that reduces localization bias, and 2) using box alignment, which penalizes deviation by taking object boundaries into account, to guide the procedure that proposes inputs to the CNN. Experiments demonstrate that the proposed methods improve detection performance over the baseline and many other methods on the PASCAL VOC 2007 dataset.
Pascal (unit)
Minimum bounding box
Bounding overwatch
Convolution (computer science)
Citations (0)
We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.
Pascal (unit)
Citations (455)
In recent years, object detection has made great progress with the continuous development of deep neural networks. There are now many fully supervised object detection algorithms in computer vision, and their performance is close to saturation, while object detection in a weakly supervised manner remains more challenging. Mature object detection algorithms rely heavily on strongly labeled datasets, which are very expensive to build and must be very large to train a good detection model, so weakly supervised object detection has received more and more attention. In this paper, three modules that can be embedded in the weakly supervised object detection framework are introduced. They generate high-quality proposals, screen these proposals, and finally select more accurate proposal boxes that benefit subsequent training. Their effectiveness is demonstrated on the PASCAL VOC2007 and PASCAL VOC2012 datasets, where the approach achieves a significant improvement over existing classic weakly supervised object detection algorithms.
Pascal (unit)
Object-class detection
Citations (0)