    View Independent Vehicle Model Recognition Using Semantic Segmentation and Image Retrieval
    Scene classification and semantic segmentation are two important research directions in computer vision, widely used in research on autonomous driving and human–computer interaction. The purpose of scene classification is to determine the category of the scene in an image by analyzing the background and the target objects, while semantic segmentation aims to classify the image at the pixel level and mark the position and semantic information of each scene unit. In this paper, we train semantic segmentation neural networks under different scenarios to obtain one model per scene category, and these models are then used to process images. In our experiments, the semantic segmentation dataset was first divided into three categories by a scene classification algorithm. The semantic segmentation network was then trained under each of the three scenarios, yielding three semantic segmentation models. To test the method, the resulting models were applied to further images, and scene-aware semantic segmentation performed considerably better than semantic segmentation that ignores scene categories. Our study improves semantic segmentation by taking category information into account, which helps produce more precise models for further image analysis.
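The scene-aware pipeline described above can be sketched as follows: classify the scene first, then route the image to the segmentation model trained for that scene category. The brightness-based classifier, the three category names, and the stub segmenters are illustrative assumptions, not the paper's actual trained networks.

```python
# Sketch of scene-aware segmentation: classify, then dispatch to a
# per-scene segmentation model. All components are hypothetical stubs.
import numpy as np

def classify_scene(image):
    # Placeholder scene classifier: picks a category from mean brightness.
    # A real system would use a trained image classification network.
    brightness = image.mean()
    if brightness < 85:
        return "night"
    elif brightness < 170:
        return "urban"
    return "highway"

def make_stub_segmenter(label):
    # Stand-in for a per-scene segmentation network: returns a label map.
    def segment(image):
        return np.full(image.shape[:2], label, dtype=np.int32)
    return segment

# One segmentation model per scene category, mirroring the paper's setup.
scene_models = {
    "night": make_stub_segmenter(0),
    "urban": make_stub_segmenter(1),
    "highway": make_stub_segmenter(2),
}

def scene_aware_segment(image):
    scene = classify_scene(image)
    return scene, scene_models[scene](image)

image = np.full((4, 4, 3), 200, dtype=np.uint8)   # bright test image
scene, mask = scene_aware_segment(image)
```

The key design point is the dispatch table: each scene category owns a specialized model, so adding a fourth category only means training one more model and registering it.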
    Although pedestrian detection has been largely improved with the emergence of convolutional neural networks (CNN), the performance in autonomous driving still faces various challenges, which mainly include large-scale variation, illumination variation, and occlusion of different levels. A robust pedestrian detector enhanced by semantic segmentation is proposed. Inspired by the benefits of multitask learning, our main idea lies in integrating the task of semantic segmentation into the detection framework with auxiliary supervision, inheriting the merits of the two-stream network. Specifically, anchor boxes with various scales are paved on the feature maps of a base CNN; detection is performed based on bounding box classification and regression. On the other stream, semantic segmentation is also performed based on the same feature maps. Extensive experiments on the recently published large-scale pedestrian detection benchmark, i.e., CityPersons, show that the additional supervision from semantic segmentation can significantly improve the detection accuracy without extra computational burdens during inference, which demonstrates the superiority of the proposed method.
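The multitask idea above — auxiliary segmentation supervision added to the detection objective on shared features — can be sketched as a combined loss. The weighting factor and the toy tensors are assumptions of this sketch; the paper's exact loss formulation may differ.

```python
# Minimal sketch of a multitask objective: detection loss plus a weighted
# auxiliary per-pixel segmentation loss on the shared feature maps.
import numpy as np

def cross_entropy(probs, targets, eps=1e-9):
    # Mean negative log-likelihood of the target class at each position.
    picked = np.take_along_axis(probs, targets[..., None], axis=-1)
    return float(-np.log(picked + eps).mean())

def multitask_loss(det_probs, det_targets, seg_probs, seg_targets,
                   seg_weight=0.5):
    # Total loss = detection loss + seg_weight * segmentation loss.
    det_loss = cross_entropy(det_probs, det_targets)
    seg_loss = cross_entropy(seg_probs, seg_targets)
    return det_loss + seg_weight * seg_loss

# Toy example: 2 anchors over 2 classes, a 2x2 segmentation map, 2 classes.
det_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
det_targets = np.array([0, 1])
seg_probs = np.full((2, 2, 2), 0.5)
seg_targets = np.zeros((2, 2), dtype=np.int64)
loss = multitask_loss(det_probs, det_targets, seg_probs, seg_targets)
```

Because the segmentation head only adds a training-time loss term, it can be dropped at inference, which is why the paper reports no extra computational burden at test time.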
    The detection, precise segmentation and classification of specific objects is an important task in many computer vision and image analysis problems, particularly in medical domains. Existing methods such as template matching typically require excessive computation and user interaction, particularly if the desired objects have a variety of different shapes. This paper presents a new approach that uses unsupervised learning to find a set of templates specific to the objects being outlined by the user. The templates are formed by averaging the shapes that belong to a particular cluster, and are used to guide an intelligent search through the space of possible objects. This results in decreased time and increased accuracy for repetitive segmentation problems, as system performance improves with continued use. Further, the information gained through clustering and user feedback is used to classify the objects for problems in which shape is relevant to the classification. The effectiveness of the resulting system is demonstrated on two applications: a medical diagnosis task using cytological images and a vehicle recognition task.
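The template-building step described above — averaging the shapes assigned to each cluster, then matching a new shape to the nearest template — can be sketched as below. Shapes are simplified to fixed-length feature vectors, and the cluster assignment is assumed to have been produced already by the unsupervised learning stage.

```python
# Sketch of cluster-averaged templates and nearest-template matching.
import numpy as np

def build_templates(shapes, cluster_ids):
    # Average the shape vectors within each cluster to form its template.
    templates = {}
    for cid in set(cluster_ids):
        members = [s for s, c in zip(shapes, cluster_ids) if c == cid]
        templates[cid] = np.mean(members, axis=0)
    return templates

def match_template(shape, templates):
    # Return the cluster id of the closest template (Euclidean distance).
    return min(templates, key=lambda cid: np.linalg.norm(shape - templates[cid]))

shapes = [np.array([1.0, 1.0]), np.array([1.2, 0.8]),   # cluster 0
          np.array([5.0, 5.0]), np.array([4.8, 5.2])]   # cluster 1
templates = build_templates(shapes, [0, 0, 1, 1])
best = match_template(np.array([1.1, 0.9]), templates)
```

As more user-corrected shapes accumulate in each cluster, the averaged templates sharpen, which matches the abstract's claim that performance improves with continued use.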
    We describe a system for vehicle make and model recognition (MMR) that automatically detects and classifies the make and model of a car from a live camera mounted above the highway. Vehicles are detected using a histogram of oriented gradient detector and then classified by a convolutional neural network (CNN) incorporating the frontal view of the car. We propose a semiautomatic data-selection approach for the vehicle detector and the classifier, by using an automatic number plate recognition engine to minimize human effort. The resulting classification has a top-1 accuracy of 97.3% for 500 vehicle models. This paper presents a more extensive in-depth evaluation. We evaluate the effect of occlusion and have found that the most informative vehicle region is the grill at the front. Recognition remains accurate when the left or right part of vehicles is occluded. The small fraction of misclassifications mainly originates from errors in the dataset, or from insufficient visual information for specific vehicle models. Comparison of state-of-the-art CNN architectures shows similar performance for the MMR problem, supporting our findings that the classification performance is dominated by the dataset quality.
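The evaluation metric quoted above, top-1 accuracy over the detect-then-classify pipeline, can be sketched as follows. The scores and labels are toy stand-ins; the paper's system uses a HOG detector followed by a CNN classifier over 500 vehicle models.

```python
# Sketch of top-1 accuracy for a make-and-model classifier.
import numpy as np

def top1_accuracy(scores, labels):
    # Fraction of samples whose highest-scoring class matches the label.
    return float((scores.argmax(axis=1) == labels).mean())

# Toy scores for 4 vehicles over 3 model classes.
scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 1])   # last sample is misclassified
acc = top1_accuracy(scores, labels)
```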
    The paper outlines an integrated image processing environment that uses neural networks for object recognition and classification. The image processing environment, which is Windows-based, encapsulates a multiple-document interface (MDI) and is menu driven. Object (shape) parameter extraction focuses on features that are invariant under translation, rotation, and scale transformations. The neural network models incorporated into the environment allow both clustering and classification of objects from the analysed image. Mapping neural networks perform input sensitivity analysis on the extracted feature measurements, which facilitates the removal of irrelevant features and improves the degree of generalisation.
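The translation-invariant part of the shape features mentioned above can be illustrated with central image moments, which do not change when the object shifts within the frame. Rotation and scale invariance would need further normalisation (e.g. Hu moments), omitted here for brevity; the blob and its shift are illustrative.

```python
# Sketch of a translation-invariant shape feature: the central moment.
import numpy as np

def central_moment(img, p, q):
    # mu_pq = sum over pixels of (x - xbar)^p * (y - ybar)^q * intensity.
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    return float(((xs - xbar) ** p * (ys - ybar) ** q * img).sum())

# A small binary blob, and the same blob shifted by (2, 3).
img = np.zeros((10, 10))
img[2:4, 2:5] = 1.0
shifted = np.zeros((10, 10))
shifted[4:6, 5:8] = 1.0

mu20 = central_moment(img, 2, 0)
mu20_shifted = central_moment(shifted, 2, 0)
```

Because the moment is computed about the blob's own centroid, the shifted copy yields exactly the same value, which is what makes such features useful inputs for the clustering and classification networks.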
    Deep learning has been applied to semantic segmentation and object recognition in computer vision. In this paper, we propose a high-efficiency pixel-classification convolutional neural network based on an encoder-decoder structure. Depth images are used to enhance the CNN so that color images and depth information can be processed jointly. The experimental data show that, through the fusion of depth features representing information at different scales, the architecture can be jointly optimized to obtain more sophisticated image semantic features. Compared with similar methods, the proposed semantic segmentation algorithm shows clear advantages.
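The colour/depth fusion idea above can be sketched as concatenating feature maps from the two modalities along the channel axis so that later layers see both jointly. The stub encoders and channel counts are assumptions of this sketch; a real system would use trained convolutional encoders.

```python
# Sketch of RGB-D feature fusion by channel-wise concatenation.
import numpy as np

def stub_encoder(image, out_channels):
    # Stand-in for a CNN encoder: tiles the channel-mean into feature maps.
    mean_map = image.mean(axis=-1, keepdims=True)
    return np.repeat(mean_map, out_channels, axis=-1)

def fuse_rgbd(rgb, depth):
    # Encode each modality, then concatenate features channel-wise.
    rgb_feat = stub_encoder(rgb, 4)
    depth_feat = stub_encoder(depth, 2)
    return np.concatenate([rgb_feat, depth_feat], axis=-1)

rgb = np.ones((8, 8, 3))
depth = np.ones((8, 8, 1))
fused = fuse_rgbd(rgb, depth)
```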
    Object detection and semantic segmentation are two main themes in object retrieval from high-resolution remote sensing images, which have recently achieved remarkable performance by surfing the wave of deep learning and, more notably, convolutional neural networks. In this paper, we are interested in a novel, more challenging problem of vehicle instance segmentation, which entails identifying, at a pixel level, where the vehicles appear as well as associating each pixel with a physical instance of a vehicle. In contrast, vehicle detection and semantic segmentation each only concern one of the two. We propose to tackle this problem with a semantic boundary-aware multitask learning network. More specifically, we utilize the philosophy of residual learning to construct a fully convolutional network that is capable of harnessing multilevel contextual feature representations learned from different residual blocks. We theoretically analyze and discuss why residual networks can produce better probability maps for pixelwise segmentation tasks. Then, based on this network architecture, we propose a unified multitask learning network that can simultaneously learn two complementary tasks, namely, segmenting vehicle regions and detecting semantic boundaries. The latter subproblem is helpful for differentiating "touching" vehicles that are usually not correctly separated into instances. Currently, the data sets with pixelwise annotation for vehicle extraction are the ISPRS data set and the IEEE GRSS DFC2015 data set over Zeebrugge, which specialize in semantic segmentation. Therefore, we built a new, more challenging data set for vehicle instance segmentation, called the Busy Parking Lot Unmanned Aerial Vehicle Video data set, and we make our data set available at http://www.sipeo.bgu.tum.de/downloads so that it can be used to benchmark future vehicle instance segmentation algorithms.
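The residual-learning philosophy the abstract leans on can be sketched in one line: a block computes y = x + F(x), so the identity path carries low-level detail and gradients through unchanged, which is one intuition for why such networks yield sharper per-pixel probability maps. The transforms below are toy functions, not trained layers.

```python
# Sketch of a residual block: output is input plus a learned residual.
import numpy as np

def residual_block(x, transform):
    # Identity skip connection: y = x + F(x).
    return x + transform(x)

# With a zero residual, the block is an exact identity mapping, so
# stacking blocks can never degrade the representation below the input.
x = np.arange(6.0).reshape(2, 3)
identity_out = residual_block(x, lambda t: np.zeros_like(t))
scaled_out = residual_block(x, lambda t: 0.1 * t)
```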
    This research proposes an object recognition system using image processing and neural-network-based classification. The system is capable of recognizing 7 objects against an uncluttered background by extracting color, texture, and shape features. The proposed system consists of image segmentation, feature extraction, and classification. Diverse neural network topology settings have been employed for evaluation. Experimental results indicate that the proposed system achieves high accuracy (98%) for real-time object recognition tasks.
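The colour-feature stage of such a pipeline can be sketched as a normalised per-channel histogram concatenated into one feature vector for the classifier. The bin count is an assumption of this sketch, and the texture and shape features the abstract mentions are left out for brevity.

```python
# Sketch of a colour-histogram feature extractor for object recognition.
import numpy as np

def color_histogram_features(image, bins=4):
    # Per-channel histogram over [0, 256), normalised to sum to 1.
    feats = []
    for c in range(image.shape[-1]):
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

image = np.zeros((4, 4, 3), dtype=np.uint8)
image[..., 0] = 255          # a pure-red patch
features = color_histogram_features(image)
```

The fixed-length vector (bins × channels) is what makes the features directly usable as input to the neural-network classifier, regardless of image size.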
    This article puts forward a method, BVCNN, for recognizing multiple objects in massive image collections. First, the BING method is used to generate object proposals, which greatly reduces the time spent localizing image targets and makes rapid recognition of multiple targets possible, whereas traditional convolutional neural networks achieve only single-target image recognition. Second, vectorization of deep convolutional neural networks is used for deep feature learning and recognition on local image regions, which speeds up network training and testing. Third, context information is exploited in multi-object image classification, which helps, to a certain extent, to distinguish individuals with similar characteristics according to their environment, improving multi-object recognition accuracy. Experiments show that identifying a single image with this model takes less than 1 s, and the model can be used for image information fusion.
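The context step above — using the surrounding scene to break ties between visually similar classes — can be sketched as re-weighting a window's class scores by a context co-occurrence table. The class names, scores, and co-occurrence values are toy stand-ins, not BVCNN's actual components.

```python
# Sketch of context-aware re-ranking of per-window classifier scores.
import numpy as np

def classify_with_context(window_scores, context_class, cooccurrence):
    # Boost each class score by how often it co-occurs with the context.
    adjusted = window_scores * cooccurrence[context_class]
    return int(np.argmax(adjusted))

# Two visually similar classes (0: "car", 1: "boat") in a "road" context.
window_scores = np.array([0.49, 0.51])       # classifier alone picks boat
cooccurrence = {"road": np.array([0.9, 0.1])}
pred = classify_with_context(window_scores, "road", cooccurrence)
```

Without the context term the classifier would pick class 1; the road context flips the decision to class 0, which is exactly the tie-breaking effect the abstract describes.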