Drones, or general UAVs, equipped with cameras have been fast deployed with a wide range of applications, including agriculture, aerial photography, and surveillance. Consequently, automatic understanding of visual data collected from drones becomes highly demanding, bringing computer vision and drones more and more closely. To promote and track the evelopments of object detection and tracking algorithms, we have organized two challenge workshops in conjunction with ECCV 2018, and ICCV 2019, attracting more than 100 teams around the world. We provide a large-scale drone captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. In this paper, we first presents a thorough review of object detection and tracking datasets and benchmarks, and discuss the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. After that, we describe our VisDrone dataset, which is captured over various urban/suburban areas of 14 different cities across China from North to South. Being the largest such dataset ever published, VisDrone enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. We provide a detailed analysis of the current state of the field of large-scale object detection and tracking on drones, and conclude the challenge as well as propose future directions. We expect the benchmark largely boost the research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from the website: this https URL.
Sparse models in dictionary learning have been successfully applied in a wide variety of machine learning and computer vision problems, and have also recently been of increasing research interest. Another interesting related problem based on a linear equality constraint, namely the sparse null space problem (SNS), first appeared in 1986, and has since inspired results on sparse basis pursuit. In this paper, we investigate the relation between the SNS problem and the analysis dictionary learning problem, and show that the SNS problem plays a central role, and may be utilized to solve dictionary learning problems. Moreover, we propose an efficient algorithm of sparse null space basis pursuit, and extend it to a solution of analysis dictionary learning. Experimental results on numerical synthetic data and real-world data are further presented to validate the performance of our method.
We consider subspace clustering under sparse noise, for which a non-convex optimization framework based on sparse data representations has been recently developed. This setup is suitable for a large variety of applications with high dimensional data, such as image processing, which is naturally decomposed into a sparse unstructured foreground and a background residing in a union of low-dimensional subspaces. In this framework, we further discuss both performance and implementation of the key optimization problem. We provide an analysis of this optimization problem demonstrating that our approach is capable of recovering linear subspaces as a local optimal solution for sufficiently large data sets and sparse noise vectors. We also propose a sequential algorithmic solution, which is particularly useful for extremely large data sets and online vision applications such as video processing.
Real-world data often follow a long-tailed distribution as the frequency of each class is typically different. For example, a dataset can have a large number of under-represented classes and a few classes with more than sufficient data. However, a model to represent the dataset is usually expected to have reasonably homogeneous performances across classes. Introducing class-balanced loss and advanced methods on data re-sampling and augmentation are among the best practices to alleviate the data imbalance problem. However, the other part of the problem about the under-represented classes will have to rely on additional knowledge to recover the missing information. In this work, we present a novel approach to address the long-tailed problem by augmenting the under-represented classes in the feature space with the features learned from the classes with ample samples. In particular, we decompose the features of each class into a class-generic component and a class-specific component using class activation maps. Novel samples of under-represented classes are then generated on the fly during training stages by fusing the class-specific features from the under-represented classes with the class-generic features from confusing classes. Our results on different datasets such as iNaturalist, ImageNet-LT, Places-LT and a long-tailed version of CIFAR have shown the state of the art performances.
Sparse models in dictionary learning have been successfully applied in a wide variety of machine learning and computer vision problems, and as a result have recently attracted increased research interest. Another interesting related problem based on linear equality constraints, namely the sparse null space (SNS) problem, first appeared in 1986 and has since inspired results on sparse basis pursuit. In this paper, we investigate the relation between the SNS problem and the analysis dictionary learning (ADL) problem, and show that the SNS problem plays a central role, and may be utilized to solve dictionary learning problems. Moreover, we propose an efficient algorithm of sparse null space basis pursuit (SNS-BP) and extend it to a solution of ADL. Experimental results on numerical synthetic data and real-world data are further presented to validate the performance of our method.
In recent years, deep learning, particularly Convolutional Neural Network (CNN), has shown great efficacy for solving various vision tasks. In image segmentation, it has been demonstrated that a CNN can greatly outperform other approaches. However, special attention has to be paid towards setting various parameters in the CNN that affects the scale of the feature map generated at the last convolutional layer, where scale here refers to the ratio of the number of pixels in the original input image that correspond to each pixel in the feature map. Quite often, the optimal settings are tied to the specific problem on hand and can be fairly challenging to determine. To overcome such an issue, this paper proposes a multiscale Fully Convolutional Network (FCN) that combines networks trained at various scales, thereby allowing for conducting segmentation more generically. Moreover, such a multiscale architecture allows for incremental fine-tuning as more training images become available later on and new networks can be trained and added to the combined network. Such flexibility has great utility in applications such as industrial inspection, where training images may not be readily available initially, but yet requires a high level of accuracy. This paper will validate our findings by reporting the results that we have obtained by applying multiscale FCN to the inspection of aircraft engine part.