The estimation of the projective structure of a scene from image correspondences can be formulated as the minimization of the mean-squared distance between predicted and observed image points with respect to the projection matrices, the scene point positions, and their depths. Since these unknowns are not independent, constraints must be chosen to ensure that the optimization process is well posed. This paper examines three plausible choices and shows that the first one leads to the Sturm-Triggs projective factorization algorithm, while the other two lead to new provably convergent approaches. Experiments with synthetic and real data are used to compare the proposed techniques to the Sturm-Triggs algorithm and bundle adjustment.
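To make the factorization concrete, here is a minimal numpy sketch of one Sturm-Triggs-style iteration, assuming a measurement matrix W of stacked homogeneous image points and current depth estimates (the names factorize_step, W, and depths are ours, not the paper's):

    import numpy as np

    def factorize_step(W, depths):
        # W:      (3m, n) stacked homogeneous image points, m views, n points
        # depths: (m, n) current estimates of the projective depths
        # Scale each homogeneous image point by its projective depth.
        Ws = W * np.repeat(depths, 3, axis=0)
        # Nearest rank-4 factorization via SVD: Ws ~ P @ X.
        U, s, Vt = np.linalg.svd(Ws, full_matrices=False)
        P = U[:, :4] * s[:4]   # stacked 3x4 projection matrices
        X = Vt[:4, :]          # homogeneous scene points
        return P, X, np.linalg.norm(Ws - P @ X)

The depths would then be re-estimated from P and X and the step repeated; the constraint choices examined in the paper govern how W and the depths are normalized between iterations.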
Using a saliency measure based on the global property of contour closure, we have developed a method that reliably segments out salient contours bounding unknown objects from real edge images. The measure also incorporates the Gestalt principles of proximity and smooth continuity that previous methods have exploited. Unlike previous measures, we incorporate contour closure by finding the eigen-solution associated with a stochastic process that models the distribution of contours passing through edges in the scene. The segmentation algorithm utilizes the saliency measure to identify multiple closed contours by finding strongly-connected components on an induced graph. The determination of strongly-connected components is a direct consequence of the property of closure. We report, for the first time, results on large real images for which segmentation takes an average of about 10 seconds per object on a general-purpose workstation. The segmentation is made efficient for such large images by exploiting the inherent symmetry in the task.
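As a rough illustration of the eigen-solution step, the dominant eigenvector of a nonnegative transition matrix can be found by power iteration; the matrix T below is a stand-in for the stochastic contour model described in the abstract:

    import numpy as np

    def edge_saliency(T, iters=200, tol=1e-9):
        # T[i, j]: probability that a contour through edge j also passes
        # through edge i (hypothetical construction). For nonnegative T,
        # power iteration converges to the Perron eigenvector, whose
        # entries serve as per-edge saliencies.
        s = np.full(T.shape[0], 1.0 / T.shape[0])
        for _ in range(iters):
            s_new = T @ s
            s_new /= np.linalg.norm(s_new)
            if np.linalg.norm(s_new - s) < tol:
                break
            s = s_new
        return s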
Many modeling tasks in computer vision, e.g., structure from motion, shape/reflectance from shading, and filter synthesis, have a low-dimensional intrinsic structure even though the dimension of the input data can be relatively large. We propose a simple but surprisingly effective iterative randomized algorithm that drastically cuts down the time required for recovering the intrinsic structure. The computational cost depends only on the intrinsic dimension of the structure of the task. The algorithm is based on the recently proposed Cascade Basis Reduction (CBR) algorithm that was developed in the context of steerable filters. A key feature of our algorithm compared with CBR is that an arbitrary a priori basis for the task is not required. This allows us to extend the applicability of the algorithm to tasks beyond steerable filters, such as structure from motion. We prove convergence for the new algorithm. In practice the new algorithm is much faster than CBR for the same modeling error. We demonstrate this speed-up for the construction of a steerable basis for Gabor filters. We also demonstrate the generality of the new algorithm by applying it to an example from structure from motion without missing data.
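The abstract does not spell out the algorithm, but the flavor of an iterative randomized basis recovery whose cost scales with the intrinsic dimension k rather than the ambient dimension can be sketched with a standard randomized range finder (a generic stand-in, not the CBR variant described above):

    import numpy as np

    def randomized_basis(A, k, power_iters=2, seed=0):
        # Recover an orthonormal basis for a rank-k approximation of the
        # column space of A; only k random projections of the data are used.
        rng = np.random.default_rng(seed)
        Y = A @ rng.standard_normal((A.shape[1], k))
        for _ in range(power_iters):
            Y = A @ (A.T @ Y)    # optional power iterations sharpen the spectrum
        Q, _ = np.linalg.qr(Y)   # (m, k) orthonormal basis
        return Q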
Using a saliency measure based on the global property of contour closure, we have developed a segmentation method which identifies smooth closed contours bounding objects of unknown shape in real images. The saliency measure incorporates the Gestalt principles of proximity and good continuity that previous methods have also exploited. Unlike previous methods, we incorporate contour closure by finding the eigenvector with the largest positive real eigenvalue of a transition matrix for a Markov process where edges from the image serve as states. Element (i, j) of the transition matrix is the conditional probability that a contour which contains edge j will also contain edge i. We show how the saliency measure, defined for individual edges, can be used to derive a saliency relation, defined for pairs of edges, and further show that strongly-connected components of the graph representing the saliency relation correspond to smooth closed contours in the image. Finally, we report, for the first time, results on large real images for which segmentation takes an average of about 10 seconds per object on a general-purpose workstation.
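The grouping step can be sketched directly with an off-the-shelf strongly-connected-components routine; the boolean relation R below is a hypothetical encoding of the pairwise saliency relation derived in the paper:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def closed_contour_groups(R):
        # R[i, j] = True if edge j supports edge i under the saliency
        # relation. Strongly connected components of the directed graph
        # of R correspond to candidate smooth closed contours.
        n_comp, labels = connected_components(csr_matrix(R),
                                              directed=True,
                                              connection='strong')
        return [np.flatnonzero(labels == c) for c in range(n_comp)]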
The reliable detection of an object of interest in an input image with arbitrary background clutter and occlusion has to a large extent remained an elusive goal in computer vision. Traditional model-based approaches are inappropriate for a multi-class object detection task primarily due to difficulties in modeling arbitrary object classes. Instead, we develop a detection framework whose core component is a nearest neighbor search over object parts. The performance of the overall system is critically dependent on the distance measure used in the nearest neighbor search. A distance measure that minimizes the misclassification risk for the 1-nearest neighbor search can be shown to be the probability that a pair of input measurements belong to different classes. This pair-wise probability is not in general a metric distance measure. Furthermore, it can outperform any metric distance, approaching even the Bayes optimal performance. In practice, we seek a model for the optimal distance measure that combines the discriminative powers of more elementary distance measures associated with a collection of simple feature spaces that are easy and efficient to implement; in our work, we use histograms of various feature types like color, texture and local shape properties. For performing efficient nearest neighbor search over large training sets, the linear model was extended to discretized distance measures that combine distance measures associated with discriminators organized in a tree-like structure. Finally, the nearest neighbor search over object parts was integrated into a whole object detection system and evaluated on both an indoor detection task and a face recognition task, yielding promising results.
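A minimal sketch of the idea, assuming per-feature-space distance functions and learned combination weights (all names here are illustrative, not from the thesis): the "distance" used for 1-nearest-neighbor search is the modeled probability that two parts belong to different classes.

    import numpy as np

    def different_class_prob(elem_dists, w, b):
        # elem_dists: (K,) elementary distances (e.g., between color,
        # texture, and local-shape histograms); logistic combination.
        return 1.0 / (1.0 + np.exp(-(w @ elem_dists + b)))

    def nn_label(query, train_set, w, b, dist_fns):
        # Brute-force 1-NN under the pairwise-probability distance.
        best_p, best_label = np.inf, None
        for feats, label in train_set:
            d = np.array([fn(q, f) for fn, q, f in zip(dist_fns, query, feats)])
            p = different_class_prob(d, w, b)
            if p < best_p:
                best_p, best_label = p, label
        return best_label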
This paper addresses the problem of acquiring realistic visual models of the shape and appearance of complex three-dimensional (3D) scenes from collections of images, a process dubbed 3D photography. We focus on three instances of this problem: (1) the image-based construction of projective visual hulls of complex surfaces from weakly-calibrated photographs; (2) the automated matching and registration of photographs of textured surfaces using affine-invariant patches and their geometric relationships; and (3) an approach to projective motion analysis and self-calibration explicitly accounting for natural camera constraints such as zero skew and capable of handling large numbers of images in an efficient and uniform manner. We also briefly discuss some related applications of oriented differential projective geometry to computer vision problems, including the determination of the ordering of rim segments in projective visual hull computation, and a purely projective proof of Koenderink's famous characterization of the local shape of visual contours.
This paper presents a technique for using training data to design image filters for appearance-based object recognition. Rather than scanning the image with a single set of filters and using the results to test for the existence of objects, we use many sets of filters and take linear combinations of their outputs. The combining coefficients are optimized in a training phase to encourage discriminability between the filter responses for distinct parts of the object and clutter. Our experiments on three popular filter types show that by using this approach to combine sets of filters whose design parameters vary over a wide range, we can achieve detection performance competitive with that of any individual filter set. This in turn can ease the task of fine-tuning the settings for both the filters and the mechanisms that analyze their outputs.
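A training phase of this kind might be sketched as follows, using ridge regression as a stand-in for the discriminability criterion (the actual optimization in the paper may differ; responses, labels, and learn_combination are our names):

    import numpy as np

    def learn_combination(responses, labels, lam=1e-3):
        # responses: (N, F) outputs of F filter sets at N training patches
        # labels:    (N,) +1 for object-part patches, -1 for clutter
        F = responses.shape[1]
        A = responses.T @ responses + lam * np.eye(F)
        return np.linalg.solve(A, responses.T @ labels)

    # Detection then thresholds the combined response: responses_new @ w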
We approach the task of object discrimination as that of learning efficient "codes" for each object class in terms of responses to a set of chosen discriminants. We formulate this approach in an energy minimization framework. The "code" is built incrementally by successively constructing discriminants that focus on pairs of training images of objects that are currently hard to classify. The particular discriminants that we use partition the set of objects of interest into two well-separated groups. We find the optimal discriminant as well as the partition by formulating an objective criterion that measures how well separated the partition is. We derive an iterative solution that alternates between the solutions for two generalized eigenproblems, one for the discriminant parameters and the other for the indicator variables denoting the partition. We show how the optimization can easily be biased to focus on hard to classify pairs, which enables us to choose new discriminants one by one in a sequential manner. We validate our approach on a challenging face discrimination task using parts as features and show that it compares favorably with the performance of an eigenspace method.
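The alternation can be sketched generically with scipy's generalized symmetric eigensolver; the two build_* callbacks, which assemble the (A, B) matrix pair of the subproblem for the current estimate of the other variable, are placeholders for the problem-specific constructions in the paper:

    import numpy as np
    from scipy.linalg import eigh

    def alternate_eig(build_discriminant_problem, build_partition_problem,
                      y0, n_iters=10):
        # Solve A v = lambda B v alternately for the discriminant
        # parameters w and the (relaxed) partition indicators y.
        # B must be symmetric positive definite in each subproblem.
        y = y0
        for _ in range(n_iters):
            A, B = build_discriminant_problem(y)
            w = eigh(A, B)[1][:, -1]   # top generalized eigenvector
            A, B = build_partition_problem(w)
            y = eigh(A, B)[1][:, -1]
        return w, y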
We develop a multi-class object detection framework whose core component is a nearest neighbor search over object part classes. The performance of the overall system is critically dependent on the distance measure used in the nearest neighbor search. A distance measure that minimizes the misclassification risk for the 1-nearest neighbor search can be shown to be the probability that a pair of input image measurements belong to different classes. In practice, we model the optimal distance measure using a linear logistic model that combines the discriminative powers of more elementary distance measures associated with a collection of simple-to-construct feature spaces like color, texture and local shape properties. Furthermore, in order to perform search over large training sets efficiently, the same framework was extended to find Hamming distance measures associated with simple discriminators. By combining this discrete distance model with the continuous model, we obtain a hierarchical distance model that is both fast and accurate. Finally, the nearest neighbor search over object part classes was integrated into a whole object detection system and evaluated against an indoor detection task, yielding good results.
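The discrete stage of such a hierarchical model can be sketched as a Hamming-distance prefilter over binary codes (one bit per simple discriminator; the code construction and the cutoff k are assumptions for illustration):

    import numpy as np

    def hamming_prefilter(query_code, train_codes, k=50):
        # Stage 1: fast Hamming search over an (N, B) boolean code matrix;
        # returns indices of the k nearest codes, which stage 2 re-ranks
        # with the continuous logistic distance model.
        dists = np.count_nonzero(train_codes != query_code, axis=1)
        return np.argsort(dists)[:k]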
Kettner, R. E., S. Mahamud, H.-C. Leung, N. Sitkoff, J. C. Houk, B. W. Peterson, and A. G. Barto. Prediction of complex two-dimensional trajectories by a cerebellar model of smooth pursuit eye movement. J. Neurophysiol. 77: 2115–2130, 1997. A neural network model based on the anatomy and physiology of the cerebellum is presented that can generate both simple and complex predictive pursuit, while also responding in a feedback mode to visual perturbations from an ongoing trajectory. The model allows the prediction of complex movements by adding two features that are not present in other pursuit models: an array of inputs distributed over a range of physiologically justified delays, and a novel, biologically plausible learning rule that generates changes in synaptic strengths in response to retinal slip errors that arrive after long delays. To directly test the model, its output was compared with the behavior of monkeys tracking the same trajectories. There was a close correspondence between model and monkey performance. Complex target trajectories were created by summing two or three sinusoidal components of different frequencies along horizontal and/or vertical axes. Both the model and the monkeys were able to track these complex sum-of-sines trajectories with small phase delays that averaged 8 and 20 ms in magnitude, respectively. Both the model and the monkeys showed a consistent relationship between the high- and low-frequency components of pursuit: high-frequency components were tracked with small phase lags, whereas low-frequency components were tracked with phase leads. The model was also trained to track targets moving along a circular trajectory with infrequent right-angle perturbations that moved the target along a circle meridian. Before the perturbation, the model tracked the target with very small phase differences that averaged 5 ms. After the perturbation, the model overshot the target while continuing along the expected nonperturbed circular trajectory for 80 ms, before it moved toward the new perturbed trajectory. Monkeys showed similar behaviors with an average phase difference of 3 ms during circular pursuit, followed by a perturbation response after 90 ms. In both cases, the delays required to process visual information were much longer than delays associated with nonperturbed circular and sum-of-sines pursuit. This suggests that both the model and the eye make short-term predictions about future events to compensate for visual feedback delays in receiving information about the direction of a target moving along a changing trajectory. In addition, both the eye and the model can adjust to abrupt changes in target direction on the basis of visual feedback, but do so after significant processing delays.
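The learning rule's handling of delayed errors can be caricatured as a delta rule that credits the error to the inputs that were active when the error was generated (a schematic sketch only, not the published model; all names are ours):

    import numpy as np

    def delayed_delta_update(W, input_buffer, slip_error, error_delay, lr=1e-3):
        # input_buffer: list of past input vectors, newest last.
        # slip_error arrives error_delay steps after the inputs that
        # caused it, so the update reaches back into the buffer.
        eligible = input_buffer[-1 - error_delay]
        W += lr * slip_error * eligible
        return W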