Combining multiple visual processing streams for locating and classifying objects in video

2012 
Automated, invariant object detection has proven to be a substantial challenge for the artificial intelligence research community. In computer vision, many benchmarks have been established using whole-image classification on datasets that are too small to eliminate statistical artifacts. As an alternative, we used a new dataset consisting of ∼62GB (on the order of 40,000 2Mpixel frames) of compressed high-definition aerial video, which we employed for both object classification and localization. Our algorithms mimic the processing pathways in primate visual cortex, exploiting color/texture, shape/form, and motion. We then combine the data using a clustering technique to produce a final output in the form of labeled bounding boxes around objects of interest in the video. Localization adds complexity not generally found in whole-image classification problems. Our results are evaluated qualitatively and quantitatively using a scoring metric that assesses the overlap between our detections and ground truth.
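The abstract does not specify the exact form of the overlap metric, but overlap between a detected bounding box and a ground-truth box is conventionally scored as intersection-over-union (IoU). The sketch below is a hypothetical illustration of such a metric, assuming axis-aligned boxes in (x_min, y_min, x_max, y_max) form; it is not the paper's stated implementation.

```python
def box_overlap(det, gt):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples -- an assumed
    convention, since the abstract does not define the metric.
    Returns a value in [0, 1]: 1.0 for identical boxes, 0.0 for
    disjoint ones.
    """
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(det[0], gt[0]), max(det[1], gt[1])
    ix2, iy2 = min(det[2], gt[2]), min(det[3], gt[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_det = (det[2] - det[0]) * (det[3] - det[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_det + area_gt - inter
    return inter / union if union > 0 else 0.0
```

A detection would typically be counted as correct when this score exceeds a fixed threshold (commonly 0.5 in localization benchmarks).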