Scalable online annotation & object localisation for broadcast media production.

2016 
More video content is being produced by production companies and professional videographers than ever before, thanks to the adoption of digital media technologies at every stage of the production pipeline. With hundreds of hours of footage captured by even a small production company, organising and searching these collections has become a challenging and time-consuming task. This thesis investigates online video annotation for broadcast media production, including scalable video concept detection and object localisation. Most production tools and research focus on asset management of large-scale video collections; we also focus on making sense of the content within an individual production video by extracting salient metadata and localising objects.

We present a scalable semantic video concept detection framework, applied to automated metadata annotation (video logging) in a broadcast production environment. Video logging demands both accurate and fast concept detection. Whilst research often focuses on the former, the latter is essential in practical scenarios where days of footage may be shot per broadcast episode and production depends on the immediate availability of metadata. We present a hierarchical classification framework that addresses both requirements through two contributions. First, a dynamic weighting scheme for combining video features from multiple modalities, enabling higher detection accuracy over diverse production footage. Second, a hierarchical classification strategy that exploits ontological relationships between concepts to scale sub-linearly with the number of classes, yielding a real-time solution. We demonstrate an end-to-end production system that deploys our detection framework on a cloud-based architecture.

We also describe a novel, fully automatic algorithm for identifying salient objects in video based on their motion. Spatially coherent clusters of optical flow vectors are sampled to generate estimates of affine motion parameters local to super-pixels identified within each frame. These estimates, combined with spatial data, form coherent point distributions in a 5D solution space corresponding to objects or parts thereof. The distributions are temporally de-noised using a particle filtering approach, and clustered to estimate the position and motion parameters of salient moving objects in the clip. We demonstrate localisation of salient objects in a variety of clips exhibiting moving and cluttered backgrounds.
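As a rough illustration of the first contribution, per-modality concept scores can be combined by weighted late fusion. This is a minimal sketch assuming a hypothetical weighting rule (weights proportional to each modality's held-out accuracy); the thesis's actual dynamic weighting scheme is not specified in the abstract.

```python
import numpy as np

def fuse_modalities(scores, val_accuracies):
    """Weighted late fusion of per-modality concept scores.

    scores:         list of per-modality score arrays, each of shape (n_concepts,)
    val_accuracies: per-modality held-out accuracies used to derive the weights
                    (an illustrative rule, not the thesis's own scheme)
    """
    w = np.asarray(val_accuracies, dtype=float)
    w /= w.sum()                                    # normalise weights to sum to 1
    return np.sum([wi * np.asarray(s) for wi, s in zip(w, scores)], axis=0)

# Example: visual, motion and audio scores for three concepts
fused = fuse_modalities(
    scores=[[0.8, 0.1, 0.3], [0.6, 0.2, 0.5], [0.2, 0.7, 0.1]],
    val_accuracies=[0.75, 0.60, 0.40],
)
```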
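The sub-linear scaling of the second contribution comes from traversing the concept ontology rather than evaluating a one-vs-all classifier per class: at each level only the children of the current best node are scored, so the cost grows with tree depth and branching factor rather than with the total number of concepts. A sketch under the assumption of a tree-structured ontology with one scoring function per node (all names here are illustrative):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ConceptNode:
    label: str
    score: Callable                                 # per-node classifier confidence
    children: List["ConceptNode"] = field(default_factory=list)

def hierarchical_classify(root: ConceptNode, x) -> str:
    """Descend the ontology, scoring only the children of the current best
    node at each level. Classifier evaluations grow with depth times
    branching factor, i.e. sub-linearly in the number of leaf concepts."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.score(x))
    return node.label

# Example: a one-level ontology with stub classifiers returning fixed scores
root = ConceptNode("root", lambda x: 1.0, [
    ConceptNode("dog", lambda x: 0.2),
    ConceptNode("cat", lambda x: 0.7),
])
print(hierarchical_classify(root, x=None))          # -> "cat"
```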
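For the localisation algorithm, the affine motion parameters local to a super-pixel can be recovered by a least-squares fit to the optical flow vectors sampled inside it; together with the super-pixel's centroid this yields one point in the 5D solution space. A minimal sketch, assuming the five dimensions are the 2D centroid plus translation and rotation terms of the fitted affine model (the abstract does not say which three motion parameters are retained):

```python
import numpy as np

def superpixel_motion_point(pts, flow):
    """One 5D point for a super-pixel: (cx, cy, tx, ty, theta).

    pts:  (N, 2) pixel coordinates inside the super-pixel
    flow: (N, 2) optical flow vectors at those pixels
    Fits flow ~= pts @ A.T + t by least squares; theta is the rotation
    extracted from A. The choice of these three motion dimensions is an
    assumption, not taken from the thesis.
    """
    X = np.hstack([pts, np.ones((len(pts), 1))])         # homogeneous coordinates
    params, *_ = np.linalg.lstsq(X, flow, rcond=None)    # (3, 2) affine solution
    A, t = params[:2].T, params[2]
    theta = np.arctan2(A[1, 0] - A[0, 1], A[0, 0] + A[1, 1])
    cx, cy = pts.mean(axis=0)                            # super-pixel centroid
    return np.array([cx, cy, t[0], t[1], theta])
```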
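Clustering those per-super-pixel points then yields object hypotheses whose cluster centres carry both position and motion estimates. The sketch below uses mean shift as a stand-in for the thesis's clustering step and omits the particle-filter temporal de-noising entirely:

```python
import numpy as np
from sklearn.cluster import MeanShift

def localise_objects(points_5d, bandwidth=30.0):
    """Cluster 5D (position + motion) points into object hypotheses.
    Each cluster centre estimates one moving object's position and motion.
    The bandwidth value is illustrative, not a tuned parameter."""
    ms = MeanShift(bandwidth=bandwidth).fit(np.asarray(points_5d))
    return ms.cluster_centers_, ms.labels_
```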