
TokyoTechCanon at TRECVID 2012

2012 
We aim to develop a high-performance semantic indexing system using Gaussian-mixture-model (GMM) supervectors and tree-structured GMMs [1, 2, 3]. GMM supervectors corresponding to six types of audio and visual features are extracted from video shots. Tree-structured GMMs reduce the computational cost of the maximum a posteriori (MAP) adaptation used to estimate GMM parameters while maintaining high accuracy. This year, we introduce two new low-level features, HOG-Dense and LBP-Dense, as well as video-clip scores. HOG-Dense and LBP-Dense are extracted by dense sampling from up to 100 frames per shot. The video-clip score is defined as the maximum shot score among all the shots in a video clip and is used to re-rank video shots. Our best run achieved a Mean InfAP of 32.10%, which ranked first among all semantic indexing runs in the full task.
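The video-clip score described above can be illustrated with a short sketch. The abstract defines the clip score as the maximum shot score within a clip, but does not say how that score is combined with the shot scores for re-ranking. The linear interpolation and its `weight` parameter below are therefore assumptions made for illustration, as are the function names and the dictionary-based data layout.

```python
def video_clip_scores(shot_scores, shot_to_clip):
    """Return the maximum shot score for each video clip.

    shot_scores:  dict mapping shot id -> detection score
    shot_to_clip: dict mapping shot id -> id of the clip containing it
    """
    clip_scores = {}
    for shot, score in shot_scores.items():
        clip = shot_to_clip[shot]
        if clip not in clip_scores or score > clip_scores[clip]:
            clip_scores[clip] = score
    return clip_scores


def rerank(shot_scores, shot_to_clip, weight=0.5):
    """Re-rank shots using their clip's score.

    ASSUMPTION: the shot score and the clip score are combined by
    linear interpolation; the source does not specify the rule.
    """
    clip_scores = video_clip_scores(shot_scores, shot_to_clip)
    combined = {
        shot: (1.0 - weight) * score + weight * clip_scores[shot_to_clip[shot]]
        for shot, score in shot_scores.items()
    }
    # Highest combined score first.
    return sorted(combined, key=lambda shot: combined[shot], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for four shots in two clips.
    scores = {"s1": 0.9, "s2": 0.2, "s3": 0.5, "s4": 0.4}
    clips = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
    print(video_clip_scores(scores, clips))
    print(rerank(scores, clips))
```

In this toy input, shot `s2` has a low score of its own but shares a clip with the high-scoring `s1`, so under the assumed interpolation it moves ahead of the shots in clip `B`. This reflects the intuition that a concept appearing in one shot of a clip is more likely to appear in its other shots.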