Face Feature Recovery via Temporal Fusion for Person Search

2020 
Searching for actors in videos using a single portrait image is a challenging task due to large variations in video scenes and intra-person appearance. To tackle this problem, most recent works apply deep neural networks to detect and extract robust facial features for matching. However, when the face of an actor is not detected due to occlusion, such image-matching strategies are not applicable. To address this issue, we propose a framework, "Face Feature Recovery via Temporal Fusion", that synthesizes virtual facial features by observing both temporal and contextual information. Once such face features are recovered, a simple extension of k-nearest-neighbor re-ranking, "Iterative k-nearest Multi-fusion", is presented to exploit both face and body features for improved person search. We conduct extensive experiments on the challenging extended version of the Cast Search in Movies (ECSM) dataset [1]. Without using tracklet information during training, the proposed approach still performs favorably against recent works in searching for actors of interest in movie videos. Moreover, we show that the proposed approach can be fused with these methods to further improve performance.
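To make the re-ranking idea concrete, below is a minimal sketch of an iterative k-nearest multi-fusion re-ranking step, assuming L2-normalized face and body embeddings and cosine similarity. All names and parameters here (`iterative_knn_multifusion`, `alpha`, `n_iters`) are illustrative assumptions, not the authors' implementation; the abstract only specifies that face and body features are fused for re-ranking.

```python
# Hypothetical sketch of "Iterative k-nearest Multi-fusion" re-ranking.
# Assumption: candidates whose faces are occluded can still be matched
# through body appearance propagated from the current top-k results.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """L2-normalize features so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def iterative_knn_multifusion(query_face, cand_face, cand_body,
                              k=5, alpha=0.5, n_iters=3):
    """Re-rank candidates by iteratively fusing face and body similarity.

    query_face : (d,)   face feature of the query portrait
    cand_face  : (N, d) face features of the candidate detections
    cand_body  : (N, d) body features of the same candidates
    alpha      : face-vs-body weighting (assumed hyperparameter)
    Returns candidate indices sorted by the fused score, best first.
    """
    q = l2_normalize(query_face)
    f = l2_normalize(cand_face)
    b = l2_normalize(cand_body)

    scores = f @ q  # initial ranking purely by face similarity
    for _ in range(n_iters):
        topk = np.argsort(-scores)[:k]
        # Build a body-feature probe from the current top-k candidates
        # and fuse it with the original face similarity.
        body_probe = l2_normalize(b[topk].mean(axis=0))
        scores = alpha * (f @ q) + (1 - alpha) * (b @ body_probe)
    return np.argsort(-scores)

# Toy usage with random 128-D features for 10 candidates.
rng = np.random.default_rng(0)
ranking = iterative_knn_multifusion(rng.normal(size=128),
                                    rng.normal(size=(10, 128)),
                                    rng.normal(size=(10, 128)))
print(ranking)
```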