Adaptive Context Reading Network for Movie Scene Detection

2020 
Video scene detection is the task of temporally segmenting a video into its basic story units, called scenes. We propose a temporal-context-aware scene detection method. For each shot in a video, we store the time-indexed features of its surrounding shots as its context memory. A context-reading operation is performed to read the most relevant information from the memory, which is then used to update the feature of the query shot. To adaptively determine the temporal scale of context memory for different queries, we apply a bank of context memories of different temporal scales to generate multiple context reads, and adaptively aggregate them according to their confidence scores. The adaptive context reading is guided by a structure learning objective that encourages each shot to read the most appropriate context, such that the global structure of scenes is revealed in the feature space. With the context-aware shot features learned by our method, we perform clustering to find the scene boundaries. Our experiments demonstrate that adaptively modeling temporal context yields state-of-the-art results on existing video scene detection datasets. We also construct a large-scale dataset for the task, and our ablation studies on it show that the performance gains are attributable to the proposed adaptive context reading.
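The abstract describes an attention-style read over per-shot context memories at several temporal scales, aggregated by confidence. The paper's exact architecture is not given here, so the following is only a minimal NumPy sketch of that idea under stated assumptions: relevance is dot-product attention, the "confidence" of a read is its maximum attention score, and the multi-scale reads are combined by a softmax over those confidences. All function names (`context_read`, `adaptive_context_read`) and the scale set are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_read(query, memory):
    """Attention read: relevance-weighted sum of memory features.
    Returns the read vector and a scalar confidence (max score).
    This dot-product form is an assumption, not the paper's exact op."""
    scores = memory @ query / np.sqrt(query.shape[0])   # (m,)
    weights = softmax(scores)
    return weights @ memory, scores.max()

def adaptive_context_read(shots, t, scales=(2, 4, 8)):
    """For shot t, read from context memories of several temporal
    scales (neighboring shots within +/- s) and aggregate the reads
    by a softmax over their confidence scores."""
    q = shots[t]
    reads, confs = [], []
    for s in scales:
        lo, hi = max(0, t - s), min(len(shots), t + s + 1)
        # Context memory: surrounding shots, excluding the query itself.
        mem = np.concatenate([shots[lo:t], shots[t + 1:hi]], axis=0)
        r, c = context_read(q, mem)
        reads.append(r)
        confs.append(c)
    alpha = softmax(np.array(confs))                    # per-scale weights
    context = (alpha[:, None] * np.stack(reads)).sum(axis=0)
    return q + context                                  # context-aware feature

# Usage: update every shot feature, then cluster the updated features
# (e.g. with any off-the-shelf clustering) to locate scene boundaries.
rng = np.random.default_rng(0)
shot_features = rng.standard_normal((10, 16))
updated = np.stack([adaptive_context_read(shot_features, t)
                    for t in range(len(shot_features))])
print(updated.shape)
```

In the paper, the scene boundaries are then found by clustering these context-aware shot features; the residual update `q + context` above is one common way to inject the read into the query feature, chosen here for simplicity.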