Watch Hours in Minutes: Summarizing Videos with User Intent.

2020 
With the ever-increasing growth of video content, automatic video summarization has become an important task that has attracted a lot of interest in the research community. One of the challenges that makes it a hard problem is the presence of multiple “correct answers”: because the task is highly subjective, a single video can have several different “ideal” summaries. Modelling user intent in the form of queries has been proposed in the literature as a way to alleviate this problem. A query-focused summary is expected to contain shots that are relevant to the query, together with other important shots. In practical deployments where very long videos must be summarized, the need to capture the user’s intent becomes all the more pronounced. In this work, we propose a simple two-stage method that takes a user query and a video as input and generates a query-focused summary. In the first stage, we employ attention within each segment and across all segments, combined with the query, to learn a feature representation of each shot. In the second stage, these learned features are again fused with the query to learn a score for each shot by regression through fully connected layers. We then assemble the summary by arranging the top-scoring shots in chronological order. In extensive experiments on a benchmark query-focused video summarization dataset for long videos, our method outperforms the current state of the art, demonstrating its effectiveness even without computationally expensive architectures such as LSTMs, variational autoencoders, GANs, or reinforcement learning, which most past works employ.
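The final assembly step described in the abstract, keeping the top-scoring shots and arranging them in chronological order, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `assemble_summary`, the `budget` parameter (how many shots to keep), and the example scores are all assumptions, since the abstract does not state how the summary length is chosen.

```python
import numpy as np


def assemble_summary(shot_scores, budget):
    """Return indices of the `budget` highest-scoring shots in chronological order.

    `shot_scores` holds one predicted score per shot, indexed by shot position
    in the video. `budget` is a hypothetical summary-length parameter.
    """
    scores = np.asarray(shot_scores, dtype=float)
    k = min(budget, scores.size)
    # Stable sort on negated scores: highest score first, ties go to the earlier shot.
    top = np.argsort(-scores, kind="stable")[:k]
    # Sorting the selected indices restores the original temporal order.
    return sorted(top.tolist())


# Illustrative scores for six shots (made-up values).
scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7]
print(assemble_summary(scores, budget=3))  # [1, 3, 5]
```

Sorting the selected indices, rather than leaving them in score order, is what keeps the summary watchable as a condensed version of the original timeline.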