Temporal Segment Convolutional Kernel Networks for Sequence Modeling of Videos

2019 
Sequence modeling is crucial for video action recognition. In this paper, we propose temporal segment convolutional kernel networks (TS-CKN), which take advantage of convolutional neural networks to extract appearance features, while the temporal sequence is modeled with deep kernel networks. We employ kernel methods to capture the time-varying information in videos and propose a training method for kernel map approximation based on matrix backpropagation. This leads to a model, termed deep kernel networks, that can be easily integrated with existing deep learning architectures such as ResNet. Our approach also sparsely samples several clips from each video and aggregates the class predictions from all clips. More importantly, all parameters of our model can be learned by stochastic optimization in an end-to-end manner. We evaluate our method on two standard action recognition datasets, HMDB-51 and UCF-101, achieving state-of-the-art results.
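The sketch below is a minimal illustration (not the authors' code) of the two ideas the abstract describes: (1) a kernel-map approximation layer whose anchor points are trained end-to-end, with the inverse matrix square root differentiated by autograd (matrix backpropagation), and (2) sparse clip sampling with a consensus over per-clip class scores. The Gaussian kernel choice, the layer sizes, and the use of pre-extracted ResNet-style features are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class KernelMapApprox(nn.Module):
    """Nystrom-style approximation of a Gaussian kernel map (assumed form).

    psi(x) = K_ZZ^{-1/2} k_Z(x), where Z are learnable anchor points.
    The eigendecomposition used for K_ZZ^{-1/2} is differentiable, so the
    anchors are trained jointly with the rest of the network.
    """

    def __init__(self, in_dim: int, num_anchors: int, sigma: float = 1.0):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_anchors, in_dim))
        self.sigma = sigma

    def _gaussian(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Gaussian kernel between the rows of a and the rows of b.
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * self.sigma ** 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k_zz = self._gaussian(self.anchors, self.anchors)   # (m, m)
        evals, evecs = torch.linalg.eigh(k_zz)               # gradients flow through the decomposition
        inv_sqrt = evecs @ torch.diag(evals.clamp_min(1e-6).rsqrt()) @ evecs.t()
        k_xz = self._gaussian(x, self.anchors)               # (n, m)
        return k_xz @ inv_sqrt                               # approximate kernel map


class TemporalSegmentKernelNet(nn.Module):
    """Embed sparsely sampled clips and fuse their class scores."""

    def __init__(self, feat_dim: int = 512, num_anchors: int = 64, num_classes: int = 101):
        super().__init__()
        self.kernel_map = KernelMapApprox(feat_dim, num_anchors)
        self.classifier = nn.Linear(num_anchors, num_classes)

    def forward(self, clip_features: torch.Tensor) -> torch.Tensor:
        # clip_features: (batch, num_clips, feat_dim), e.g. CNN appearance features.
        b, t, d = clip_features.shape
        psi = self.kernel_map(clip_features.reshape(b * t, d))
        scores = self.classifier(psi).reshape(b, t, -1)
        return scores.mean(dim=1)  # consensus over the sampled clips


if __name__ == "__main__":
    model = TemporalSegmentKernelNet()
    feats = torch.randn(2, 3, 512)   # 2 videos, 3 sparsely sampled clips each
    print(model(feats).shape)        # torch.Size([2, 101])
```

Because every operation above is differentiable, the anchor points, the classifier, and any backbone producing the clip features can all be optimized jointly with stochastic gradient methods, matching the end-to-end training described in the abstract.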