Efficient Spatio-Temporal Network with Gated Fusion for Video Super-Resolution

2021 
Video super-resolution (VSR) has recently drawn much attention in the research community. Since videos are typical three-dimensional (3D) signals, exploiting spatio-temporal features effectively and efficiently is critical for VSR. Most methods explore spatio-temporal features through optical flow estimation and motion compensation. Although promising, these methods suffer from a trade-off between model performance and complexity. In this paper, we present a novel efficient spatio-temporal network (denoted "ESTN") for VSR, which separately encodes spatial and temporal information through two parallel streams. In particular, several 2D and 3D convolutions are applied to the central frame and to consecutive frames, respectively, for feature extraction. Moreover, to improve adaptive alignment at the feature level, instead of direct addition we propose to learn a gate module built on deformable convolution to fuse the spatial and temporal features from the two streams. This design enables efficient spatio-temporal exploration while maintaining a lightweight model. Experiments demonstrate that the proposed ESTN achieves competitive or even better performance than its competitors with a similar number of parameters.
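The gated fusion described above replaces direct addition of the two streams with a learned, element-wise blend. A minimal pure-Python sketch of that idea follows; it operates on flat feature vectors for illustration, whereas the actual gate module in the paper is produced by deformable convolutions over feature maps, and all function names here are illustrative assumptions, not the authors' code:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(spatial, temporal, gate_logits):
    """Fuse spatial- and temporal-stream features with a learned gate.

    Instead of direct addition (spatial + temporal), each element is a
    convex combination weighted by a sigmoid gate, so the network can
    adaptively favour one stream at each position.
    """
    fused = []
    for s, t, z in zip(spatial, temporal, gate_logits):
        g = sigmoid(z)  # gate value in (0, 1)
        fused.append(g * s + (1.0 - g) * t)
    return fused

# Toy example: gate logits of 0 give sigmoid(0) = 0.5, an even blend.
print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))  # → [2.0, 3.0]
```

Large positive gate logits would pass the spatial stream through almost unchanged, while large negative logits would favour the temporal stream, which is what lets the fusion adapt per feature location.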