Relation-Based Associative Joint Location for Human Pose Estimation in Videos

2022 
Video-based human pose estimation (VHPE) is a vital yet challenging task. While deep learning methods have made significant progress on VHPE, most existing approaches model the long-range interaction between joints only implicitly, by enlarging the receptive field of the convolution or by designing a joint graph by hand. Unlike prior methods, we design a lightweight and plug-and-play joint relation extractor (JRE) to model the associative relationships between joints explicitly and automatically. The JRE takes the pseudo heatmaps of joints as input and calculates their similarity. In this way, the JRE can flexibly learn the correlation between any two joints, allowing it to capture the rich spatial configuration of human poses. Furthermore, the JRE can infer invisible joints from the correlations among joints, which is beneficial for locating occluded joints. Then, combining the JRE with temporal semantic continuity modeling, we propose a Relation-based Pose Semantics Transfer Network (RPSTN) for video-based human pose estimation. Specifically, to capture the temporal dynamics of poses, the pose semantic information of the current frame is transferred to the next frame by a joint-relation-guided pose semantics propagator (JRPSP), which can propagate pose semantic features from non-occluded frames to occluded ones. The proposed RPSTN achieves state-of-the-art or competitive results on the video-based Penn Action, Sub-JHMDB, PoseTrack2018, and HiEve datasets. Moreover, the proposed JRE improves the performance of backbone networks on the image-based COCO2017 dataset. Code is available at https://github.com/YHDang/pose-estimation.
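The heatmap-similarity idea behind the JRE can be illustrated with a short PyTorch sketch. The module below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the JRE flattens each joint's pseudo heatmap, scores pairwise joint similarity with a scaled dot product, and uses the resulting relation matrix to mix related joints' heatmaps back into each joint, so an occluded joint can borrow evidence from correlated visible ones. The class name, the 1x1 projections, and the residual mixing are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class JointRelationSketch(nn.Module):
    """Illustrative JRE-style module (a sketch, not the authors' code).

    Takes pseudo heatmaps of shape (B, J, H, W), scores pairwise joint
    similarity, and mixes each joint's heatmap with those of its related
    joints, so occluded joints can borrow evidence from visible ones.
    """

    def __init__(self, num_joints: int):
        super().__init__()
        # Learnable 1x1 projections ahead of the similarity computation
        # (a common design choice; the exact parameterization is assumed).
        self.query = nn.Conv2d(num_joints, num_joints, kernel_size=1)
        self.key = nn.Conv2d(num_joints, num_joints, kernel_size=1)

    def forward(self, heatmaps: torch.Tensor) -> torch.Tensor:
        b, j, h, w = heatmaps.shape
        q = self.query(heatmaps).flatten(2)   # (B, J, H*W)
        k = self.key(heatmaps).flatten(2)     # (B, J, H*W)
        # Joint-to-joint similarity, normalized per row: entry (i, m)
        # says how strongly joint i attends to joint m.
        relation = torch.softmax(
            q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1
        )                                      # (B, J, J)
        # Each refined joint map is a relation-weighted mix of all maps.
        mixed = (relation @ heatmaps.flatten(2)).view(b, j, h, w)
        return heatmaps + mixed                # residual refinement


if __name__ == "__main__":
    jre = JointRelationSketch(num_joints=17)   # 17 = COCO joint count
    maps = torch.randn(2, 17, 64, 48)          # pseudo heatmaps
    print(jre(maps).shape)                     # torch.Size([2, 17, 64, 48])
```

Because the module maps (B, J, H, W) heatmaps back to the same shape, a relation block of this kind can be dropped after any heatmap-producing backbone, which is consistent with the abstract's plug-and-play claim.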