A Novel Approach of Intelligent Computing for Multiperson Pose Estimation with Deep High Spatial Resolution and Multiscale Features

2021 
Current human pose estimation (HPE) methods rely mainly on Convolutional Neural Networks (CNNs). These networks typically consist of a high-to-low-resolution subnetwork (the encoder), which learns semantic information, and a low-to-high-resolution subnetwork (the decoder), which raises the resolution for keypoint localization. When the encoder reduces feature maps to a very low resolution, some spatial information is inevitably lost and cannot be recovered in the upsampling stages, so preserving high-spatial-resolution features is critical for HPE. Because human body parts vary in scale, multiscale features are also important. This paper proposes a novel backbone network designed specifically for HPE, named High Spatial Resolution and Multiscale Networks (HSR-MSNet), which maintains high-spatial-resolution features in the deeper layers of the encoder while constructing multiscale features within a single residual block via subgroup splitting and fusion of feature maps. Experiments show that the approach outperforms other state-of-the-art methods, producing more accurate keypoint locations on the COCO dataset.
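The abstract does not spell out how subgroup splitting and fusion work inside the residual block. As a rough illustration only, the sketch below assumes a hierarchical scheme similar to Res2Net-style blocks: channels are split into subgroups, each subgroup after the first is filtered, and each filtered output is added to the next subgroup before that one is filtered. The function names, the `num_groups` parameter, and the fixed 3x3 mean filter (a stand-in for a learned 3x3 convolution) are hypothetical and are not taken from the paper.

```python
import numpy as np


def box_filter_3x3(x):
    """Stand-in for a learned 3x3 convolution (assumption, not the paper's layer).

    Applies a per-channel 3x3 mean with zero padding.
    x has shape (C, H, W); the output has the same shape.
    """
    x = x.astype(float)
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W by 1
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0


def split_and_fuse_block(x, num_groups=4):
    """Hypothetical multiscale residual block on a (C, H, W) feature map.

    1. Split the channels into `num_groups` subgroups.
    2. Pass the first subgroup through unchanged.
    3. Filter each later subgroup after adding the previous filtered output.
    4. Concatenate all subgroup outputs and add the residual input.
    """
    channels = x.shape[0]
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    x = x.astype(float)
    groups = np.split(x, num_groups, axis=0)

    outputs = [groups[0]]  # first subgroup: identity
    prev = None
    for group in groups[1:]:
        fused_input = group if prev is None else group + prev
        prev = box_filter_3x3(fused_input)
        outputs.append(prev)

    fused = np.concatenate(outputs, axis=0)
    return x + fused  # residual connection; spatial size is unchanged


if __name__ == "__main__":
    features = np.random.rand(8, 16, 16)
    out = split_and_fuse_block(features, num_groups=4)
    print(out.shape)
```

Under these assumptions, the second subgroup sees a 3x3 neighbourhood, the third an effective 5x5, and the fourth an effective 7x7, so a single block mixes several receptive-field sizes while the height and width of the feature map stay the same.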