Human Pose Estimation via Multi-Scale Intermediate Supervision Convolution Network

2019 
Human pose estimation locates the key points of a human body in images and video. It is used in human-computer interaction, motion recognition, character behavior analysis, and related fields. At present, when a CNN model is trained, the supervision signal is a series of single-scale heatmaps, which do not match the multi-scale key points of the ground truth. This type of supervision makes the predicted key points more likely to deviate from their true positions. To improve prediction accuracy, this paper proposes a novel model, a multi-scale intermediate supervision convolution network for human pose estimation. Ground-truth heatmaps are generated at three scales, each determined by the standard deviation of a 2D Gaussian distribution. The residual network model is composed of three stages, with ResNet50 as the backbone network. Each stage includes a ResNet50 and three deconvolution layers. The outputs from the ResNet50 in the three stages correspond to the large, medium, and small heatmap annotations, respectively, and intermediate supervision is applied twice, at the outputs of the first and second stages. In the test phase, the output of the last stage is used to calculate the final key point coordinates by non-maximum suppression. To demonstrate the effectiveness of the network, two benchmark datasets are used for training and testing: the key point detection subset of the COCO dataset and the MPII Human Pose dataset. PCK@0.1 reached 37.2% on the MPII validation set, 2.1% higher than other methods. mAP on the COCO validation set reached 75.5%, an increase of 1.2% over other methods.
The results indicate that the proposed multi-scale intermediate supervision convolutional network can reduce the effect of the mismatch between the size of the key points and the size of the ground-truth heatmaps in human pose estimation, thereby improving accuracy and achieving better performance under stricter evaluation criteria.
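
As an illustration of the multi-scale supervision described above, the following is a minimal sketch, not the authors' implementation. It generates ground-truth heatmaps for one key point at three scales by varying the standard deviation of a 2D Gaussian, and sums a per-stage loss over three stage outputs. The sigma values, the heatmap size, the use of mean squared error, and all function names are assumptions made for illustration; the abstract does not specify them. The peak decoding shown is a plain argmax, a simplified stand-in for the non-maximum suppression mentioned in the abstract.

```python
import numpy as np

# Hypothetical standard deviations for the large, medium and small targets.
# The abstract does not give the actual values.
SIGMAS = (3.0, 2.0, 1.0)


def gaussian_heatmap(height, width, center_xy, sigma):
    """Return a (height, width) heatmap with a 2D Gaussian peak at center_xy."""
    cx, cy = center_xy
    xs = np.arange(width, dtype=np.float64)[None, :]   # column coordinates
    ys = np.arange(height, dtype=np.float64)[:, None]  # row coordinates
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def multi_scale_targets(height, width, center_xy, sigmas=SIGMAS):
    """One target heatmap per stage, ordered from large to small scale."""
    return [gaussian_heatmap(height, width, center_xy, s) for s in sigmas]


def intermediate_supervision_loss(stage_outputs, targets):
    """Sum of per-stage mean squared errors (MSE is an assumed loss choice).

    Each stage output is compared against the target of its own scale, so the
    first two terms act as intermediate supervision and the last as the final one.
    """
    return float(sum(np.mean((o - t) ** 2) for o, t in zip(stage_outputs, targets)))


def decode_peak(heatmap):
    """Return (x, y) of the heatmap maximum; a plain argmax, not full NMS."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)


# Example: one key point at x=20, y=30 on a 64x48 heatmap (sizes are illustrative).
targets = multi_scale_targets(64, 48, center_xy=(20, 30))
perfect_loss = intermediate_supervision_loss(targets, targets)
peak = decode_peak(targets[-1])
```

In this sketch a larger sigma spreads the target over more pixels, so the first stage is supervised with a coarse target and the last stage with the sharpest one.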