Fully Convolutional Network for 3D Human Skeleton Estimation from a Single View for Action Analysis

2019 
Elderly health monitoring is increasingly important. This paper proposes to extract 3D skeletons of the elderly in order to monitor their daily behavior (e.g., walking, falling down). Our technique uses two Fully Convolutional Networks (FCNs) to estimate a 3D human pose (skeleton) from the corresponding 2D pose (skeleton), which can itself be estimated from a single RGB image. Our FCNs operate in two stages: the first stage estimates a 3D anchor pose (i.e., the most similar and most frequently occurring pose in the dataset) from the 2D skeleton, while the second stage further regresses/refines the 3D anchor pose to its final state (parameters). We also apply global or object-centered normalization as a pre-processing step, so that the method remains applicable when cameras with different fields of view (FOV), focal lengths, or object distances are encountered. According to our experiments, the two-stage FCNs achieve a Mean Per Joint Position Error (MPJPE) of 38.84 mm (better than the compared methods, at 45.5 mm) when the 2D ground-truth pose is used as input. When cascaded with another 2D pose estimator (e.g., the stacked Hourglass model), the average MPJPE is about 67.71 mm. This can be further improved if a 17-joint human skeleton model is adopted and re-trained on the H36M dataset.
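The object-centered normalization and the MPJPE metric mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the root-joint choice, the scale definition (largest joint distance from the root), and the function names are assumptions. Normalizing this way removes the dependence on camera focal length and subject distance, which is why the abstract presents it as a pre-processing step.

```python
import numpy as np

def object_centered_normalize(pose2d, root_idx=0):
    """Object-centered normalization (hypothetical sketch).

    pose2d: (J, 2) array of 2D joint coordinates in pixels.
    Translates the skeleton so the root joint sits at the origin, then
    scales it to unit size, so poses from cameras with different FOVs,
    focal lengths, or object distances become comparable.
    """
    centered = pose2d - pose2d[root_idx]            # root joint -> origin
    scale = np.linalg.norm(centered, axis=1).max()  # farthest joint from root
    if scale == 0:
        return centered                             # degenerate pose
    return centered / scale

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average Euclidean distance
    between predicted and ground-truth joints, in the input units (mm)."""
    return np.linalg.norm(pred - gt, axis=1).mean()
```

Because the normalization cancels uniform scaling and translation, `object_centered_normalize(2 * pose + 5)` yields the same result as `object_centered_normalize(pose)`, mimicking a change of camera distance or image offset.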