Joint Hand-Object Pose Estimation with Differentiably-Learned Physical Contact Point Analysis

2021 
Hand-object pose estimation aims to jointly estimate the 3D poses of hands and the objects they hold. During hand-object interaction, the positions and motions of hand and object keypoints are tightly coupled and subject to natural physical constraints, which most previous methods ignore. To address this issue, we propose a learnable physical affinity loss to regularize the joint estimation of hand and object poses. The physical constraints focus on enhancing the stability of grasping, the most common manner of hand-object interaction. Together with the physical affinity loss, a context-aware graph network is proposed to jointly learn independent geometry priors and interaction messages. The pipeline consists of two components: an image encoder first predicts 2D keypoints from an RGB image, and a contextual graph module then lifts the 2D keypoints to 3D estimates. The graph module treats the hand and object keypoints as two sub-graphs and estimates initial 3D coordinates separately, according to each sub-graph's topology. The two sub-graphs are then merged into a single graph to capture interaction information and further refine the 3D estimates. Experimental results show that both the physical affinity loss and the context-aware graph network effectively capture hand-object relationships and improve the accuracy of 3D pose estimation.
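To make the two-stage design concrete, below is a minimal PyTorch sketch of the described pipeline, not the authors' released code: the names `GraphBlock`, `ContextAwareGraphNet`, and `physical_affinity_loss`, the dense placeholder adjacencies, the feature width, and the fingertip-to-contact-point distance form of the loss are all illustrative assumptions; the paper's actual graph topology and learned affinity term may differ.

```python
# Sketch of the abstract's pipeline (assumed implementation, not the paper's code):
# per-part graph blocks give independent geometry priors and initial 3D estimates,
# a merged graph passes hand-object interaction messages to refine them, and a
# simple grasp-stability penalty stands in for the physical affinity loss.
import torch
import torch.nn as nn


class GraphBlock(nn.Module):
    """One round of message passing over a fixed, row-normalized adjacency."""

    def __init__(self, adj: torch.Tensor, in_dim: int, out_dim: int):
        super().__init__()
        a = adj + torch.eye(adj.size(0))  # add self-loops before normalizing
        self.register_buffer("norm_adj", a / a.sum(dim=1, keepdim=True))
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, nodes, in_dim) -> (batch, nodes, out_dim)
        return torch.relu(self.lin(self.norm_adj @ x))


class ContextAwareGraphNet(nn.Module):
    def __init__(self, adj_hand, adj_obj, adj_joint, feat: int = 64):
        super().__init__()
        # Separate sub-graphs: independent geometry priors per part.
        self.hand = GraphBlock(adj_hand, 2, feat)
        self.obj = GraphBlock(adj_obj, 2, feat)
        self.init_head = nn.Linear(feat, 3)  # initial per-node 3D coordinates
        # Merged graph: cross hand-object messages refine the initial estimates.
        self.joint = GraphBlock(adj_joint, feat + 3, feat)
        self.refine_head = nn.Linear(feat, 3)

    def forward(self, kp2d_hand, kp2d_obj):
        h = self.hand(kp2d_hand)            # (B, n_hand, feat)
        o = self.obj(kp2d_obj)              # (B, n_obj, feat)
        z = torch.cat([h, o], dim=1)
        xyz0 = self.init_head(z)            # separate initial 3D estimates
        xyz = xyz0 + self.refine_head(self.joint(torch.cat([z, xyz0], dim=-1)))
        n_hand = kp2d_hand.size(1)
        return xyz[:, :n_hand], xyz[:, n_hand:]


def physical_affinity_loss(fingertips, contact_points):
    """Hypothetical grasp-stability term (an assumption, not the paper's exact
    formulation): penalize the gap between each predicted fingertip and the
    object contact point it is assumed to touch."""
    return (fingertips - contact_points).norm(dim=-1).mean()


if __name__ == "__main__":
    n_h, n_o = 21, 8  # assumed: 21 hand joints, 8 object bounding-box corners
    # Dense placeholder topologies; real adjacencies would encode the hand
    # skeleton and the object's edge structure.
    net = ContextAwareGraphNet(torch.ones(n_h, n_h), torch.ones(n_o, n_o),
                               torch.ones(n_h + n_o, n_h + n_o))
    hand3d, obj3d = net(torch.rand(2, n_h, 2), torch.rand(2, n_o, 2))
    print(hand3d.shape, obj3d.shape)  # torch.Size([2, 21, 3]) torch.Size([2, 8, 3])
```

The residual refinement in `forward` mirrors the described flow: initial 3D coordinates come from the per-part sub-graphs alone, and the merged graph only contributes a correction informed by hand-object interaction.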