Learning to Refine Human Pose Estimation
Citations: 105 · References: 50 · Related Papers: 10
Abstract:
Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by convolutional neural networks, many difficult cases remain in which even state-of-the-art models fail to correctly localize all body joints. This motivates an additional refinement step that addresses these challenging cases and can be applied easily on top of any existing method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. So that the network learns to refine incorrect body-joint predictions, we employ a novel data augmentation scheme for training in which we model "hard" human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks: MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvements over the state of the art.
Keywords: Tracking
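The abstract mentions a data augmentation scheme that models "hard" pose cases for training the refiner. The paper's exact scheme is not given here, so the following is a hypothetical sketch of the general idea: start from ground-truth joints and inject realistic detector errors (coordinate jitter, left/right swaps, missed joints). The function name, magnitudes, and probabilities are all assumptions.

```python
import random

def perturb_pose(joints, swap_pairs, jitter=10.0, p_swap=0.2, p_drop=0.1, rng=None):
    """joints: list of (x, y); swap_pairs: index pairs of symmetric joints."""
    rng = rng or random.Random(0)
    # jitter every joint to imitate localization error
    out = [(x + rng.uniform(-jitter, jitter),
            y + rng.uniform(-jitter, jitter)) for x, y in joints]
    for i, j in swap_pairs:                    # occasionally confuse left/right
        if rng.random() < p_swap:
            out[i], out[j] = out[j], out[i]
    # occasionally drop a joint entirely to imitate a missed detection
    return [None if rng.random() < p_drop else p for p in out]

pose = [(100.0, 50.0), (120.0, 50.0), (110.0, 90.0)]
hard = perturb_pose(pose, swap_pairs=[(0, 1)])
```

A refiner trained on such (hard, ground-truth) pairs sees failure modes it must learn to undo, which is the stated purpose of the augmentation.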
We have presented a framework to obtain camera pose (i.e., position and orientation in 3D space) with real-scale information from uncalibrated multi-view images, together with the intrinsic camera parameters, automatically. Our framework consists of two key steps. First, initial values of the intrinsic camera and pose parameters are extracted from homography estimation based on the contour model of planar objects in the scene. Second, the intrinsic camera and pose parameters are refined by a bundle adjustment procedure. The framework provides a complete pose estimation pipeline for ordered or unordered uncalibrated multi-view images and can be used in vision tasks requiring scale information. Real multi-view images were used to demonstrate the robustness, flexibility, and accuracy of the proposed framework, which was also applied to 3D reconstruction.
Keywords: Bundle adjustment, Robustness, Homography
Citations: 2
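The bundle-adjustment refinement in the second step can be sketched in miniature: starting from rough initial guesses (as the homography step would provide), jointly refine an intrinsic parameter (focal length f) and a pose parameter (here a single depth offset tz) by Gauss-Newton minimization of reprojection error. This toy uses synthetic noise-free observations and only two parameters; the real procedure refines full intrinsics and 6-DoF poses over many views.

```python
import numpy as np

pts3d = np.array([[0.0, 0.0, 5.0], [1.0, 0.5, 6.0],
                  [-1.0, 1.0, 4.0], [0.5, -1.0, 5.5]])

def project(params, pts):
    # simple pinhole projection: u = f*x/z, v = f*y/z, with depth offset tz
    f, tz = params
    z = pts[:, 2] + tz
    return np.column_stack([f * pts[:, 0] / z, f * pts[:, 1] / z])

true_params = np.array([800.0, 0.3])
obs = project(true_params, pts3d)          # synthetic noise-free observations

params = np.array([700.0, 0.0])            # rough initial estimate
for _ in range(20):                        # Gauss-Newton iterations
    r = (project(params, pts3d) - obs).ravel()
    J = np.empty((r.size, 2))
    for k in range(2):                     # numerical Jacobian, one column per parameter
        d = np.zeros(2)
        d[k] = 1e-6
        J[:, k] = ((project(params + d, pts3d) - obs).ravel() - r) / 1e-6
    params = params - np.linalg.solve(J.T @ J, J.T @ r)
```

On this well-conditioned toy problem the parameters converge back to the ground truth; real bundle adjustment adds robust losses and sparse solvers for scale.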
In this paper, we present a novel system that simultaneously performs segmentation and 2D pose-motion recovery for an articulated object in a video sequence. The system first groups pixels into superpixels to reduce the number of nodes, which largely determines the computational complexity of the later optimizations. Starting from true pose estimates obtained with user assistance on each key frame, a parallel pose-tracking procedure, whose energy function accounts for boundary, appearance, and pose-prior information, is run forward and backward over the in-between frames. Using different search strategies, multiple pose candidates are inferred to help recover missed true poses. Finally, by solving the cost function of the pose-motion recovery, which exploits the temporal coherence of object movement, the pose motion and the video object are produced at the same time. Because a parameterized tree-based articulated model drawn by the user denotes the pose, our method is generic and can be used for any articulated object.
Citations: 0
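The weighted energy described above, combining boundary, appearance, and pose-prior terms, can be illustrated with a stand-in scoring function. The term values and weights below are invented placeholders; the real terms are computed over superpixels and image data.

```python
def energy(candidate, prev_pose, w_boundary=1.0, w_appearance=1.0, w_prior=0.5):
    # pose prior: penalize deviation from the previous frame's pose (temporal coherence)
    prior = sum((a - b) ** 2 for a, b in zip(candidate["pose"], prev_pose))
    return (w_boundary * candidate["boundary_cost"]
            + w_appearance * candidate["appearance_cost"]
            + w_prior * prior)

prev = (0.0, 0.0)
candidates = [
    {"pose": (0.1, 0.0), "boundary_cost": 0.2, "appearance_cost": 0.3},
    {"pose": (2.0, 2.0), "boundary_cost": 0.1, "appearance_cost": 0.1},
]
best = min(candidates, key=lambda c: energy(c, prev))
```

The second candidate has slightly better image terms but violates temporal coherence, so the prior term rejects it, which is the role the abstract assigns to the pose-prior information.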
The problem of identifying the 3D pose of a known object from a given 2D image has important applications in computer vision. Our proposed method of registering a 3D model of a known object to a given 2D photo of the object has numerous advantages over existing methods: it does not require prior training, knowledge of the camera parameters, explicit point correspondences, or matching features between the image and the model. Unlike techniques that estimate a partial 3D pose (as in an overhead view of traffic or of machine parts on a conveyor belt), our method estimates the complete 3D pose of the object, and it works on a single static image from a given view under varying and unknown lighting conditions. For this purpose we derive a novel illumination-invariant distance measure between the 2D photo and the projected 3D model, which is then minimised to find the best pose parameters. Results for vehicle pose detection in real photographs are presented.
Keywords: Monocular vision, Point set registration
Citations: 6
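The idea of minimizing a distance between the projected 3D model and the image over the pose parameters can be shown with a toy one-parameter search. A plain point-to-point distance and a single yaw angle stand in for the paper's illumination-invariant image measure and full 3D pose.

```python
import math

model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # toy model points

def project(yaw, pts):
    # orthographic projection after a rotation about the y-axis
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z, y) for x, y, z in pts]

observed = project(0.7, model)              # synthetic "image" made at yaw = 0.7

def dist(a, b):
    # stand-in distance; the paper uses an illumination-invariant image measure
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b))

best_yaw = min((i * 0.01 for i in range(315)),   # scan 0 .. pi in 0.01 steps
               key=lambda yaw: dist(project(yaw, model), observed))
```

A brute-force scan suffices in one dimension; over a full 6-DoF pose the same objective would be minimized with a proper optimizer.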
In the field of human-robot interaction (HRI), recognition of humans in a robot's surroundings is a crucial task. Beyond localization, estimating a person's 3D pose from monocular camera images is a challenging problem on a mobile platform. For this purpose, an appearance-based approach using a 3D model of the human upper body has been developed and experimentally investigated. For real-time tracking, the state of the person is estimated by a particle filter tracker, which uses different observation models to evaluate pose hypotheses. The 6D body pose is modeled by 4 parameters for the torso position and orientation and 2 for the head pan and tilt. To achieve real-time operation, a smooth fit-value function simplifies the particle filter's convergence. Furthermore, a sparse feature-based model eliminates the need for the computationally expensive geometric transformations of the image required by conventional Active Appearance Models (AAM). The initialization problem of the pose tracker is overcome by integrating a Histograms of Oriented Gradients (HOG) detector.
Keywords: Initialization, Torso, Tracking, Feature
Citations: 0
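The particle-filter tracking loop described above can be sketched in one dimension: propagate pose hypotheses with motion noise, weight each by an observation model, and resample. The real tracker evaluates image-based fit values over a 6-parameter body pose; the Gaussian likelihood and all constants below are placeholders.

```python
import math
import random

rng = random.Random(42)
true_x = 3.0                                   # hidden position to track (toy)
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]

def likelihood(x, measurement, sigma=0.5):
    # placeholder observation model; the real tracker scores image evidence
    return math.exp(-((x - measurement) ** 2) / (2 * sigma ** 2))

for _ in range(10):                            # filtering steps
    measurement = true_x + rng.gauss(0.0, 0.2) # noisy observation
    particles = [x + rng.gauss(0.0, 0.3) for x in particles]  # motion model
    weights = [likelihood(x, measurement) for x in particles]
    # resample proportional to weight so good hypotheses survive
    particles = rng.choices(particles, weights=weights, k=len(particles))

estimate = sum(particles) / len(particles)
```

The smooth fit-value function the abstract mentions plays the role of `likelihood` here: a smoother score gives informative weights even for poses that are only approximately right, which speeds convergence.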
This paper presents vector-based pose estimation for handling self-occlusion and foreshortening. The approach builds on 3D pose estimation from images of the user's motion captured by a monocular camera. A 3D full-body human model is constructed, and the silhouette extracted from the captured image is matched against the model's projection on a virtual image plane. Multi-part alignment adjusts the 3D pose of the graphical full-body model based on the extracted silhouette. Each body-part alignment defines a vector for that part, and the parts together are represented as a skeleton model. The resulting model identifies the human pose and the associated 3D motion parameters. The per-part vectors serve as 3D pose information to achieve marker-free interaction in the augmented environment and are expected to improve tracking accuracy for monocular 3D pose estimation. In future work, this method will be applied to interaction in augmented reality (AR) environments.
Keywords: Monocular, Tracking
Citations: 0
Head pose is an important indicator of a person's attention, gestures, and communicative behavior with applications in human computer interaction, multimedia and vision systems. In this paper, we present a novel head pose estimation system by performing head region detection using the Kinect [2], followed by face detection, feature tracking, and finally head pose estimation using an active camera. Ten feature points on the face are defined and tracked by an Active Appearance Model (AAM). We propose to use the scene flow approach to estimate the head pose from 2D video sequences. This estimation is based upon a generic 3D head model through the prior knowledge of the head shape and the geometric relationship between the 2D images and a 3D generic model. We have tested our head pose estimation algorithm with various cameras at various distances in real time. The experiments demonstrate the feasibility and advantages of our system.
Keywords: Feature, Human head, Optical flow, Tracking
Citations: 17
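The geometric core of relating a generic 3D head model to observed point positions can be sketched with the SVD-based Kabsch (orthogonal Procrustes) solution for the rotation. Perfect 3D correspondences are assumed here for illustration, whereas the system above derives the motion from 2D images via scene flow.

```python
import numpy as np

# generic head-model points (hypothetical); any non-degenerate set works
model = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

yaw = 0.4                                   # ground-truth rotation about the y-axis
c, s = np.cos(yaw), np.sin(yaw)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
observed = model @ R_true.T                 # observed = rotated model points

A = model - model.mean(axis=0)              # center both point sets
B = observed - observed.mean(axis=0)
U, _, Vt = np.linalg.svd(A.T @ B)           # SVD of the cross-covariance
d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against a reflection solution
R_est = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
```

`R_est` recovers the head rotation exactly on this noise-free toy; with noisy scene-flow estimates the same least-squares solution gives the best-fit rotation.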
This paper addresses tracking and 3D pose estimation of human faces with large pose and expression changes in video sequences obtained from an uncalibrated monocular camera. Classical pose estimation methods suffer from two disadvantages: (1) a 3D head model or a reference frame is always needed, and the camera must be calibrated in advance; (2) it is difficult to deal with non-rigid motion, which is very common for human faces. In this paper, we present a pose estimation system that overcomes both disadvantages. For each frame, a 2D active appearance model is adopted to reliably track the face and facial features under large pose and expression variations. We then use a recently developed non-rigid structure-from-motion (SFM) technique to recover the 3D face shape. Instead of directly using the rotation matrix obtained from SFM, we propose a method that uses robust statistics and 3D-2D feature-point correspondences to accurately recover the 3D head pose. Our experiments have demonstrated the effectiveness and efficiency of the approach.
Keywords: Feature, Structure from motion, Monocular, Tracking
Citations: 11
Conventional edge-based object pose estimation in robot vision suffers from low accuracy and slow convergence. In this paper, a corner-based object pose estimation method is proposed and studied. The classical pinhole camera model and model-based object pose estimation and tracking are employed, together with iterative optimization for accurate estimation. In our method, a new corner-matching strategy is proposed, since good correspondences are easy to establish for image corners; good fits between the CAD model and the object in the images can then be obtained. Experimental results show that the new method achieves more accurate and faster pose estimation than conventional methods.
Keywords: Tracking
Citations: 0
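Given corner correspondences between the CAD model and the image (the matching step is assumed already done here), the estimation step can be illustrated with a closed-form planar rotation-plus-translation fit of the matched corner sets. The actual method refines a full 6-DoF pose iteratively under a pinhole camera model; 2-D keeps the idea visible.

```python
import math

model = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]   # toy CAD corners
theta, tx, ty = 0.3, 1.0, -0.5              # ground-truth planar pose
c, s = math.cos(theta), math.sin(theta)
image = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in model]

# closed-form 2-D rigid alignment of matched corner sets (centered cross terms)
ma = [sum(p[i] for p in model) / len(model) for i in (0, 1)]
mb = [sum(p[i] for p in image) / len(image) for i in (0, 1)]
sxx = sum((ax - ma[0]) * (bx - mb[0]) + (ay - ma[1]) * (by - mb[1])
          for (ax, ay), (bx, by) in zip(model, image))
sxy = sum((ax - ma[0]) * (by - mb[1]) - (ay - ma[1]) * (bx - mb[0])
          for (ax, ay), (bx, by) in zip(model, image))
theta_est = math.atan2(sxy, sxx)
tx_est = mb[0] - (math.cos(theta_est) * ma[0] - math.sin(theta_est) * ma[1])
ty_est = mb[1] - (math.sin(theta_est) * ma[0] + math.cos(theta_est) * ma[1])
```

With noisy corners the same least-squares objective no longer has an exact solution, which is where the iterative refinement the abstract describes takes over.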
We address the problem of model-based pose estimation from image sequences. While most methods build on local features, we use object silhouettes only, which are weaker, but considerably more robust cues. Without initialization of the pose, we are able to track the pose of rigid models through a video sequence, despite varying texture, illumination and appearance. Additionally our method handles multiple objects inherently and jointly estimates pose and object type. The method works at interactive frame rates, which makes it an ideal tool for augmented reality applications, active inspection systems and robotic manipulation tasks.
Keywords: Initialization, Active appearance model
Citations: 1
This paper presents a novel approach for estimating 3D head pose dynamically from a sequence of input images. Accurate head pose estimation and facial motion tracking are critical problems in developing vision-based human-computer interaction systems. Given an initial reference template of the head image and the corresponding head pose, the full head motion is recovered using a cylindrical head model. By updating the template dynamically to accommodate gradual changes in lighting, head pose can be recovered robustly despite light variation and self-occlusion. For this, we adopt optical flow together with an iteratively re-weighted least squares technique. The experiments show that the proposed approach efficiently estimates 3D head pose.
Keywords: Optical flow, Tracking, Facial motion capture, Human head
Citations: 12
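The iteratively re-weighted least squares (IRLS) technique mentioned above can be sketched with a robust scalar location estimate: refit a weighted average repeatedly, down-weighting large residuals so outliers (such as self-occluded pixels) stop dominating. The Huber-style weighting and all constants are illustrative; the real system applies the same weighting inside the optical-flow equations.

```python
data = [2.0, 2.1, 1.9, 2.05, 1.95, 10.0]    # inliers near 2.0 plus one gross outlier
est = sum(data) / len(data)                 # ordinary mean is pulled toward the outlier

delta = 0.5                                 # Huber threshold
for _ in range(20):                         # IRLS iterations
    # quadratic loss near zero -> weight 1; linear beyond delta -> weight delta/|r|
    w = [1.0 if abs(x - est) <= delta else delta / abs(x - est) for x in data]
    est = sum(wi * x for wi, x in zip(w, data)) / sum(w)
```

The plain mean of this data is about 3.3; after re-weighting, the estimate settles near 2.1, with the outlier contributing only a small residual weight.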