On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-Identification
Abstract:
Visible-infrared person re-identification (VI-ReID) has been challenging due to the large discrepancies between visible and infrared modalities. Most pioneering approaches reduce intra-modality variations and inter-modality discrepancies by learning modality-shared features. However, an explicit modality-shared cue, i.e., body keypoints, has not been fully exploited in VI-ReID. Additionally, existing feature learning paradigms impose constraints on either global features or partitioned feature stripes, which neglect the prediction consistency of global and part features. To address these problems, we exploit pose estimation as an auxiliary learning task to assist VI-ReID in an end-to-end framework. By jointly training the two tasks in a mutually beneficial manner, our model learns higher-quality ID-related features. On top of that, the learning of global and local features is seamlessly synchronized by a Hierarchical Feature Constraint (HFC), in which the former supervises the latter using a knowledge distillation strategy. Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently outperforms state-of-the-art methods by significant margins. Specifically, our method achieves nearly 20% mAP improvement over the state-of-the-art method on the RegDB dataset. Our findings highlight the value of auxiliary task learning in VI-ReID. Our source code is available at https://github.com/yoqim/Pose_VIReID.

Over the past decade, there has been growing interest in human pose estimation. Although much work has been done on 2D pose estimation, 3D pose estimation has received comparatively less attention. In this paper, we propose a top-down, two-stage 3D pose estimation framework. GlobalNet and RefineNet in our 2D pose estimation stage enable us to find occluded or invisible 2D joints, while a 2D-to-3D pose estimator composed of residual blocks lifts 2D joints to 3D joints effectively. The proposed method achieves promising results, with a mean per-joint position error of 42.39 on the validation set of the "3D Human Pose Estimation" track of the ECCV 2018 PoseTrack Challenge.
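The Hierarchical Feature Constraint in the VI-ReID abstract above is described only at a high level. The sketch below shows one plausible reading, in which the softened classifier predictions from the global feature act as a teacher for each part-stripe classifier via knowledge distillation; the function name, temperature, and number of stripes are illustrative assumptions rather than the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): a global-to-part knowledge
# distillation loss, one plausible reading of the Hierarchical Feature
# Constraint (HFC) described in the VI-ReID abstract above.
import torch
import torch.nn.functional as F

def hfc_distillation_loss(global_logits, part_logits_list, T=4.0):
    """KL divergence between softened global predictions (teacher) and
    each part-stripe prediction (student), averaged over stripes."""
    teacher = F.softmax(global_logits.detach() / T, dim=1)  # stop gradients to the teacher
    loss = 0.0
    for part_logits in part_logits_list:
        student = F.log_softmax(part_logits / T, dim=1)
        loss = loss + F.kl_div(student, teacher, reduction="batchmean") * (T * T)
    return loss / len(part_logits_list)

# Usage with dummy tensors: 8 identities, batch of 4, 3 part stripes.
global_logits = torch.randn(4, 8)
part_logits = [torch.randn(4, 8) for _ in range(3)]
print(hfc_distillation_loss(global_logits, part_logits))
```

In a full model, a term like this would be added to the identity losses and the auxiliary pose-estimation loss of the joint training objective.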
Abstract: With the recent increase in interest in machine learning and computer vision, camera-based pose estimation has emerged as a promising new technology. One of the most popular libraries for camera-based pose estimation is MediaPipe Pose due to its computational efficiency, ease of use, and the fact that it is open-source. However, little work has been performed to establish how accurate the library is and whether it is suitable for usage in, for example, physical therapy. This paper aims to provide an initial assessment of this. We find that the pose estimation is highly dependent on the camera’s viewing angle as well as the performed exercise. While high accuracy can be achieved under optimal conditions, the accuracy quickly decreases when the conditions are less favourable.
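As context for the evaluation above, here is a minimal example of extracting landmarks with the MediaPipe Pose Python API; the file name and the chosen landmark are placeholders, not values from the paper.

```python
# Minimal MediaPipe Pose usage sketch; "frame.jpg" and the chosen landmark
# are placeholders for illustration.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

image = cv2.imread("frame.jpg")                      # BGR image from OpenCV
with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB

if results.pose_landmarks:
    lm = results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_SHOULDER]
    # x and y are normalised to [0, 1]; visibility is a per-landmark confidence.
    print(lm.x, lm.y, lm.visibility)
```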
Multi-task learning (MTL) in deep neural networks for NLP has recently received increasing interest due to some compelling benefits, including its potential to efficiently regularize models and to reduce the need for labeled data. While it has brought significant improvements in a number of NLP tasks, mixed results have been reported, and little is known about the conditions under which MTL leads to gains in NLP. This paper sheds light on the specific task relations that can lead to gains from MTL models over single-task setups.
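A common MTL setup studied in this line of work is hard parameter sharing: one shared encoder with per-task heads trained on a joint loss. The sketch below is a generic illustration of that setup, not the models from the paper; the dimensions and the two task heads are arbitrary assumptions.

```python
# Generic hard-parameter-sharing MTL sketch (illustrative; not the paper's models).
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, vocab_size=10000, hidden=128, n_tags=10, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)      # shared layers
        self.tagging_head = nn.Linear(hidden, n_tags)                # token-level task
        self.classification_head = nn.Linear(hidden, n_classes)      # sentence-level task

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return self.tagging_head(states), self.classification_head(states.mean(dim=1))

model = SharedEncoderMTL()
tokens = torch.randint(0, 10000, (4, 12))      # batch of 4 sentences, length 12 (dummy data)
tag_logits, cls_logits = model(tokens)
# Joint objective: sum (or weight) the per-task losses before backpropagation.
```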
Abstract: Human pose estimation is a technique that identifies the landmarks of the human body in images and videos. It can be divided into single-person and multi-person pose estimation, and human poses can also be estimated in crowded places as well as in videos. Depending on the application, such as activity recognition, animation, sports, or augmented reality, the pose estimation output can be in 2D or 3D coordinate format; 3D poses are estimated by considering the joint angles in 2D. Challenges such as small and barely visible joints, strong articulation, occlusion, clothing, and lighting changes increase the difficulty of estimating pose. Remarkable progress has been made in human pose estimation using deep-learning-based CNN models. In this paper, we compare and summarize various deep learning models for single-person and multi-person pose estimation.
Most existing techniques for articulated Human Pose Estimation (HPE) consider each person independently. Here we tackle the problem in a new setting, coined Human Pose Coestimation (PCE), where multiple people are in a common, but unknown pose. The task of PCE is to estimate their poses jointly and to produce prototypes characterizing the shared pose. Since the poses of the individual people should be similar to the prototype, PCE has less freedom compared to estimating each pose independently, which simplifies the problem. We demonstrate our PCE technique on two applications. The first is estimating the pose of people performing the same activity synchronously, such as during aerobics, cheerleading, and dancing in a group. We show that PCE improves pose estimation accuracy over estimating each person independently. The second application is learning prototype poses characterizing a pose class directly from an image search engine queried by the class name (e.g., “lotus pose”). We show that PCE leads to better pose estimation in such images, and it learns meaningful prototypes which can be used as priors for pose estimation in novel images.
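The core constraint in PCE is that individual poses should stay close to a shared prototype. The toy sketch below illustrates only that idea (a mean prototype plus shrinkage toward it); the actual PCE formulation estimates the poses and the prototype jointly, and the shrinkage weight here is an arbitrary assumption.

```python
# Toy illustration of the "shared prototype" idea behind Pose Coestimation:
# build a prototype as the mean of per-person poses and pull each estimate
# toward it. The real PCE model estimates poses and the prototype jointly;
# the 0.3 shrinkage weight is an arbitrary choice for illustration.
import numpy as np

def coestimate(poses, weight=0.3):
    """poses: array of shape (num_people, num_joints, 2) with 2D joint estimates."""
    prototype = poses.mean(axis=0)                        # shared pose prototype
    refined = (1 - weight) * poses + weight * prototype   # regularise toward the prototype
    return refined, prototype

poses = np.random.rand(5, 14, 2)      # 5 people, 14 joints each (dummy data)
refined, prototype = coestimate(poses)
print(prototype.shape, refined.shape)
```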
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively annotated object pose data, our pose interpreter network is trained entirely on synthetic pose data. We use object masks as an intermediate representation to bridge real and synthetic. We show that when combined with a segmentation model trained on RGB images, our synthetically trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real-time (20 Hz) on live RGB data, without using depth information or ICP refinement.
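The described pipeline (segmentation on RGB, then a pose interpreter network trained purely on synthetic masks) can be sketched roughly as below; the layer sizes, the mask threshold, and the quaternion output head are assumptions for illustration, not the authors' architecture.

```python
# Rough sketch of the two-stage pipeline described above: a segmentation model
# produces an object mask from RGB, and a mask-only "pose interpreter" CNN
# regresses a 6-DoF pose (translation + unit quaternion). Layer sizes and the
# 0.5 mask threshold are illustrative assumptions.
import torch
import torch.nn as nn

class PoseInterpreter(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.translation = nn.Linear(32, 3)   # x, y, z
        self.rotation = nn.Linear(32, 4)      # quaternion, normalised below

    def forward(self, mask):
        feats = self.backbone(mask)
        quat = self.rotation(feats)
        quat = quat / quat.norm(dim=1, keepdim=True)    # keep it a unit quaternion
        return self.translation(feats), quat

def estimate_pose(segmentation_model, pose_interpreter, rgb):
    mask = (segmentation_model(rgb) > 0.5).float()      # binary object mask from RGB
    return pose_interpreter(mask)
```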
Previous expressive 3D human pose and mesh estimation methods mostly rely on a single image feature vector to predict 3D rotations of human joints (i.e., 3D rotational pose) from an input image. However, the single image feature vector lacks human joint-level features. To resolve this limitation, we present Pose2Pose, a 3D positional pose-guided 3D rotational pose prediction framework for expressive 3D human pose and mesh estimation. Pose2Pose extracts joint-level features at the positions of human joints (i.e., positional pose) using positional pose-guided pooling, and the joint-level features are used for the 3D rotational pose prediction. Our Pose2Pose is trained in an end-to-end manner and largely outperforms previous expressive methods. The code will be made publicly available.
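Positional pose-guided pooling, as described above, amounts to sampling the image feature map at the predicted 2D joint locations to obtain per-joint features. A minimal sketch using bilinear grid sampling follows; the shapes and the [0, 1] coordinate convention are assumptions, not Pose2Pose's exact design.

```python
# Minimal sketch of joint-level feature pooling: sample a feature map at 2D
# joint positions with bilinear interpolation. Shapes and the [0, 1] coordinate
# convention are illustrative assumptions.
import torch
import torch.nn.functional as F

def pool_joint_features(feature_map, joints_2d):
    """feature_map: (B, C, H, W); joints_2d: (B, J, 2) with x, y in [0, 1]."""
    grid = joints_2d * 2.0 - 1.0                 # grid_sample expects coords in [-1, 1]
    grid = grid.unsqueeze(2)                     # (B, J, 1, 2)
    sampled = F.grid_sample(feature_map, grid, align_corners=False)  # (B, C, J, 1)
    return sampled.squeeze(-1).permute(0, 2, 1)  # (B, J, C) joint-level features

features = torch.randn(2, 256, 64, 64)           # dummy backbone features
joints = torch.rand(2, 17, 2)                    # 17 joints per person (dummy)
print(pool_joint_features(features, joints).shape)  # torch.Size([2, 17, 256])
```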
Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lot of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model "hard" human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement over the state of the art.
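Refinement networks of this kind typically condition on the initial estimate by rendering it as per-joint heatmaps and stacking them with the image. The sketch below shows only that input construction; the Gaussian width, input resolution, and joint count are assumptions, and the refinement CNN itself is omitted.

```python
# Sketch of how a pose-refinement network can be conditioned on an initial
# estimate: render one Gaussian heatmap per joint and concatenate it with the
# RGB image channels. Sigma, resolution, and joint count are illustrative.
import torch

def render_heatmaps(joints_xy, height, width, sigma=2.0):
    """joints_xy: (J, 2) pixel coordinates -> (J, H, W) Gaussian heatmaps."""
    ys = torch.arange(height).view(1, height, 1).float()
    xs = torch.arange(width).view(1, 1, width).float()
    jx = joints_xy[:, 0].view(-1, 1, 1)
    jy = joints_xy[:, 1].view(-1, 1, 1)
    return torch.exp(-((xs - jx) ** 2 + (ys - jy) ** 2) / (2 * sigma ** 2))

image = torch.rand(3, 256, 192)                                   # dummy RGB crop (C, H, W)
initial_pose = torch.rand(16, 2) * torch.tensor([192.0, 256.0])   # 16 joints, (x, y) in pixels
heatmaps = render_heatmaps(initial_pose, 256, 192)
refiner_input = torch.cat([image, heatmaps], dim=0)               # (3 + 16, H, W) network input
print(refiner_input.shape)
```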
We propose to leverage recent advances in reliable 2D pose estimation with Convolutional Neural Networks (CNN) to estimate the 3D pose of people from depth images in multi-person Human-Robot Interaction (HRI) scenarios. Our method is based on the observation that using the depth information to obtain 3D lifted points from 2D body landmark detections provides a rough estimate of the true 3D human pose, thus requiring only a refinement step. In that line, our contributions are threefold: (i) we propose to perform 3D pose estimation from depth images by decoupling 2D pose estimation and 3D pose refinement; (ii) we propose a deep-learning approach that regresses the residual pose between the lifted 3D pose and the true 3D pose; (iii) we show that despite its simplicity, our approach achieves very competitive results both in accuracy and speed on two public datasets and is therefore appealing for multi-person HRI compared to recent state-of-the-art methods.
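The first stage described above, lifting 2D detections to a rough 3D pose with the depth map and camera intrinsics, is a pinhole back-projection and can be written directly as below; the intrinsic values are placeholders, and the learned residual-refinement network the paper adds on top is only noted in a comment.

```python
# Sketch of the lifting step described above: back-project 2D joint detections
# to rough 3D points using the depth image and pinhole intrinsics. The
# intrinsics are placeholder values, and the residual refinement network that
# the paper adds on top is not shown.
import numpy as np

def lift_with_depth(joints_2d, depth, fx, fy, cx, cy):
    """joints_2d: (J, 2) pixel coords; depth: (H, W) metric depth map -> (J, 3) points."""
    u = joints_2d[:, 0]
    v = joints_2d[:, 1]
    z = depth[v.astype(int), u.astype(int)]       # read depth at each joint (metres)
    x = (u - cx) * z / fx                         # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

depth = np.full((480, 640), 2.5)                  # dummy depth map, 2.5 m everywhere
joints = np.array([[320.0, 240.0], [300.0, 200.0]])
rough_3d = lift_with_depth(joints, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(rough_3d)
# rough_3d would then be refined by a learned residual: pose_3d = rough_3d + f(rough_3d).
```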
Human pose estimation is a fast-developing field that has recently advanced with the arrival of the Kinect system. That system performs well for 3D pose estimation, but 2D pose estimation is not yet solved. In computer vision, articulated body pose estimation systems detect the pose of a human body that consists of joints and flexible parts. Human body pose models are complex, which makes this one of the longest-standing problems in computer vision. There is a need for accurate articulated pose estimation systems that can detect the pose of body parts such as hands, legs, and the head. Pose estimation benefits many applications, including robotics, human-computer interaction, video surveillance, multimedia, augmented reality, video retrieval, and biometrics or intelligent surveillance. Images and videos present many challenges, such as background clutter, varying lighting conditions, unconstrained clothing, and occlusion. This paper includes a comparative study focusing mainly on different 2D human pose estimation methods, such as pictorial structures, silhouette methods, skeletonization, shape context matching, segmentation, and feature extraction and recognition tools, and also discusses the advantages and drawbacks of each.