Real-time Monocular 3D People Localization and Tracking on Embedded System

2021 
Localizing people in 3D space, rather than in the original 2D image plane, provides a more comprehensive understanding of the scene and enables more potential applications. However, inferring 3D locations usually requires a stereo camera or additional sensors, since deriving depth information from a single image is regarded as an ill-posed problem. With recent progress in deep learning, depth estimation neural networks can produce convincing depth maps from a single RGB image. This work develops a people localization and tracking method based on a monocular camera. Specifically, an efficient self-supervised monocular depth estimation method is adopted to generate a pseudo depth map. Then, 2D object detection results are used to find accurate people locations. Finally, a filter-based tracking method fuses temporal information to improve accuracy. Aiming to provide a real-time solution for people tracking on embedded systems, our methods are deployed and tested on an NVIDIA Jetson Xavier NX developer kit. The proposed efficient localization and tracking method is validated by a group of field tests. The overall performance reaches 12 fps with acceptable accuracy compared to ground truth.
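The abstract describes a pipeline of monocular depth estimation, 2D person detection, back-projection to 3D, and filter-based temporal smoothing. The sketch below is a minimal illustration of that general idea, not the authors' implementation: it assumes a pinhole camera model with placeholder intrinsics (fx, fy, cx, cy), takes the median predicted depth inside a detected bounding box as the person's range, and smooths the resulting 3D position with a constant-velocity Kalman filter (the paper only says "filter based"; the specific filter, noise values, and the dt of 1/12 s taken from the reported 12 fps are assumptions).

```python
# Minimal sketch (not the authors' code): back-project a 2D person detection
# into 3D camera coordinates using a predicted depth map and pinhole intrinsics,
# then smooth the 3D position with a constant-velocity Kalman filter.
# fx, fy, cx, cy, dt, q, r are illustrative placeholder values.

import numpy as np

# Assumed pinhole camera intrinsics (placeholders).
fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0

def localize_person(depth_map: np.ndarray, box: tuple) -> np.ndarray:
    """Back-project the center of a person bounding box to 3D (meters).

    box = (x1, y1, x2, y2) in pixels; depth_map holds metric depth per pixel.
    The median depth inside the box suppresses background pixels.
    """
    x1, y1, x2, y2 = box
    z = float(np.median(depth_map[y1:y2, x1:x2]))
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

class ConstantVelocityKF:
    """Constant-velocity Kalman filter over 3D position (state: [p, v])."""

    def __init__(self, dt=1.0 / 12.0, q=1e-2, r=5e-2):
        self.x = np.zeros(6)                      # [px, py, pz, vx, vy, vz]
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)           # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.Q = q * np.eye(6)
        self.R = r * np.eye(3)

    def update(self, z: np.ndarray) -> np.ndarray:
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the measured 3D position
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]

# Per frame: run depth estimation and person detection, then
#   pos3d = localize_person(depth_map, box); smoothed = kf.update(pos3d)
```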