ETS-3D: An Efficient Two-Stage Framework for Stereo 3D Object Detection

2022 
We propose an efficient two-stage framework for stereo 3D object detection, called ETS-3D. Contrary to many recent approaches that rely on depth maps predicted using time-consuming stereo matching models, our approach utilizes the well-designed features to generate high-quality 3D proposals in stage-1, without explicitly exploiting predicted depth map. Specifically, we leverage pixel-wise correlation to produce normalized cost volumes to weight the left image features, and fuse multi-scale weighted features to obtain the weighted and fused features for 3D proposal generation. To maintain fast computation, only the filtered positive 3D proposals are fed into the stage-2 sub-network for further proposal refinement and quality prediction. Furthermore, we reconstruct the 3D proposal features in stage-2 to make use of different feature representations, achieving more accurate detection results. The experimental results on the KITTI 3D object detection benchmark demonstrate that our method achieves state-of-the-art performance, and can run at more than 10 fps.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []