RVDet: Feature-level Fusion of Radar and Camera for Object Detection

2021 
Obstacle perception based on radar sensors has drawn wide attention in autonomous driving due to its robust performance and low cost. Fusing complementary information, e.g., from cameras, is a significant way to further enhance radar perception. Although much progress has been made, we still observe two problems. First, spatial alignment among multi-modal data is intractable when multiple radar and camera sensors are involved. Second, most existing works are based on object-level fusion, which inevitably incurs information loss and degrades performance. To this end, we propose a feature-level fusion detection framework based on multiple radars and cameras, termed RVDet. We first establish an occupancy grid map from four corner radars and extract radar features in the bird's eye view (BEV). Meanwhile, image features from four fish-eye cameras are obtained using a pretrained vision detection model. Then, an adaptive projection network transforms all four image features into the BEV domain and integrates them into a dense spatial feature map aligned with the radar feature. Finally, the carefully aligned multi-modal feature maps are jointly fed into a deep fusion network to predict the final fused detection results. Experiments on a custom dataset show that the proposed method achieves significant gains in both object detection and positioning performance.
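
The paper does not publish its code, but the pipeline it describes (per-camera projection of image features into BEV, integration into one dense map, then feature-level fusion with the radar BEV feature) can be sketched as below. This is a minimal PyTorch illustration, not the authors' implementation: the module names (`AdaptiveProjection`, `RVDetFusion`), the channel sizes, the learned `grid_sample` warp standing in for the adaptive projection network, and the element-wise max used to merge the four camera maps are all assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveProjection(nn.Module):
    """Hypothetical sketch: lifts one camera's 2D feature map into the shared
    BEV grid via a 1x1 channel reduction and a learnable bilinear sampling
    grid (a stand-in for the paper's adaptive projection network)."""

    def __init__(self, in_ch, bev_ch, bev_hw):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, bev_ch, kernel_size=1)
        # Assumed: one learnable sampling grid per camera, shape (1, H_bev, W_bev, 2).
        self.grid = nn.Parameter(torch.zeros(1, *bev_hw, 2))

    def forward(self, img_feat):  # img_feat: (B, C, H, W)
        x = self.reduce(img_feat)
        # tanh keeps sampling coordinates in grid_sample's [-1, 1] range.
        grid = torch.tanh(self.grid).expand(x.size(0), -1, -1, -1)
        return nn.functional.grid_sample(x, grid, align_corners=False)

class RVDetFusion(nn.Module):
    """Minimal feature-level fusion head: concatenates the radar BEV feature
    with the integrated camera BEV feature and predicts a dense detection map."""

    def __init__(self, radar_ch=32, img_ch=256, bev_ch=64, bev_hw=(128, 128)):
        super().__init__()
        self.projections = nn.ModuleList(
            AdaptiveProjection(img_ch, bev_ch, bev_hw) for _ in range(4)
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(radar_ch + bev_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 6, 1),  # e.g. objectness + box regression channels
        )

    def forward(self, radar_bev, cam_feats):
        # Integrate the four per-camera BEV maps into one dense feature map;
        # element-wise max keeps the strongest response per BEV cell.
        cam_bev = torch.stack(
            [proj(f) for proj, f in zip(self.projections, cam_feats)]
        ).max(dim=0).values
        return self.fuse(torch.cat([radar_bev, cam_bev], dim=1))

if __name__ == "__main__":
    # Usage sketch: radar BEV map plus four camera feature maps -> detection map.
    model = RVDetFusion()
    radar_bev = torch.randn(2, 32, 128, 128)
    cam_feats = [torch.randn(2, 256, 60, 80) for _ in range(4)]
    print(model(radar_bev, cam_feats).shape)  # torch.Size([2, 6, 128, 128])
```

The max across cameras is one plausible way to merge overlapping fish-eye views into a single dense BEV map; the abstract only states that the four projected features are integrated before being fused with the radar feature.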