Learning Cross-Domain Descriptors for 2D-3D Matching with Hard Triplet Loss and Spatial Transformer Network

2021 
2D-3D matching determines the spatial relationship between 2D and 3D space, which supports Augmented Reality (AR) and robot pose estimation, and provides a basis for multi-sensor fusion. Specifically, extracting cross-domain descriptors from 2D images and 3D point clouds is one way to achieve 2D-3D matching: 3D point-cloud volumes and 2D image patches are sampled around the keypoints of the 3D point clouds and 2D images, and cross-domain descriptors for 2D-3D matching are learned from these samples. However, handcrafted descriptors struggle to achieve 2D-3D matching, while learned cross-domain descriptors are vulnerable to translation, scale, and rotation of the cross-domain data. In this paper, we propose a novel network, HAS-Net, that learns cross-domain descriptors for matching 2D image patches with 3D point-cloud volumes. HAS-Net introduces a spatial transformer network (STN) to handle translation, scale, rotation, and more general warping of 2D image patches. In addition, HAS-Net adopts the negative-sampling strategy of the hard triplet loss to remove the uncertainty of randomly sampled negatives during training, thereby improving its ability to distinguish the hardest samples. Experiments demonstrate the superiority of HAS-Net on 2D-3D retrieval and matching. To demonstrate the robustness of the learned descriptors, the 3D half of the cross-domain descriptors learned by HAS-Net is applied to 3D global registration.
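As a rough illustration of the batch-hard negative-mining idea behind the hard triplet loss (this is not the authors' implementation; the function name, the margin value, and the use of NumPy are our own assumptions), a minimal sketch: for each anchor, the farthest same-label sample is taken as the hardest positive and the closest different-label sample as the hardest negative, replacing random negative sampling.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Batch-hard triplet loss sketch (margin value is illustrative).

    embeddings: (N, D) array of descriptors, labels: (N,) array of ids.
    """
    # Pairwise Euclidean distances between all descriptors in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = labels[:, None] == labels[None, :]
    # Hardest positive per anchor: farthest sample sharing its label.
    pos_dist = np.where(same, dist, -np.inf).max(axis=1)
    # Hardest negative per anchor: closest sample with a different label.
    neg_dist = np.where(same, np.inf, dist).min(axis=1)
    # Hinge on the positive/negative gap, averaged over the batch.
    return np.maximum(pos_dist - neg_dist + margin, 0.0).mean()
```

With two well-separated clusters the loss is zero for a small margin, and grows only when the hardest negative comes within `margin` of the hardest positive, which is what pushes the network to separate the most confusable cross-domain pairs.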