Fixing Defect of Photometric Loss for Self-Supervised Monocular Depth Estimation

2021 
View-synthesis-based methods have shown very promising results for unsupervised depth estimation from single images. Most existing approaches synthesize a new image and employ it as the supervision signal for depth and pose prediction. These approaches suffer from two problems: 1) many combinations of pose and depth can synthesize the same new image, so recovering depth and pose via view synthesis from only two images is an inherently ill-posed problem; 2) the model is trained under the photometric consistency assumption that brightness or gradient remains constant across the video sequence, an assumption that is easily violated in realistic scenes due to illumination changes, reflective surfaces, and occlusions. To overcome the first drawback, we exploit a point cloud consistency constraint to eliminate the ambiguity. To overcome the second drawback, we use threshold masks to filter out dynamic and occluded points, and we introduce matching-point constraints that implicitly encode the geometric relationship between two matched points to improve the precision of depth prediction. In addition, we employ epipolar constraints to compensate for the instability of the photometric error in textureless regions and under varying illumination. Experimental results on the KITTI, Cityscapes, and NYUv2 datasets show that the method improves the accuracy of depth prediction and enhances the robustness of the model to textureless regions and illumination changes. The code and data are available at https://github.com/XTUPRLAB/FixUnDepth.
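For context, below is a minimal PyTorch sketch of the standard SSIM + L1 photometric loss (popularized by Monodepth2) whose fragility under illumination changes and occlusion motivates this work. This is not the authors' code: the function names, the 3x3 SSIM window, and the alpha = 0.85 weighting are illustrative assumptions.

```python
# Hedged sketch of the conventional photometric loss critiqued in the
# abstract: alpha * DSSIM + (1 - alpha) * L1 between the target frame
# and the view synthesized from predicted depth and pose.
import torch
import torch.nn.functional as F

def ssim_error(x, y, c1=0.01**2, c2=0.03**2):
    """Per-pixel DSSIM over 3x3 neighborhoods.
    Assumes NCHW tensors with values in [0, 1]."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, synthesized, alpha=0.85):
    """Per-pixel photometric error; implicitly assumes brightness
    constancy, which breaks on reflective or occluded regions."""
    l1 = torch.abs(target - synthesized).mean(1, keepdim=True)
    dssim = ssim_error(target, synthesized).mean(1, keepdim=True)
    return alpha * dssim + (1 - alpha) * l1
```

Because this error is computed purely from pixel intensities, it is near-zero in textureless regions for many wrong depth/pose pairs; the paper's point cloud, matching-point, and epipolar constraints are intended to supply the geometric signal this loss lacks.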