Computing the disparity of pixels in weakly textured areas has always been a difficult task in stereo vision. The non-local method based on a minimum spanning tree (MST) constructs content-adaptive support regions to perform cost aggregation. However, it often introduces disparity errors on slanted surfaces and is sensitive to noise and highly textured regions, while window-based methods are not effective at propagating information. To overcome these problems, this paper proposes an approximate geodesic distance tree filter, which uses geodesic distance as a pixel similarity metric and recursive techniques to perform the filtering process. The filtering is performed recursively in four directions (top-to-bottom, left-to-right, and vice versa), which gives our filter linear complexity. Our filter has two advantages: (1) the pixel similarity metric is an approximated geodesic distance; (2) the computational complexity is linear in the number of image pixels. For these reasons, the proposed method can properly handle cost aggregation in textureless regions and preserve the boundaries of disparity maps. We demonstrate the strength of our filter in several applications.
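The geodesic-distance similarity metric described above can be illustrated with a minimal sketch. This is an assumed form (intensity differences accumulated along a path; the paper's approximation also involves the tree structure and spatial terms, which are omitted here), not the paper's exact definition:

```python
import numpy as np

def path_geodesic_distance(guide, path):
    """Accumulated absolute intensity difference along a pixel path.
    On a tree, the approximate geodesic distance between two pixels is
    this sum taken along the unique tree path connecting them."""
    g = np.asarray(guide, dtype=float)
    return float(sum(abs(g[path[k + 1]] - g[path[k]])
                     for k in range(len(path) - 1)))

def similarity(dist, sigma=0.1):
    """Turn a geodesic distance into a similarity weight in (0, 1]."""
    return float(np.exp(-dist / sigma))
```

A flat path (identical intensities) yields distance 0 and similarity 1; a path crossing an image edge inflates the distance and suppresses the weight, which is what keeps aggregation from leaking across disparity boundaries.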
Abstract The desired solution to many labelling problems in computer vision is a spatially smooth result in which label changes align with the edges of the guidance image. Traditionally, it is obtained by smoothing the label costs with edge-aware filters. However, local filters incorporate information only from a local support region and therefore yield locally optimal results, while non-local tree-based filters often overuse piecewise-constant assumptions. In this paper, we propose a spatial-tree filter for cost aggregation. The tree model incorporates spatial affinity into the tree structure, so the tree distance between two pixels on our spatial tree is an approximated geodesic distance, which acts as a pixel similarity metric. The filtering process is implemented with recursive techniques in four directions: top-to-bottom, left-to-right, and vice versa. Thus, the complexity of our approach is linear in the number of image pixels. Extensive experiments demonstrate the effectiveness and efficiency of our spatial-tree filter in image smoothing and stereo matching. Our method outperforms existing tree-based non-local methods in cost aggregation.
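The four-direction recursive filtering scheme can be sketched as follows. This is a simplified stand-in under assumed weights (an exponential of intensity difference plus a unit spatial step), not the paper's exact spatial-tree recursion, but it shows why four recursive passes give linear-time, edge-aware aggregation over the whole image:

```python
import numpy as np

def _pass_lr(img, guide, sigma):
    """One left-to-right recursive pass: each pixel blends its own value
    with the already-filtered pixel to its left, weighted by guide-image
    similarity (intensity difference plus a unit spatial step)."""
    out = img.astype(float)
    h, w = img.shape
    for y in range(h):
        for x in range(1, w):
            wgt = np.exp(-(abs(guide[y, x] - guide[y, x - 1]) + 1.0) / sigma)
            out[y, x] += wgt * out[y, x - 1]
    return out

def four_direction_filter(img, guide, sigma=2.0):
    """Normalized combination of four recursive passes: left-to-right,
    right-to-left, top-to-bottom, bottom-to-top. Filtering an image of
    ones with the same weights yields the normalization constants."""
    img = np.asarray(img, float)
    guide = np.asarray(guide, float)
    ones = np.ones_like(img)
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for fwd, inv in (
        (lambda a: a,          lambda a: a),          # left-to-right
        (lambda a: a[:, ::-1], lambda a: a[:, ::-1]), # right-to-left
        (lambda a: a.T,        lambda a: a.T),        # top-to-bottom
        (lambda a: a[::-1].T,  lambda a: a.T[::-1]),  # bottom-to-top
    ):
        # fwd maps each direction to a left-to-right pass; inv undoes it
        num += inv(_pass_lr(fwd(img), fwd(guide), sigma))
        den += inv(_pass_lr(fwd(ones), fwd(guide), sigma))
    return num / den
```

Each pixel visits each pass exactly once, so the total cost is linear in the number of pixels; a constant input is reproduced exactly because the numerator and denominator accumulate identical weights.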
Self-supervised learning has shown very promising results for monocular depth estimation. Both scene structure and local details are significant clues for high-quality depth estimation. Recent works suffer from a lack of explicit modeling of scene structure and proper handling of detail information, which leads to a performance bottleneck and blurry artefacts in the predicted results. In this paper, we propose the Channel-wise Attention-based Depth Estimation Network (CADepth-Net) with two effective contributions: 1) The structure perception module employs the self-attention mechanism to capture long-range dependencies and aggregates discriminative features in the channel dimension, explicitly enhancing the perception of scene structure and yielding better scene understanding and richer feature representations. 2) The detail emphasis module re-calibrates channel-wise feature maps and selectively emphasizes informative features, aiming to highlight crucial local detail information and fuse features of different levels more efficiently, resulting in more precise and sharper depth predictions. Extensive experiments validate the effectiveness of our method and show that our model achieves state-of-the-art results on the KITTI benchmark and the Make3D dataset.
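Channel-wise re-calibration of feature maps, as performed by the detail emphasis module, can be sketched generically. This is a squeeze-and-excitation-style sketch under assumed shapes and weight matrices (`W1`, `W2` are hypothetical bottleneck weights), not CADepth-Net's exact module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, W1, W2):
    """Generic channel attention: global-average-pool each channel into a
    descriptor, pass it through a small bottleneck MLP, and rescale the
    channels by the resulting per-channel weights.
    feat: (C, H, W) feature map; W1: (C//r, C); W2: (C, C//r)."""
    squeeze = feat.mean(axis=(1, 2))        # (C,) channel descriptor
    hidden = np.maximum(W1 @ squeeze, 0.0)  # ReLU bottleneck
    scale = sigmoid(W2 @ hidden)            # (C,) re-calibration weights
    return feat * scale[:, None, None]      # emphasize informative channels
```

In a trained network the learned weights push `scale` toward 1 for informative channels and toward 0 for the rest, which is the "selectively emphasizes the informative features" behavior the abstract describes.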
The accuracy and speed of semi-global matching (SGM) make it widely used in many computer vision problems. However, SGM often struggles with pixels in homogeneous regions and suffers from streak artefacts owing to its weak smoothness constraints. Meanwhile, we observe that global methods usually fail in occluded areas, where the disparities of occluded pixels are typically the average of the disparities of nearby pixels; local methods, by contrast, can propagate information into occluded pixels of similar color. In this paper, we propose a novel, to the best of our knowledge, four-direction global matching method with a cost volume update scheme to cope with textureless regions and occlusion. The proposed method makes two changes to the recursive formula: a) the computation considers four visited nodes to enforce stronger smoothness constraints; b) the recursion integrates cost filtering to propagate reliable information farther into untextured regions. Thus, our method inherits the speed of SGM, properly avoids streaking artefacts, and handles occluded pixels. Extensive stereo matching experiments on Middlebury demonstrate that our method outperforms typical SGM-based cost aggregation approaches and other state-of-the-art local methods.
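For reference, the classic single-direction SGM recursion that this method modifies can be written down directly. The sketch below follows the standard formulation (penalties P1 for a one-step disparity change, P2 for larger jumps), not the paper's four-node variant:

```python
import numpy as np

def sgm_pass_lr(cost, P1=1.0, P2=8.0):
    """One left-to-right SGM pass over a cost volume of shape (H, W, D):
    L(p, d) = C(p, d)
              + min(L(p-1, d), L(p-1, d-1)+P1, L(p-1, d+1)+P1,
                    min_k L(p-1, k) + P2)
              - min_k L(p-1, k)."""
    H, W, D = cost.shape
    L = np.empty((H, W, D), dtype=np.float64)
    L[:, 0, :] = cost[:, 0, :]
    for x in range(1, W):
        prev = L[:, x - 1, :]                         # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)    # min_k L(p-1, k)
        # candidate transitions from the previous pixel's disparities
        shift_m1 = np.full_like(prev, np.inf); shift_m1[:, 1:] = prev[:, :-1]
        shift_p1 = np.full_like(prev, np.inf); shift_p1[:, :-1] = prev[:, 1:]
        best = np.minimum.reduce([prev,
                                  shift_m1 + P1,
                                  shift_p1 + P1,
                                  np.broadcast_to(prev_min + P2, prev.shape)])
        L[:, x, :] = cost[:, x, :] + best - prev_min  # keep values bounded
    return L
```

The full SGM result sums such passes over several directions; because each pass looks only at the previous pixel on its own scanline, scanlines never interact, which is the source of the streak artefacts discussed above.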
The efficiency and accuracy of semi-global matching (SGM) make it outperform many stereo matching algorithms, and it is widely used in challenging settings. However, SGM only incorporates information along a scanline in each pass and lacks interaction between scanlines, resulting in streak artifacts in the disparity image. We introduce a local edge-aware filtering method into SGM to enhance the interaction of neighboring scanlines, so that streak artifacts can be avoided. We use bilateral weights, based on intensity similarity and spatial affinity between pixels, to build connections among scanlines. In each pass, we recursively estimate the aggregated cost of SGM and compute the weighted average of the aggregated costs of pixels in the orthogonal direction to obtain the output of our method along each scanline. As only one-dimensional bilateral filtering is used, the extra computation is linear in image resolution and label space, a small fraction of that needed by SGM. We present ablation studies using stereo pairs under both constrained and natural conditions to verify the effectiveness of our method. Extensive experiments on the Middlebury and KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) datasets demonstrate that our method removes all streak artifacts, improves the quality of the disparity image, and outperforms many other non-local cost aggregation approaches.
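The one-dimensional bilateral weighting used to couple neighboring scanlines can be sketched as follows. This is a generic 1D bilateral filter with assumed Gaussian kernels and parameters, applied to a line of aggregated costs guided by the corresponding image intensities:

```python
import numpy as np

def bilateral_1d(values, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """1D bilateral smoothing of `values` guided by `guide` intensities:
    each output is a weighted average over a (2*radius+1) window, with
    weights combining spatial distance and intensity similarity."""
    v = np.asarray(values, float)
    g = np.asarray(guide, float)
    out = np.empty_like(v)
    n = len(v)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        d = np.arange(lo, hi) - i
        w = np.exp(-(d ** 2) / (2 * sigma_s ** 2)
                   - ((g[lo:hi] - g[i]) ** 2) / (2 * sigma_r ** 2))
        out[i] = np.sum(w * v[lo:hi]) / np.sum(w)
    return out
```

Applying such a filter across the direction orthogonal to each SGM scanline lets costs from similar-looking neighboring scanlines blend together, while intensity edges in the guide suppress mixing across object boundaries. The window is of fixed size, so the extra cost is linear in the number of pixels and labels.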
Overcoming noise interference between rooms to obtain a complete and accurate segmentation result is not only a research hotspot, but also a necessity for indoor navigation and 3D reconstruction. Walls are basic components of a building and natural boundaries between rooms. From this perspective, this paper proposes a fully automatic room segmentation approach based on wall constraints using point clouds. First, wall centerlines are extracted as boundaries between rooms to cut off their connections. Then, missing walls are inferred based on the Manhattan-world assumption to obtain a relatively accurate initial segmentation of the point cloud. Finally, a global segmentation optimization is conducted by formulating room segmentation as a Markov random field (MRF) optimization problem, which is efficiently solved by the graph cut algorithm. Our method is compared with state-of-the-art methods on multiple datasets with various point densities and noise levels. Experiments show that our method significantly outperforms the state-of-the-art methods in terms of precision, recall, and IoU.
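The MRF formulation can be made concrete with a small energy function. This sketch assumes a standard form (per-cell data costs plus a Potts smoothness term over a 2D label grid); the paper's exact data and smoothness terms for room labels are not reproduced here, and graph cuts minimize energies of this form rather than merely evaluating them:

```python
import numpy as np

def potts_energy(labels, unary, lam=1.0):
    """MRF energy for a label grid: summed per-cell data cost plus a
    Potts smoothness term charging `lam` for every 4-connected neighbor
    pair with different labels.
    labels: (H, W) int array; unary: (H, W, K) data costs."""
    h, w = labels.shape
    data = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    smooth = ((labels[1:, :] != labels[:-1, :]).sum()
              + (labels[:, 1:] != labels[:, :-1]).sum())
    return float(data + lam * smooth)
```

The data term pulls each cell toward the room suggested by the initial wall-constrained segmentation, while the Potts term penalizes fragmented labelings, so the minimizer is a spatially coherent room partition.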
Abstract It is well known that the minimum spanning tree (MST) is widely used in image segmentation, edge-preserving filtering, and stereo matching. However, the non-local (NL) filter based on the MST generally produces overly smooth images, since it ignores spatial affinity. In this paper, we propose a new spatial minimum spanning tree filter (SMSTF) that improves the performance of the NL filter by designing a spatial MST to avoid over-smoothing and by introducing recursive techniques to implement the filtering process. The SMSTF has the advantages that: (1) the kernel of our filter considers both spatial affinity and intensity similarity; (2) the support of the filter kernel is the entire image domain; (3) the complexity of the SMSTF is linear in the number of image pixels. For these reasons, our filter achieves excellent edge-preserving results. Extensive experiments demonstrate the versatility of the proposed method in a variety of image processing and computer vision tasks, including edge-preserving smoothing, stylization, colorization, and stereo matching.
It is challenging to extend the support region of state-of-the-art local edge-preserving filtering approaches to the entire input image on account of the huge memory cost and heavy computational burden. In this paper, we propose an O(N) time recursive non-local edge-aware filter. A novel graph and a linear-time algorithm are presented to effectively propagate information across the entire image. In this graph, information is propagated along all directions to avoid visual artifacts. Both intensity similarity and spatial affinity are used to estimate the similarity of neighboring pixels, and the distance between any two pixels is the length of the path connecting them on a binary tree. Specifically, the input image is filtered in four directions, namely left-to-right, right-to-left, top-to-bottom, and bottom-to-top. In each pass, the un-normalized output and the normalization constant of the root node are computed recursively from the leaf nodes to the root node of a binary tree in linear time. The filtering output is the average of the outputs of these four passes. A comparison with other non-local edge-aware filters shows the advantages of our approach. Extensive experiments demonstrate that our recursive non-local edge-preserving filter is effective in a variety of computer vision and image processing applications, including edge-aware filtering, detail enhancement, stylization, and stereo matching.
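The leaf-to-root / root-to-leaf computation at the heart of such tree filters can be sketched on an explicit tree. This is the classic two-sweep O(N) aggregation scheme used by non-local tree filters in general; the paper's specific binary-tree construction and edge weights are not reproduced:

```python
import numpy as np

def tree_aggregate(cost, parent, order, weight):
    """Two-sweep non-local aggregation on a tree in O(N).
    parent[i]: parent of node i (the root has parent -1).
    order: node indices with every parent listed before its children.
    weight[i]: similarity weight of the edge (i, parent[i]).
    After both sweeps, out[v] is the weight-attenuated sum of `cost`
    over every node in the tree, as seen from v."""
    up = np.asarray(cost, dtype=float).copy()
    # sweep 1, leaf-to-root: children push their aggregates upward
    for i in reversed(order):
        p = parent[i]
        if p >= 0:
            up[p] += weight[i] * up[i]
    out = up.copy()  # the root already sees the whole tree
    # sweep 2, root-to-leaf: each node receives the rest of the tree
    # through its parent, after subtracting its own contribution
    for i in order:
        p = parent[i]
        if p >= 0:
            out[i] = up[i] + weight[i] * (out[p] - weight[i] * up[i])
    return out
```

On a 3-node chain with all edge weights 1, every node ends up with the plain sum of all costs; with weights below 1, contributions decay with tree distance, which is exactly the non-local edge-aware behavior described above.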