    Deep eyes: Joint depth inference using monocular and binocular cues
    Depth estimation plays an important role in many applications of computer vision. Depth can be inferred either from multiple images of the same scene taken from different viewpoints or from monocular cues in a single image. Stereo vision, an extensively used method for recovering depth information from a 2D scene, also supports view synthesis, i.e., 3D scene reconstruction. Automated systems such as robots, systems that construct a 3D spatial model of a scene, and systems that track objects in 3D space all rely on stereo vision. The stereo vision concept is also exploited by many vision-based remote-control systems that operate machines in a touch-free environment. Most existing approaches perceive depth from multiple images, using techniques such as stereopsis, structure from motion, depth from defocus/focus, deep learning, and machine learning.
    Keywords: Monocular, Depth map, Structure from Motion, Stereo cameras, 3D Reconstruction, Computer stereo vision, Binocular disparity, Monocular vision, View synthesis
    Citations (0)
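    The stereopsis route mentioned in the abstract reduces to simple triangulation once disparity is known: depth Z = f·B/d for focal length f (in pixels), baseline B, and disparity d. A minimal sketch (the focal length, baseline, and disparity values below are illustrative, not from the paper):

    ```python
    import numpy as np

    def depth_from_disparity(disparity, focal_px, baseline_m, eps=1e-6):
        """Convert a disparity map (pixels) to metric depth via Z = f * B / d."""
        d = np.asarray(disparity, dtype=np.float64)
        depth = np.full(d.shape, np.inf)   # zero disparity -> point at infinity
        valid = d > eps
        depth[valid] = focal_px * baseline_m / d[valid]
        return depth

    # A pixel with 64 px disparity, 512 px focal length, 0.1 m baseline:
    # Z = 512 * 0.1 / 64 = 0.8 m
    z = depth_from_disparity(np.array([[64.0]]), 512.0, 0.1)
    ```

    Larger disparities map to nearer points, which is why the search range of the disparity directly bounds the range of recoverable depths.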
    Selective visual attention is a mechanism of the primate visual system for rapidly focusing on attractive objects or regions in the visual environment. Numerous visual attention models have been developed and optimized over the past decades. Most existing models concentrate on static monocular images, and little attention has been devoted to stereo depth information, which is an important aspect of human perception. This paper proposes a region-based binocular saliency detection approach that takes depth information into account. The difference between the left and right images is used to compute a disparity map and a coarse saliency map. The hue, saturation, and intensity (HSI) color space is adopted, and the mean-shift algorithm is used for image segmentation. This study shows that the proposed region-based saliency computation can effectively detect salient regions, and its simplicity makes it well suited for real-time applications such as obstacle detection and visual navigation.
    Keywords: Monocular, Binocular disparity, Hue
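    The coarse saliency step described above can be sketched as a normalized left/right image difference: nearby objects shift more between the two views, so large differences hint at salient depth. The HSI conversion and mean-shift segmentation stages are omitted, and the array values are invented for illustration:

    ```python
    import numpy as np

    def coarse_saliency(left_gray, right_gray):
        """Coarse saliency map from the absolute left/right difference,
        normalized to [0, 1]."""
        diff = np.abs(left_gray.astype(np.float64) - right_gray.astype(np.float64))
        rng = diff.max() - diff.min()
        return (diff - diff.min()) / rng if rng > 0 else np.zeros_like(diff)

    left = np.zeros((4, 4));  left[1:3, 1:3] = 200   # bright foreground object
    right = np.zeros((4, 4)); right[1:3, 0:2] = 200  # same object shifted 1 px
    sal = coarse_saliency(left, right)
    ```

    In the real pipeline this per-pixel map would then be averaged over mean-shift segments to obtain region-level saliency.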
    In this paper, we propose a method that combines monocular vision with binocular vision to process a single textured image in which feature points are difficult to extract, and to restore the depth information of inclined planar objects. By integrating the two approaches, we combine the advantage of monocular vision in recovering the basic structure of objects in the image with the advantage of binocular vision in estimating depth accurately once the corresponding feature points have been correctly determined, yielding depth estimates of high precision and accuracy. Experiments show that this integration produces a high-accuracy depth map. Moreover, a high-quality 3D reconstruction is also obtained from the resulting depth map.
    Keywords: Monocular, Depth map, Feature extraction, Monocular vision, Texture
    Citations (0)
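    One way to realize the monocular/binocular combination for an inclined planar object is to fit a plane to the few 3D points that binocular matching does recover and then extrapolate depth across the whole surface. A hedged sketch (a least-squares plane fit z = a·x + b·y + c; the sample points are invented, and the paper's actual procedure may differ):

    ```python
    import numpy as np

    def fit_plane_depth(points_3d):
        """Least-squares fit of z = a*x + b*y + c to sparse triangulated points,
        so depth can be extrapolated across a weakly textured inclined surface."""
        pts = np.asarray(points_3d, dtype=np.float64)
        A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
        (a, b, c), *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
        return a, b, c

    # Points sampled from the plane z = 0.5*x + 2 (inclined along x):
    pts = [(0, 0, 2.0), (1, 0, 2.5), (2, 1, 3.0), (3, 1, 3.5)]
    a, b, c = fit_plane_depth(pts)
    ```

    With the plane parameters in hand, a dense depth map follows by evaluating the plane at every pixel's (x, y).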
    One of the most challenging ongoing issues in the field of 3D visual research is how to interpret human 3D perception over the virtual 3D space between the human eye and a 3D display. When a human being perceives a 3D structure, the brain classifies the scene into binocular or monocular vision regions depending on the availability of binocular depth perception in each region (coarse 3D perception). The details of the scene are then perceived by applying visual sensitivity to the classified 3D structure (fine 3D perception) with reference to the fixation. We incorporate this coarse and fine 3D perception into quality assessment and propose a human 3D-Perception-based Stereo image quality pooling (3DPS) model. In 3DPS we divide the stereo image into segment units and classify each segment as either a binocular or a monocular vision region. We then assess the stereo image by applying different visual weights to these classes in the pooling stage, achieving more accurate quality assessment. In particular, 3DPS is shown to perform remarkably well in assessing stereo images distorted by coding and transmission errors.
    Keywords: Binocular disparity, Binocular rivalry, Monocular, Stereo display, Pooling
    Citations (62)
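    The weighted pooling idea can be sketched as a confidence-weighted average over segments; the segment scores, binocular/monocular classification, and weight values below are invented for illustration and are not the weights used in 3DPS:

    ```python
    import numpy as np

    def pooled_quality(segment_scores, is_binocular, w_bino=1.0, w_mono=0.5):
        """Weighted pooling over image segments: binocular-vision regions get
        a higher weight than monocular ones (illustrative weights only)."""
        scores = np.asarray(segment_scores, dtype=np.float64)
        w = np.where(np.asarray(is_binocular, dtype=bool), w_bino, w_mono)
        return float(np.sum(w * scores) / np.sum(w))

    # Three segments: two binocular, one monocular
    q = pooled_quality([0.9, 0.6, 0.3], [True, True, False])
    ```

    Down-weighting monocular regions reflects the idea that binocular depth perception dominates the perceived quality of a stereo pair.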
    Passive binocular stereo tends to produce stereo correspondence errors and requires heavy computation. A method that overcomes these drawbacks by actively moving the stereo camera is presented. The method uses motion parallax acquired by monocular motion stereo to restrict the search range of binocular disparity. Using only the uniqueness of disparity makes it possible to find reliable binocular disparities and occlusions very efficiently. Experimental results on complicated scenes are presented to demonstrate the effectiveness of this method.
    Keywords: Parallax, Monocular, Binocular disparity
    Citations (5)
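    The core idea, restricting the binocular correspondence search to a small window around the disparity predicted by motion parallax, can be sketched with a simple sum-of-absolute-differences match over 1-D image rows (the row values and window size are illustrative, not from the paper):

    ```python
    import numpy as np

    def restricted_disparity(left_row, right_row, x, d_motion, margin, patch=1):
        """Find the binocular disparity at column x of left_row, searching only
        within +/- margin of the motion-stereo prediction d_motion."""
        best_d, best_cost = None, np.inf
        for d in range(max(0, d_motion - margin), d_motion + margin + 1):
            if x - d - patch < 0 or x + patch >= len(left_row):
                continue  # candidate window falls outside the image
            cost = np.sum(np.abs(left_row[x - patch:x + patch + 1]
                                 - right_row[x - d - patch:x - d + patch + 1]))
            if cost < best_cost:
                best_d, best_cost = d, cost
        return best_d

    left_row = np.array([0, 0, 5, 9, 5, 0, 0, 0], dtype=float)
    right_row = np.array([5, 9, 5, 0, 0, 0, 0, 0], dtype=float)  # shifted 2 px
    d = restricted_disparity(left_row, right_row, x=3, d_motion=2, margin=1)
    ```

    Shrinking the search range both cuts computation and reduces the chance of a false match, which is the paper's stated benefit of combining the two stereo modes.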
    Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues, such as texture variations and gradients, defocus, and color/haze, that have heretofore been little exploited in such systems. Some of these cues apply even in regions without texture, where stereo works poorly. In this paper, we apply a Markov random field (MRF) learning algorithm to capture some of these monocular cues and incorporate them into a stereo system. We show that by adding monocular cues to stereo (triangulation) cues, we obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone. This holds true for a large variety of environments, including both indoor environments and unstructured outdoor environments containing trees/forests, buildings, etc. Our approach is general, and applies to incorporating monocular cues together with any off-the-shelf stereo system.
    Keywords: Monocular, Markov random field, Monocular vision, Computer stereo vision
    Citations (210)
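    A heavily simplified stand-in for the paper's MRF fusion is a per-pixel confidence-weighted blend of the two depth sources: trust stereo where its confidence (e.g. local texture strength) is high, and fall back to monocular cues elsewhere. The depth and confidence values below are invented:

    ```python
    import numpy as np

    def fuse_depth(stereo_depth, mono_depth, stereo_conf):
        """Per-pixel blend of stereo and monocular depth estimates, weighted by
        a stereo confidence in [0, 1]. A simplification of MRF-based fusion,
        which would additionally enforce smoothness between neighboring pixels."""
        w = np.clip(np.asarray(stereo_conf, dtype=np.float64), 0.0, 1.0)
        return w * stereo_depth + (1.0 - w) * mono_depth

    stereo = np.array([2.0, 10.0])   # second value unreliable (textureless patch)
    mono   = np.array([2.2, 3.0])
    conf   = np.array([0.9, 0.1])
    fused = fuse_depth(stereo, mono, conf)
    ```

    The MRF in the paper goes further by learning the cue weights from data and coupling neighboring pixels, but the complementary-cue intuition is the same.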
    To overcome the narrow visual field of binocular vision and the low precision of monocular vision, a binocular biomimetic-eye platform with four rotational degrees of freedom is designed based on the structural characteristics of the human eyes, so that a robot can achieve human-like environment perception with binocular stereo vision and monocular motion vision. Initial positioning and parameter calibration of the biomimetic-eye platform are accomplished via a vision alignment strategy and hand-eye calibration. Methods for binocular stereo perception and monocular motion stereo perception are given based on dynamically changing external parameters: the former perceives 3D information through the two images obtained in real time by the two cameras and their relative pose, while the latter perceives 3D information by synthesizing multiple images obtained by one camera, together with its corresponding poses, at multiple adjacent moments. Experimental results show that the relative perception accuracy of binocular vision is 0.38% and that of monocular motion vision is 0.82%. In conclusion, the proposed method broadens the field of binocular vision while ensuring the accuracy of both binocular perception and monocular motion perception.
    Keywords: Monocular, Monocular vision, Binocular disparity, Field of view
    Citations (4)