    Deep eyes: Joint depth inference using monocular and binocular cues
    Depth estimation plays an important role in many applications of computer vision. Depth can be inferred either from multiple images of the same scene taken from different viewpoints or from monocular cues in a single image. Stereo vision, an extensively used method for recovering depth information from a 2D scene, also supports view synthesis, i.e., 3D scene reconstruction. Automated systems such as robots, systems that construct a 3D spatial model of a scene, and systems that track objects in 3D space all rely on stereo vision. The stereo vision concept is also exploited by many vision-based remote-control systems that operate machines in a touch-free environment. Most existing approaches perceive depth from multiple images, using techniques such as stereopsis, structure from motion, depth from defocus/focus, deep learning, and machine learning.
    Keywords: Monocular, Depth map, Structure from Motion, Stereo cameras, 3D Reconstruction, Computer stereo vision, Binocular disparity, Monocular vision, View synthesis
    Citations (0)
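    The stereopsis route mentioned in the abstract reduces to simple triangulation once disparity is known: depth Z = f·B/d for focal length f (in pixels), baseline B, and disparity d. A minimal sketch (the focal length, baseline, and disparity values below are illustrative, not from the paper):

    ```python
    import numpy as np

    def depth_from_disparity(disparity, focal_px, baseline_m, eps=1e-6):
        """Convert a disparity map (pixels) to metric depth via Z = f * B / d."""
        d = np.asarray(disparity, dtype=np.float64)
        depth = np.full(d.shape, np.inf)   # zero disparity -> point at infinity
        valid = d > eps
        depth[valid] = focal_px * baseline_m / d[valid]
        return depth

    # A pixel with 64 px disparity, 512 px focal length, 0.1 m baseline:
    # Z = 512 * 0.1 / 64 = 0.8 m
    z = depth_from_disparity(np.array([[64.0]]), 512.0, 0.1)
    ```

    Larger disparities map to nearer points, which is why the search range of the disparity directly bounds the range of recoverable depths.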
    Selective visual attention is a mechanism of the primate visual system for rapidly focusing on attractive objects or regions in the visual environment. Numerous visual attention models have been developed and optimized over the past decades. Most existing models concentrate on static monocular images, and little attention has been devoted to stereo depth information, which is an important aspect of human perception. This paper proposes a region-based binocular saliency detection approach that takes depth information into account. The difference between the left and right images is used to compute a disparity map and a coarse saliency map. The hue, saturation, and intensity (HSI) color space is adopted, and the mean-shift algorithm is used for image segmentation. This study shows that the proposed region-based saliency computation can effectively detect salient regions, and its simplicity makes it well suited for real-time applications such as obstacle detection and visual navigation.
    Keywords: Monocular, Binocular disparity, Hue
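    The coarse saliency step described above can be sketched as a normalized left/right image difference: nearby objects shift more between the two views, so large differences hint at salient depth. The HSI conversion and mean-shift segmentation stages are omitted, and the array values are invented for illustration:

    ```python
    import numpy as np

    def coarse_saliency(left_gray, right_gray):
        """Coarse saliency map from the absolute left/right difference,
        normalized to [0, 1]."""
        diff = np.abs(left_gray.astype(np.float64) - right_gray.astype(np.float64))
        rng = diff.max() - diff.min()
        return (diff - diff.min()) / rng if rng > 0 else np.zeros_like(diff)

    left = np.zeros((4, 4));  left[1:3, 1:3] = 200   # bright foreground object
    right = np.zeros((4, 4)); right[1:3, 0:2] = 200  # same object shifted 1 px
    sal = coarse_saliency(left, right)
    ```

    In the real pipeline this per-pixel map would then be averaged over mean-shift segments to obtain region-level saliency.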
    In this paper, we propose a method that combines monocular vision with binocular vision to process a single textured image in which feature points are difficult to extract, and to restore the depth information of inclined planar objects. By integrating the two approaches, we combine the advantage of monocular vision in recovering the basic structure of objects in the image with the advantage of binocular vision in estimating depth accurately once the corresponding feature points have been correctly determined, yielding depth estimates of high precision and accuracy. Experiments show that this integration produces a high-accuracy depth map. Moreover, a high-quality 3D reconstruction is also obtained from the resulting depth map.
    Keywords: Monocular, Depth map, Feature extraction, Monocular vision, Texture
    Citations (0)
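    One way to realize the monocular/binocular combination for an inclined planar object is to fit a plane to the few 3D points that binocular matching does recover and then extrapolate depth across the whole surface. A hedged sketch (a least-squares plane fit z = a·x + b·y + c; the sample points are invented, and the paper's actual procedure may differ):

    ```python
    import numpy as np

    def fit_plane_depth(points_3d):
        """Least-squares fit of z = a*x + b*y + c to sparse triangulated points,
        so depth can be extrapolated across a weakly textured inclined surface."""
        pts = np.asarray(points_3d, dtype=np.float64)
        A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
        (a, b, c), *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
        return a, b, c

    # Points sampled from the plane z = 0.5*x + 2 (inclined along x):
    pts = [(0, 0, 2.0), (1, 0, 2.5), (2, 1, 3.0), (3, 1, 3.5)]
    a, b, c = fit_plane_depth(pts)
    ```

    With the plane parameters in hand, a dense depth map follows by evaluating the plane at every pixel's (x, y).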
    One of the most challenging ongoing issues in the field of 3D visual research is how to interpret human 3D perception over the virtual 3D space between the human eye and a 3D display. When a human being perceives a 3D structure, the brain classifies the scene into binocular or monocular vision regions depending on the availability of binocular depth perception in each region (coarse 3D perception). The details of the scene are then perceived by applying visual sensitivity to the classified 3D structure (fine 3D perception) with reference to the fixation. We incorporate this coarse and fine 3D perception into quality assessment and propose a human 3D-Perception-based Stereo image quality pooling (3DPS) model. In 3DPS we divide the stereo image into segment units and classify each segment as either a binocular or a monocular vision region. We then assess the stereo image by applying different visual weights to these classes in the pooling stage, achieving more accurate quality assessment. In particular, 3DPS is shown to perform remarkably well in assessing stereo images distorted by coding and transmission errors.
    Keywords: Binocular disparity, Binocular rivalry, Monocular, Stereo display, Pooling
    Citations (62)
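    The weighted pooling idea can be sketched as a confidence-weighted average over segments; the segment scores, binocular/monocular classification, and weight values below are invented for illustration and are not the weights used in 3DPS:

    ```python
    import numpy as np

    def pooled_quality(segment_scores, is_binocular, w_bino=1.0, w_mono=0.5):
        """Weighted pooling over image segments: binocular-vision regions get
        a higher weight than monocular ones (illustrative weights only)."""
        scores = np.asarray(segment_scores, dtype=np.float64)
        w = np.where(np.asarray(is_binocular, dtype=bool), w_bino, w_mono)
        return float(np.sum(w * scores) / np.sum(w))

    # Three segments: two binocular, one monocular
    q = pooled_quality([0.9, 0.6, 0.3], [True, True, False])
    ```

    Down-weighting monocular regions reflects the idea that binocular depth perception dominates the perceived quality of a stereo pair.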
    Passive binocular stereo tends to produce stereo correspondence errors and requires heavy computation. A method that overcomes these drawbacks by actively moving the stereo camera is presented. The method uses motion parallax acquired by monocular motion stereo to restrict the search range of binocular disparity. Using only the uniqueness of disparity makes it possible to find reliable binocular disparities and occlusions very efficiently. Experimental results on complicated scenes are presented to demonstrate the effectiveness of this method.
    Keywords: Parallax, Monocular, Binocular disparity
    Citations (5)
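    The core idea, restricting the binocular correspondence search to a small window around the disparity predicted by motion parallax, can be sketched with a simple sum-of-absolute-differences match over 1-D image rows (the row values and window size are illustrative, not from the paper):

    ```python
    import numpy as np

    def restricted_disparity(left_row, right_row, x, d_motion, margin, patch=1):
        """Find the binocular disparity at column x of left_row, searching only
        within +/- margin of the motion-stereo prediction d_motion."""
        best_d, best_cost = None, np.inf
        for d in range(max(0, d_motion - margin), d_motion + margin + 1):
            if x - d - patch < 0 or x + patch >= len(left_row):
                continue  # candidate window falls outside the image
            cost = np.sum(np.abs(left_row[x - patch:x + patch + 1]
                                 - right_row[x - d - patch:x - d + patch + 1]))
            if cost < best_cost:
                best_d, best_cost = d, cost
        return best_d

    left_row = np.array([0, 0, 5, 9, 5, 0, 0, 0], dtype=float)
    right_row = np.array([5, 9, 5, 0, 0, 0, 0, 0], dtype=float)  # shifted 2 px
    d = restricted_disparity(left_row, right_row, x=3, d_motion=2, margin=1)
    ```

    Shrinking the search range both cuts computation and reduces the chance of a false match, which is the paper's stated benefit of combining the two stereo modes.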
    Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues, such as texture variations and gradients, defocus, and color/haze, that have heretofore been little exploited in such systems. Some of these cues apply even in regions without texture, where stereo works poorly. In this paper, we apply a Markov random field (MRF) learning algorithm to capture some of these monocular cues and incorporate them into a stereo system. We show that by adding monocular cues to stereo (triangulation) cues, we obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone. This holds true for a large variety of environments, including both indoor environments and unstructured outdoor environments containing trees/forests, buildings, etc. Our approach is general, and applies to incorporating monocular cues together with any off-the-shelf stereo system.
    Keywords: Monocular, Markov random field, Monocular vision, Computer stereo vision
    Citations (210)
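    A heavily simplified stand-in for the paper's MRF fusion is a per-pixel confidence-weighted blend of the two depth sources: trust stereo where its confidence (e.g. local texture strength) is high, and fall back to monocular cues elsewhere. The depth and confidence values below are invented:

    ```python
    import numpy as np

    def fuse_depth(stereo_depth, mono_depth, stereo_conf):
        """Per-pixel blend of stereo and monocular depth estimates, weighted by
        a stereo confidence in [0, 1]. A simplification of MRF-based fusion,
        which would additionally enforce smoothness between neighboring pixels."""
        w = np.clip(np.asarray(stereo_conf, dtype=np.float64), 0.0, 1.0)
        return w * stereo_depth + (1.0 - w) * mono_depth

    stereo = np.array([2.0, 10.0])   # second value unreliable (textureless patch)
    mono   = np.array([2.2, 3.0])
    conf   = np.array([0.9, 0.1])
    fused = fuse_depth(stereo, mono, conf)
    ```

    The MRF in the paper goes further by learning the cue weights from data and coupling neighboring pixels, but the complementary-cue intuition is the same.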
    To overcome the narrow visual field of binocular vision and the low precision of monocular vision, a binocular biomimetic-eye platform with four rotational degrees of freedom is designed based on the structural characteristics of the human eyes, so that a robot can achieve human-like environment perception with binocular stereo vision and monocular motion vision. Initial positioning and parameter calibration of the biomimetic-eye platform are accomplished via a vision alignment strategy and hand-eye calibration. Methods for binocular stereo perception and monocular motion stereo perception are given based on dynamically changing external parameters: the former perceives 3D information through the two images obtained in real time by the two cameras and their relative pose, while the latter perceives 3D information by synthesizing multiple images obtained by one camera, together with its corresponding poses, at multiple adjacent moments. Experimental results show that the relative perception accuracy of binocular vision is 0.38% and that of monocular motion vision is 0.82%. In conclusion, the proposed method broadens the field of binocular vision while ensuring the accuracy of both binocular perception and monocular motion perception.
    Keywords: Monocular, Monocular vision, Binocular disparity, Field of view
    Citations (4)