TripleFormer: improving transformer-based image classification method using multiple self-attention inputs
2 Citations · 35 References · 10 Related Papers
A robot needs to localize an unknown object before grasping it. When the robot has only a monocular sensor, how can it obtain the object pose? In this work, we present a method for localizing the 6-DOF pose of a target object using a robotic arm and a hand-mounted monocular camera. The method comprises an object recognition process and a localization process. The recognition process uses point features on a surface of the target as a model of the object; the localization process combines robotic motion data with image data to calculate the 6-DOF pose of the object. The method can handle objects containing textured planes, and we verify it in real-world tests.
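The abstract leaves the pose computation unspecified, but a feature-based monocular pose step of this kind is commonly realized as a Perspective-n-Point (PnP) solve on matched model/image points. A minimal OpenCV sketch under that assumption; the model points, image points, and intrinsics below are placeholder values, and the fusion with robotic motion data is omitted:

import cv2
import numpy as np

# 3D feature points on a textured plane of the object model (object
# frame, metres) and their detected 2D pixel locations (placeholders).
model_pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                      [0.1, 0.1, 0.0], [0.0, 0.1, 0.0]])
image_pts = np.array([[320.0, 240.0], [420.0, 238.0],
                      [424.0, 338.0], [318.0, 342.0]])

K = np.array([[600.0, 0.0, 320.0],   # pinhole intrinsics (placeholder)
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                   # assume an undistorted image

# Perspective-n-Point: 6-DOF pose of the object in the camera frame.
ok, rvec, tvec = cv2.solvePnP(model_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)           # rotation matrix from Rodrigues vector
print("R =\n", R, "\nt =", tvec.ravel())

Chaining this camera-frame pose with the arm's forward kinematics (the robotic motion data) would express the object in the robot base frame.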
In this paper, an object recognition method and a pose estimation approach using stereo vision are presented. The approach is used for position-based visual servoing of a 6-DoF manipulator. The detection and recognition method is designed for robustness: an RGB color-based object descriptor and an online correction method are proposed for object detection and recognition. Pose is estimated using the depth information derived from the stereo camera and an SVD-based method, and the transformation between the desired pose and the object pose is then used for position-based visual servoing. Experiments verify the proposed recognition approach, and the stereo camera is tested to confirm that its depth accuracy is adequate. The proposed recognition method is invariant to scale, orientation, and lighting conditions, which increases robustness, and the accuracy of the stereo camera reaches 1 mm, adequate for tasks such as grasping and manipulation.
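The "SVD-based method" for pose from stereo depth reads like the classic least-squares rigid alignment of corresponding 3D point sets (Kabsch/Umeyama); whether the paper uses exactly this variant is an assumption. A self-contained numpy sketch:

import numpy as np

def rigid_transform_svd(P, Q):
    """Least-squares rigid transform (R, t) with R @ P[i] + t ~ Q[i].
    P, Q: (N, 3) arrays of corresponding 3D points, e.g. model points
    and their stereo-triangulated observations."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)   # centroids
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Sign correction guards against returning a reflection (det = -1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

Given the depth-derived object points and the corresponding model points, (R, t) is the object pose; the same routine also yields the desired-to-current transformation used for servoing.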
In this paper, we present an algorithm for detecting objects in a sequence of color images taken from a moving camera. The first step of the algorithm is the estimation of motion in the image plane. Instead of calculating optical flow or tracking single points, edges, or regions over a sequence of images, we determine the motion of clusters built by grouping pixels in a joint color/position feature space. The second step is a motion-based segmentation, in which adjacent clusters with similar trajectories are combined to build object hypotheses. Our application area is vision-based driving assistance, and the algorithm has been successfully tested on traffic scenes containing objects such as cars, motorcycles, and pedestrians.
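The abstract does not name the clustering scheme; the sketch below uses k-means over (x, y, R, G, B) pixel features as one plausible reading. The cluster count and position weighting are my assumptions, and the cross-frame trajectory tracking and merging are omitted:

import numpy as np
from sklearn.cluster import KMeans

def cluster_frame(rgb, k=20, pos_weight=0.5):
    """Group the pixels of one frame in a joint color/position space.
    rgb: (H, W, 3) uint8 image. Returns an (H, W) label map."""
    h, w, _ = rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Stack (x, y, R, G, B); weight position so neither cue dominates.
    feats = np.column_stack([xs.ravel() * pos_weight,
                             ys.ravel() * pos_weight,
                             rgb.reshape(-1, 3).astype(np.float32)])
    labels = KMeans(n_clusters=k, n_init=4).fit_predict(feats)
    return labels.reshape(h, w)

Tracking the cluster centroids from frame to frame would then give the per-cluster trajectories that the motion-based segmentation compares.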
Stripe-laser-based stereo vision is often used in vision-guided robot systems in the eye-in-hand configuration. The 3D scene is reconstructed from many 3D stripes, but 3D objects cannot be recognized from the stripe information alone, and in a cluttered 3D scene recognition is further complicated by object pose and matching. The video stream from the camera of the stripe-laser stereo rig can, however, help recognize 3D objects. This paper proposes an object-oriented vision-guided robot approach in which video segmentation, tracking, and recognition are used to guide the robot, reducing the complexity of 3D object detection, recognition, and pose estimation. Experimental results demonstrate the effectiveness of the approach.
Moving cameras are needed for a wide range of applications in robotics, vehicle systems, surveillance, etc. However, many foreground object segmentation methods reported in the literature are unsuitable for such settings; these methods assume that the camera is fixed and the background changes slowly, and are inadequate for segmenting objects in video if there is significant motion of the camera or background. To address this shortcoming, a new method for segmenting foreground objects is proposed that utilizes binocular video. The method is demonstrated in the application of tracking and segmenting people in video who are approximately facing the binocular camera rig. Given a stereo image pair, the system first tries to find faces. Starting at each face, the region containing the person is grown by merging regions from an over-segmented color image. The disparity map is used to guide this merging process. The system has been implemented on a consumer-grade PC, and tested on video sequences of people indoors obtained from a moving camera rig. As can be expected, the proposed method works well in situations where other foreground-background segmentation methods typically fail. We believe that this superior performance is partly due to the use of object detection to guide region merging in disparity/color foreground segmentation, and partly due to the use of disparity information available with a binocular rig, in contrast with most previous methods that assumed monocular sequences.
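As a rough illustration of the front end described above (find faces, then compute a disparity map to guide region merging), here is an OpenCV sketch. The Haar-cascade detector and semi-global block matcher are stand-ins I chose, not necessarily the components used in the paper, and the region-growing itself is omitted:

import cv2
import numpy as np

# Face detector shipped with OpenCV; each detection seeds one person region.
face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def faces_and_disparity(left_bgr, right_bgr):
    """Return face boxes in the left image plus a dense disparity map."""
    gl = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    gr = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_det.detectMultiScale(gl, scaleFactor=1.1, minNeighbors=5)
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                 blockSize=9)
    disparity = sgbm.compute(gl, gr).astype(np.float32) / 16.0  # fixed-point
    return faces, disparity

Regions of an over-segmented color image whose disparities agree with the face's would then be merged outward from each detection to cover the person.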
A novel object tracking method based on an RGB-D camera is proposed to handle the fast appearance changes, occlusion, and background clutter that arise in vision-based robot navigation. It exploits appearance and depth information, which are complementary in visual perception, to achieve robust tracking. First, an RGB image and depth information are captured by the RGB-D camera. Then, an online-updated appearance model is created from features extracted from the RGB image, and a motion model is created on a plan-view map derived from the depth information and camera parameters. The object position and scale are estimated on the motion model, and finally the appearance features are combined with the position and scale information to track the target. Compared with a state-of-the-art video tracking method, our tracker is more stable and accurate, and performs markedly better under large appearance changes. A vision-based robot using the tracker navigates cluttered environments successfully.
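The plan-view motion model rests on flattening depth pixels onto the ground plane. A minimal sketch of such a plan-view map, assuming a pinhole camera with known intrinsics (fx and cx are placeholders) and ignoring camera tilt and height:

import numpy as np

def plan_view_map(depth, fx, cx, cell=0.05, size=200):
    """Histogram a depth image (metres) into a top-down occupancy grid.
    Rows index distance from the camera, columns the lateral offset."""
    h, w = depth.shape
    z = depth
    x = (np.arange(w)[None, :] - cx) * z / fx   # lateral offset per pixel
    valid = z > 0                               # drop invalid measurements
    gx = np.clip((x[valid] / cell + size // 2).astype(int), 0, size - 1)
    gz = np.clip((z[valid] / cell).astype(int), 0, size - 1)
    grid = np.zeros((size, size), dtype=np.int32)
    np.add.at(grid, (gz, gx), 1)                # count points per ground cell
    return grid

Estimating the target's cell and extent on consecutive grids gives the position and scale that are fused with the appearance model.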
This article proposes a new, simple hand-eye calibration method for a camera mounted on the end-effector of an industrial robot, requiring only a single image. The calibration information is then used for robotic pick-up of cubes with a monocular camera. Images captured from a given camera pose are segmented using a fusion of multiple methods, so that object information is obtained even when there is little contrast between the object and the background or when lighting varies. The edge information, and subsequently the object pose, is estimated using a minimum number of images. In some cases a single image suffices, but when only a single edge is detected, an additional image is grabbed after aligning the camera with that edge, and a further edge is estimated using a directional thresholding operation. The 3-D edge information obtained with the calibration is then used to compute the object pose for robotic pick-up. To ensure safety, the estimate is verified by projecting the computed coordinates, and the final pick-up is performed while monitoring the force to avoid damage from collisions. The proposed approaches were physically implemented and experimentally validated.
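The article's single-image procedure is its own contribution; for orientation, the standard multi-pose eye-in-hand formulation (solving AX = XB for the camera-to-gripper transform) is available directly in OpenCV. A sketch, assuming the pose lists come from robot kinematics and from, e.g., solvePnP on a calibration target:

import cv2

def hand_eye(R_g2b, t_g2b, R_t2c, t_t2c):
    """Eye-in-hand calibration.
    R_g2b, t_g2b: lists of gripper->base rotations (3x3) and translations
    from the robot controller; R_t2c, t_t2c: target->camera poses from a
    calibration pattern. Returns the camera pose in the gripper frame."""
    R_c2g, t_c2g = cv2.calibrateHandEye(R_g2b, t_g2b, R_t2c, t_t2c,
                                        method=cv2.CALIB_HAND_EYE_TSAI)
    return R_c2g, t_c2g

With the camera-to-gripper transform known, any pose measured in the camera frame can be mapped through the kinematic chain into the robot base frame for pick-up.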
In order to improve the accuracy and efficiency of robot grasping, we propose a new method for transparent object detection and location that utilizes depth, RGB, and IR images. In the detection process, an active depth sensor (RealSense) is first employed to retrieve transparent candidates from the depth image; the corresponding candidates in the RGB and IR images are then extracted separately. A candidate classification algorithm using SIFT features subsequently recognizes the truly transparent objects among the candidates. In the location process, we obtain a new group of RGB and IR images by adjusting the camera orientation so that its optical axis is perpendicular to the normal direction of the plane on which the object is placed. The object contours in the RGB and IR images are then extracted, the three-dimensional object is reconstructed by stereo matching of the two contours, and the current pose of the object is calculated. To verify the feasibility of the method, we built a hand-eye test system with a movable industrial robot to detect and grasp transparent objects at different locations. The test results demonstrate that the method is more general and effective than the traditional one.
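Two pieces of this pipeline lend themselves to a short sketch: pulling transparent candidates out of the depth image, and scoring candidates with SIFT. Active depth sensors such as the RealSense typically return invalid (zero) depth on transparent surfaces, so large holes are a natural candidate cue; that heuristic, the area threshold, and the ratio-test constant below are my assumptions, not the paper's exact rules:

import cv2
import numpy as np

def transparent_candidates(depth_mm, min_area=500):
    """Boxes (x, y, w, h) of large invalid-depth holes in a depth image."""
    holes = (depth_mm == 0).astype(np.uint8)
    n, _, stats, _ = cv2.connectedComponentsWithStats(holes)
    return [tuple(stats[i, :4]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]

def sift_match_count(patch, template):
    """Score a grayscale candidate patch against a known object template
    with SIFT descriptors and Lowe's ratio test."""
    sift = cv2.SIFT_create()
    _, d1 = sift.detectAndCompute(patch, None)
    _, d2 = sift.detectAndCompute(template, None)
    if d1 is None or d2 is None:
        return 0
    pairs = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < 0.75 * p[1].distance)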