TripleFormer: improving transformer-based image classification method using multiple self-attention inputs
2 Citations · 35 References · 10 Related Papers
A robot needs to localize an unknown object before grasping it. When the robot has only a monocular sensor, how can it obtain the object pose? In this work, we present a method for localizing the 6-DOF pose of a target object using a robotic arm and a hand-mounted monocular camera. The method comprises an object recognition process and a localization process. The recognition process uses point features on a surface of the target as a model of the object; the localization process combines robotic motion data with image data to calculate the 6-DOF pose of the object. The method can handle objects containing textured planes, and we verify it in real-world tests.
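The abstract leaves the pose computation unspecified, but a feature-based monocular pose step of this kind is commonly realized as a Perspective-n-Point (PnP) solve on matched model/image points. A minimal OpenCV sketch under that assumption; the model points, image points, and intrinsics below are placeholder values, and the fusion with robotic motion data is omitted:

import cv2
import numpy as np

# 3D feature points on a textured plane of the object model (object
# frame, metres) and their detected 2D pixel locations (placeholders).
model_pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                      [0.1, 0.1, 0.0], [0.0, 0.1, 0.0]])
image_pts = np.array([[320.0, 240.0], [420.0, 238.0],
                      [424.0, 338.0], [318.0, 342.0]])

K = np.array([[600.0, 0.0, 320.0],   # pinhole intrinsics (placeholder)
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                   # assume an undistorted image

# Perspective-n-Point: 6-DOF pose of the object in the camera frame.
ok, rvec, tvec = cv2.solvePnP(model_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)           # rotation matrix from Rodrigues vector
print("R =\n", R, "\nt =", tvec.ravel())

Chaining this camera-frame pose with the arm's forward kinematics (the robotic motion data) would express the object in the robot base frame.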
In this paper, an object recognition method and a pose estimation approach using stereo vision are presented. The approach is used for position-based visual servoing of a 6-DoF manipulator. The detection and recognition method is designed for robustness: an RGB color-based object descriptor and an online correction method are proposed for object detection and recognition. Pose is estimated using the depth information derived from the stereo camera and an SVD-based method, and the transformation between the desired pose and the object pose is then used for position-based visual servoing. Experiments verify the proposed recognition approach, and the stereo camera is tested to confirm that its depth accuracy is adequate. The proposed recognition method is invariant to scale, orientation, and lighting conditions, which increases robustness, and the accuracy of the stereo camera reaches 1 mm, adequate for tasks such as grasping and manipulation.
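The "SVD-based method" for pose from stereo depth reads like the classic least-squares rigid alignment of corresponding 3D point sets (Kabsch/Umeyama); whether the paper uses exactly this variant is an assumption. A self-contained numpy sketch:

import numpy as np

def rigid_transform_svd(P, Q):
    """Least-squares rigid transform (R, t) with R @ P[i] + t ~ Q[i].
    P, Q: (N, 3) arrays of corresponding 3D points, e.g. model points
    and their stereo-triangulated observations."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)   # centroids
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Sign correction guards against returning a reflection (det = -1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

Given the depth-derived object points and the corresponding model points, (R, t) is the object pose; the same routine also yields the desired-to-current transformation used for servoing.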
In this paper, we present an algorithm for detecting objects in a sequence of color images taken from a moving camera. The first step of the algorithm is the estimation of motion in the image plane. Instead of calculating optical flow or tracking single points, edges, or regions over a sequence of images, we determine the motion of clusters built by grouping pixels in a joint color/position feature space. The second step is a motion-based segmentation, in which adjacent clusters with similar trajectories are combined to build object hypotheses. Our application area is vision-based driving assistance, and the algorithm has been successfully tested on traffic scenes containing objects such as cars, motorcycles, and pedestrians.
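The abstract does not name the clustering scheme; the sketch below uses k-means over (x, y, R, G, B) pixel features as one plausible reading. The cluster count and position weighting are my assumptions, and the cross-frame trajectory tracking and merging are omitted:

import numpy as np
from sklearn.cluster import KMeans

def cluster_frame(rgb, k=20, pos_weight=0.5):
    """Group the pixels of one frame in a joint color/position space.
    rgb: (H, W, 3) uint8 image. Returns an (H, W) label map."""
    h, w, _ = rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Stack (x, y, R, G, B); weight position so neither cue dominates.
    feats = np.column_stack([xs.ravel() * pos_weight,
                             ys.ravel() * pos_weight,
                             rgb.reshape(-1, 3).astype(np.float32)])
    labels = KMeans(n_clusters=k, n_init=4).fit_predict(feats)
    return labels.reshape(h, w)

Tracking the cluster centroids from frame to frame would then give the per-cluster trajectories that the motion-based segmentation compares.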
Stripe-laser-based stereo vision is often used in vision-guided robot systems in the eye-in-hand configuration. The 3D scene is reconstructed from many 3D stripes, but 3D objects cannot be recognized from the stripe information alone, and in a cluttered 3D scene recognition is further complicated by object pose and matching. The video stream from the camera of the stripe-laser stereo rig can, however, help recognize 3D objects. This paper proposes an object-oriented vision-guided robot approach in which video segmentation, tracking, and recognition are used to guide the robot, reducing the complexity of 3D object detection, recognition, and pose estimation. Experimental results demonstrate the effectiveness of the approach.
Moving cameras are needed for a wide range of applications in robotics, vehicle systems, surveillance, etc. However, many foreground object segmentation methods reported in the literature are unsuitable for such settings; these methods assume that the camera is fixed and the background changes slowly, and are inadequate for segmenting objects in video if there is significant motion of the camera or background. To address this shortcoming, a new method for segmenting foreground objects is proposed that utilizes binocular video. The method is demonstrated in the application of tracking and segmenting people in video who are approximately facing the binocular camera rig. Given a stereo image pair, the system first tries to find faces. Starting at each face, the region containing the person is grown by merging regions from an over-segmented color image. The disparity map is used to guide this merging process. The system has been implemented on a consumer-grade PC, and tested on video sequences of people indoors obtained from a moving camera rig. As can be expected, the proposed method works well in situations where other foreground-background segmentation methods typically fail. We believe that this superior performance is partly due to the use of object detection to guide region merging in disparity/color foreground segmentation, and partly due to the use of disparity information available with a binocular rig, in contrast with most previous methods that assumed monocular sequences.
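As a rough illustration of the front end described above (find faces, then compute a disparity map to guide region merging), here is an OpenCV sketch. The Haar-cascade detector and semi-global block matcher are stand-ins I chose, not necessarily the components used in the paper, and the region-growing itself is omitted:

import cv2
import numpy as np

# Face detector shipped with OpenCV; each detection seeds one person region.
face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def faces_and_disparity(left_bgr, right_bgr):
    """Return face boxes in the left image plus a dense disparity map."""
    gl = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    gr = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_det.detectMultiScale(gl, scaleFactor=1.1, minNeighbors=5)
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                 blockSize=9)
    disparity = sgbm.compute(gl, gr).astype(np.float32) / 16.0  # fixed-point
    return faces, disparity

Regions of an over-segmented color image whose disparities agree with the face's would then be merged outward from each detection to cover the person.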
A novel object tracking method based on an RGB-D camera is proposed to handle the fast appearance changes, occlusion, and background clutter that arise in vision-based robot navigation. It exploits appearance and depth information, which are complementary in visual perception, to achieve robust tracking. First, an RGB image and depth information are captured by the RGB-D camera. Then, an online-updated appearance model is created from features extracted from the RGB image, and a motion model is created on a plan-view map derived from the depth information and camera parameters. The object position and scale are estimated on the motion model, and finally the appearance features are combined with the position and scale information to track the target. Compared with a state-of-the-art video tracking method, our tracker is more stable and accurate, and performs markedly better under large appearance changes. A vision-based robot using the tracker navigates cluttered environments successfully.
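The plan-view motion model rests on flattening depth pixels onto the ground plane. A minimal sketch of such a plan-view map, assuming a pinhole camera with known intrinsics (fx and cx are placeholders) and ignoring camera tilt and height:

import numpy as np

def plan_view_map(depth, fx, cx, cell=0.05, size=200):
    """Histogram a depth image (metres) into a top-down occupancy grid.
    Rows index distance from the camera, columns the lateral offset."""
    h, w = depth.shape
    z = depth
    x = (np.arange(w)[None, :] - cx) * z / fx   # lateral offset per pixel
    valid = z > 0                               # drop invalid measurements
    gx = np.clip((x[valid] / cell + size // 2).astype(int), 0, size - 1)
    gz = np.clip((z[valid] / cell).astype(int), 0, size - 1)
    grid = np.zeros((size, size), dtype=np.int32)
    np.add.at(grid, (gz, gx), 1)                # count points per ground cell
    return grid

Estimating the target's cell and extent on consecutive grids gives the position and scale that are fused with the appearance model.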
This article proposes a new, simple hand-eye calibration method for a camera mounted on the end-effector of an industrial robot, requiring only a single image. The calibration information is then used for robotic pick-up of cubes with a monocular camera. Images captured from a given camera pose are segmented using a fusion of multiple methods, so that object information is obtained even when there is little contrast between the object and the background or when lighting varies. The edge information, and subsequently the object pose, is estimated using a minimum number of images. In some cases a single image suffices, but when only a single edge is detected, an additional image is grabbed after aligning the camera with that edge, and a further edge is estimated using a directional thresholding operation. The 3-D edge information obtained with the calibration is then used to compute the object pose for robotic pick-up. To ensure safety, the estimate is verified by projecting the computed coordinates, and the final pick-up is performed while monitoring the force to avoid damage from collisions. The proposed approaches were physically implemented and experimentally validated.
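The article's single-image procedure is its own contribution; for orientation, the standard multi-pose eye-in-hand formulation (solving AX = XB for the camera-to-gripper transform) is available directly in OpenCV. A sketch, assuming the pose lists come from robot kinematics and from, e.g., solvePnP on a calibration target:

import cv2

def hand_eye(R_g2b, t_g2b, R_t2c, t_t2c):
    """Eye-in-hand calibration.
    R_g2b, t_g2b: lists of gripper->base rotations (3x3) and translations
    from the robot controller; R_t2c, t_t2c: target->camera poses from a
    calibration pattern. Returns the camera pose in the gripper frame."""
    R_c2g, t_c2g = cv2.calibrateHandEye(R_g2b, t_g2b, R_t2c, t_t2c,
                                        method=cv2.CALIB_HAND_EYE_TSAI)
    return R_c2g, t_c2g

With the camera-to-gripper transform known, any pose measured in the camera frame can be mapped through the kinematic chain into the robot base frame for pick-up.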
In order to improve the accuracy and efficiency of robot grasping, we propose a new method for transparent object detection and location that utilizes depth, RGB, and IR images. In the detection process, an active depth sensor (RealSense) is first employed to retrieve transparent candidates from the depth image; the corresponding candidates in the RGB and IR images are then extracted separately. A candidate classification algorithm using SIFT features subsequently recognizes the truly transparent objects among the candidates. In the location process, we obtain a new group of RGB and IR images by adjusting the camera orientation so that its optical axis is perpendicular to the normal direction of the plane on which the object is placed. The object contours in the RGB and IR images are then extracted, the three-dimensional object is reconstructed by stereo matching of the two contours, and the current pose of the object is calculated. To verify the feasibility of the method, we built a hand-eye test system with a movable industrial robot to detect and grasp transparent objects at different locations. The test results demonstrate that the method is more general and effective than the traditional one.
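Two pieces of this pipeline lend themselves to a short sketch: pulling transparent candidates out of the depth image, and scoring candidates with SIFT. Active depth sensors such as the RealSense typically return invalid (zero) depth on transparent surfaces, so large holes are a natural candidate cue; that heuristic, the area threshold, and the ratio-test constant below are my assumptions, not the paper's exact rules:

import cv2
import numpy as np

def transparent_candidates(depth_mm, min_area=500):
    """Boxes (x, y, w, h) of large invalid-depth holes in a depth image."""
    holes = (depth_mm == 0).astype(np.uint8)
    n, _, stats, _ = cv2.connectedComponentsWithStats(holes)
    return [tuple(stats[i, :4]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]

def sift_match_count(patch, template):
    """Score a grayscale candidate patch against a known object template
    with SIFT descriptors and Lowe's ratio test."""
    sift = cv2.SIFT_create()
    _, d1 = sift.detectAndCompute(patch, None)
    _, d2 = sift.detectAndCompute(template, None)
    if d1 is None or d2 is None:
        return 0
    pairs = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < 0.75 * p[1].distance)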