Training large models is plagued by intense compute costs and limited hardware memory. A practical remedy is low-precision representation, but it suffers from loss of numerical accuracy and unstable training, rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. We propose Collage, which utilizes a multi-component float representation in low precision to perform operations accurately with numerical errors accounted for. To understand the impact of imprecision on training, we propose a simple and novel metric that tracks the information lost during training and differentiates between precision strategies. Our method works with commonly used low-precision formats such as half precision ($16$-bit floating points) and can be naturally extended to even lower precision, such as $8$-bit. Experimental results show that pre-training with Collage removes the need for $32$-bit floating-point copies of the model and attains similar or better training performance than the $(16, 32)$-bit mixed-precision strategy, with up to $3.7\times$ speedup and $\sim 15\%$ to $23\%$ less memory usage in practice.
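The abstract does not spell out how a multi-component float compensates rounding error, but the standard building block for such representations is Knuth's two-sum, an error-free transformation that recovers the exact rounding error of each addition. The sketch below illustrates the idea in float16 with a hypothetical `mcf_sum` accumulator (the function names and the two-component layout are assumptions for illustration, not the Collage implementation):

```python
import numpy as np

def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, e) such that
    s + e equals a + b exactly, where s = fl(a + b) in the working precision."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e

def mcf_sum(values):
    """Accumulate float16 values as a two-component (hi, lo) float,
    folding the rounding error of every addition into the low word."""
    hi = np.float16(0.0)
    lo = np.float16(0.0)
    for v in values:
        hi, e = two_sum(hi, np.float16(v))
        lo = np.float16(lo + e)  # retained error that naive summation discards
    return hi, lo
```

For example, adding ten 1.0's to 2048.0 in plain float16 leaves the sum stuck at 2048 (the unit in the last place at that magnitude is 2), whereas the (hi, lo) pair preserves the full value 2058.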
Automatic change detection is an important and difficult task in the field of remote sensing. In this study, a deep Siamese convolutional network based on the fusion of high- and low-level features is proposed for change detection in remote sensing images. Given that low-level features capture low-order information (e.g., texture) that is sensitive to change, while high-level features accurately reflect image category information (e.g., semantic information), we fuse these features to enhance the abstractness and robustness of the features extracted in the change detection framework. The whole system is end-to-end and does not require any pre- or post-processing. Experimental results on three datasets show that, owing to the high- and low-level fusion framework, our method outperforms other advanced methods.
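A common way to fuse high- and low-level feature maps, and a plausible reading of the framework described above, is to upsample the coarser high-level map to the low-level resolution and concatenate along the channel axis. The following is a minimal numpy sketch of that pattern (the function name, shapes, and nearest-neighbour upsampling are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def fuse_features(low, high):
    """Fuse a low-level feature map of shape (C1, H, W) with a high-level
    map of shape (C2, H/2, W/2): upsample the high-level map 2x by
    nearest-neighbour repetition, then concatenate along channels."""
    up = high.repeat(2, axis=1).repeat(2, axis=2)  # (C2, H, W)
    return np.concatenate([low, up], axis=0)       # (C1 + C2, H, W)
```

In practice the concatenated map would feed further convolutions so the network learns how to weight texture cues against semantic cues.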
Over the past two decades, magnetic resonance imaging (MRI) has been widely applied to the diagnosis of knee joint diseases. Due to the complexity and diversity of MRI data, traditional feature extraction requires manually searching for features to segment the meniscus, and the resulting segmentations still need further filtering. It is therefore necessary to design a novel method that automatically extracts features directly from images. In this study, we develop a framework that achieves this goal using a mask region-based convolutional neural network (Mask R-CNN) without manual intervention. To highlight the proportion of the meniscus, we first preprocess the original image data, reducing it to about 1/8 of the original size, and then feed the preprocessed images into the trained Mask R-CNN. Transfer learning is used to initialize the weights of our network. On a test set of 1000 images, the mean intersection over union (IoU) and Dice similarity coefficient (DSC) reach 83.68% and 91.13%, respectively. These results demonstrate that our approach is feasible and has potential significance in clinical practice.
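The two reported segmentation metrics, IoU and DSC, are standard overlap measures on binary masks. A minimal sketch of how they are typically computed (the function name is an illustrative assumption, not the paper's code):

```python
import numpy as np

def iou_dsc(pred, gt):
    """IoU and Dice similarity coefficient for two binary masks.
    IoU = |P ∩ G| / |P ∪ G|; DSC = 2|P ∩ G| / (|P| + |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0
    dsc = 2 * inter / total if total else 1.0
    return iou, dsc
```

Note that the two metrics are monotonically related, DSC = 2·IoU / (1 + IoU), which is why a mean IoU of 83.68% is consistent with a higher DSC of 91.13%.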
Accurate and reliable fruit detection in the orchard environment is an important step for yield estimation and robotic harvesting. However, existing detection methods often target large and relatively sparse fruits and cannot provide a good solution for small and densely distributed ones. This paper proposes a YOLOv3-Litchi model based on YOLOv3 to detect densely distributed litchi fruits in large visual scenes. We adjusted the prediction scale and reduced the number of network layers to improve the detection of small, dense litchi fruits while maintaining detection speed. From flowering to 50 days after maturity, we collected a total of 266 images containing 16,000 fruits and used them to construct the litchi dataset. The k-means++ algorithm is then used to cluster the bounding boxes in the labeled data to determine prior box sizes suitable for litchi detection. We trained the improved YOLOv3-Litchi model, tested its litchi detection performance, and compared it with YOLOv2, YOLOv3, and Faster R-CNN on the actual detection of litchi, using the F1 score and the average detection time as evaluation metrics. The test results show that the F1 score of YOLOv3-Litchi is 0.1 higher than that of YOLOv2, 0.08 higher than that of YOLOv3, and 0.05 higher than that of Faster R-CNN; its average detection time is 29.44 ms faster than YOLOv2, 19.56 ms faster than YOLOv3, and 607.06 ms faster than Faster R-CNN. The improved model thus achieves both faster detection and optimal detection performance for small, dense fruits. The work presented here may provide a reference for further study on fruit-detection methods in natural environments.
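Clustering ground-truth boxes to pick prior (anchor) sizes, as done above with k-means++, conventionally uses 1 − IoU between width/height pairs as the distance, so that anchor quality is scale-aware. The sketch below uses plain Lloyd iterations with random initialization rather than the k-means++ seeding the paper uses, and the function names are illustrative assumptions:

```python
import numpy as np

def iou_wh(box, anchors):
    """IoU between one (w, h) box and each anchor, all anchored at the origin."""
    inter = np.minimum(box[0], anchors[:, 0]) * np.minimum(box[1], anchors[:, 1])
    union = box[0] * box[1] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) boxes into k anchors using 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each box to the anchor with the highest IoU (lowest 1 - IoU)
        assign = np.array([np.argmax(iou_wh(b, anchors)) for b in boxes])
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors
```

With k-means++ seeding, the initial anchors would instead be chosen with probability proportional to their (1 − IoU) distance from already-picked centers, which reduces sensitivity to the random start.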
This paper proposes a terrain-aided navigation scheme based on the principles of computer vision. Unlike conventional terrain matching techniques, the scheme uses a CCD camera rather than a barometer and radio altimeter as the sensing element. Terrain elevation information over an area, rather than only along the vehicle's flight path, is extracted from CCD images using computer vision principles. Less flight time is therefore needed to gather sufficient information for successful terrain matching, and the scheme provides estimation and compensation of inertial navigation system errors more rapidly.