The research community has witnessed the potential of self-supervised Masked Image Modeling (MIM), which enables models to learn visual representations from unlabeled data. In this paper, to incorporate both the crucial global structural information and the local details needed for dense prediction tasks, we shift the perspective to the frequency domain and present a new MIM-based framework named FreMIM for self-supervised pre-training, aimed at medical image segmentation tasks. Based on the observations that detailed structural information mainly lies in the high-frequency components and that high-level semantics are abundant in the low-frequency counterparts, we further incorporate multi-stage supervision to guide representation learning during the pre-training phase. Extensive experiments on three benchmark datasets show the advantage of FreMIM over previous state-of-the-art MIM methods. Compared with various baselines trained from scratch, FreMIM consistently brings considerable improvements in model performance. The code will be publicly available at https://github.com/Rubics-Xuan/FreMIM.
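The low-/high-frequency observation above can be illustrated with a toy decomposition. The following is a minimal sketch on a 1-D signal using a naive DFT; it is not the FreMIM masking pipeline, and the function names (`dft`, `idft`, `split_bands`) and the `cutoff` parameter are assumptions introduced only for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part, valid when the spectrum is conjugate-symmetric."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def split_bands(signal, cutoff):
    """Split a signal into a low-frequency part and a high-frequency part.

    Bin k and its mirror bin n - k are kept or dropped together, so both
    parts stay real-valued for a real input signal.
    """
    n = len(signal)
    spectrum = dft(signal)
    low_spec = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    high_spec = [c - l for c, l in zip(spectrum, low_spec)]
    return idft(low_spec), idft(high_spec)


# A step edge: the low band keeps the coarse shape, the high band keeps the sharp transition.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
low, high = split_bands(signal, cutoff=1)
```

The two bands sum back to the original signal, so the split loses no information; it only separates coarse structure from fine detail.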
A back-propagation (BP) neural network is used to approximate the dynamics of a nonlinear discrete-time system. To account for the unmodeled dynamics of the system, the network weights are updated with a dead-zone algorithm, and a robust adaptive controller based on the BP neural network is proposed. For the case where the parameters change abruptly, multiple neural networks with different weights are built to cover the parameter uncertainty, and a controller is set up for each model. At every sampling instant, a performance index based on the identification error is used to choose the best model and its corresponding controller. Different combinations of fixed and adaptive models are used for robust multiple-model adaptive control (MMAC). Proofs of the stability and convergence of MMAC are given, and the effectiveness of the proposed methods is verified by simulation.
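The switching rule and the dead zone described above can be sketched as follows. The abstract does not give the exact form of the performance index or of the dead-zone law, so the exponentially weighted sum of squared identification errors, the `forgetting` factor, the `bound` parameter, and all function names here are assumptions for illustration only.

```python
def dead_zone(error, bound):
    """Return 0 inside the dead zone so that small errors do not drive weight updates."""
    return 0.0 if abs(error) <= bound else error


def performance_index(errors, forgetting=0.9):
    """Exponentially weighted sum of squared identification errors (assumed form)."""
    j = 0.0
    for e in errors:
        j = forgetting * j + e * e
    return j


def select_model(error_histories, forgetting=0.9):
    """Return the position of the model whose performance index is smallest."""
    indices = [performance_index(errors, forgetting) for errors in error_histories]
    return min(range(len(indices)), key=indices.__getitem__)


# Identification errors of three hypothetical models over two sampling instants.
histories = [[1.0, 1.0], [0.1, 0.2], [0.5, 0.0]]
best = select_model(histories)  # the controller paired with this model would be applied
```

At each sampling instant the index would be recomputed with the newest error, and the controller attached to the selected model would generate the control input.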
Deep learning-based medical volumetric segmentation methods either train the model from scratch or follow the standard "pre-training then fine-tuning" paradigm. Although fine-tuning a pre-trained model on downstream tasks can harness its representation power, standard full fine-tuning is costly in terms of computation and memory footprint. In this paper, we present a study on parameter-efficient transfer learning for medical volumetric segmentation and propose a new framework named Med-Tuning, based on intra-stage feature enhancement and inter-stage feature interaction. Additionally, to exploit the intrinsic global properties of the Fourier Transform for parameter-efficient transfer learning, we propose a new adapter block, Med-Adapter, with a Fourier Transform branch that effectively and efficiently models the crucial global context for medical volumetric segmentation. Given a large-scale model pre-trained on 2D natural images, our method can exploit both the crucial spatial multi-scale features and the volumetric correlations along slices for accurate segmentation. Extensive experiments on three benchmark datasets (including CT and MRI) show that our method achieves better results than previous parameter-efficient transfer learning methods on segmentation tasks, with far fewer tuned parameters. Compared to full fine-tuning, our method reduces the fine-tuned model parameters by up to 4x, with even better segmentation performance. The code will be made publicly available at https://github.com/jessie-chen99/Med-Tuning.
Multiscale representation of images is widely applied in various fields, and the centroid-based algorithm for binary images has previously been shown to be effective. In this paper, a new centroid-based algorithm for the multiscale representation of greyscale images is presented. As with binary images, the approach preserves the symmetry of the original image and can even keep its shape or topology. A greyscale image is treated as a 3-D binary image so that the multiscale representation for binary images can be applied. The examples in this paper show that the new method can be applied efficiently in many different fields.
Addressing the vehicle routing problem (VRP) in modern military logistics, this paper sets up a multi-objective VRP mathematical model of military logistics in wartime. The model is solved with NSGA II. To reduce the dependence of NSGA II on the initial population and its shortcomings in practical application, a greedy algorithm is introduced to improve the population initialization and crossover steps of NSGA II. The algorithm is implemented in Matlab and applied to examples. Simulation shows that the improved NSGA II is able to solve the multi-objective VRP of military logistics in wartime.
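The idea of seeding the initial population with a greedy solution can be sketched as follows. The abstract does not specify the greedy rule, so this example assumes a nearest-neighbour construction under a single vehicle-capacity constraint; the function names, the capacity model, and the mix of one greedy individual with random ones are all assumptions for illustration, not the paper's implementation.

```python
import math
import random


def greedy_routes(depot, customers, demands, capacity):
    """Nearest-neighbour construction: keep driving to the closest unserved customer that still fits."""
    unserved = set(range(len(customers)))
    routes = []
    while unserved:
        route, load, pos = [], 0, depot
        while True:
            feasible = [c for c in sorted(unserved) if load + demands[c] <= capacity]
            if not feasible:
                break
            nxt = min(feasible, key=lambda c: math.dist(pos, customers[c]))
            route.append(nxt)
            load += demands[nxt]
            pos = customers[nxt]
            unserved.remove(nxt)
        if not route:
            raise ValueError("a customer demand exceeds the vehicle capacity")
        routes.append(route)
    return routes


def random_routes(customers, demands, capacity, rng):
    """Random visiting order, cut into routes whenever the capacity would be exceeded."""
    order = list(range(len(customers)))
    rng.shuffle(order)
    routes, route, load = [], [], 0
    for c in order:
        if route and load + demands[c] > capacity:
            routes.append(route)
            route, load = [], 0
        route.append(c)
        load += demands[c]
    if route:
        routes.append(route)
    return routes


def initial_population(depot, customers, demands, capacity, size, seed=0):
    """One greedy individual plus random individuals, so the search does not start purely at random."""
    rng = random.Random(seed)
    population = [greedy_routes(depot, customers, demands, capacity)]
    while len(population) < size:
        population.append(random_routes(customers, demands, capacity, rng))
    return population


depot = (0.0, 0.0)
customers = [(1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
demands = [1, 1, 1, 1]
population = initial_population(depot, customers, demands, capacity=2, size=4)
```

Keeping random individuals alongside the greedy one preserves diversity, which the non-dominated sorting in NSGA II relies on to spread solutions along the Pareto front.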
Referring image segmentation (RIS) is a fundamental vision-language task that aims to segment a target object from an image based on a given natural language expression. Because image and text have essentially distinct data properties, most existing methods either introduce complex designs for fine-grained vision-language alignment or lack the required dense alignment, resulting in scalability issues or mis-segmentation problems such as over- or under-segmentation. To achieve effective and efficient fine-grained feature alignment in the RIS task, we explore the potential of masked multimodal modeling coupled with self-distillation and propose a novel cross-modality masked self-distillation framework named CM-MaskSD, which inherits the transferred knowledge of image-text semantic alignment from the CLIP model to realize fine-grained patch-word feature alignment for better segmentation accuracy. Moreover, our CM-MaskSD framework can considerably boost model performance in a nearly parameter-free manner, since it shares weights between the main segmentation branch and the introduced masked self-distillation branches and introduces only negligible parameters for coordinating the multimodal features. Comprehensive experiments on three benchmark datasets (i.e., RefCOCO, RefCOCO+, G-Ref) for the RIS task demonstrate the superiority of our proposed framework over previous state-of-the-art methods.
The surface defects of steel strip have diverse and complex features, and surface defects from different production lines tend to have different characteristics. Therefore, detection algorithms for the surface defects of steel strip should generalize well. To detect surface defects of steel strip, we established a dataset of six types of surface defects on cold-rolled steel strip and augmented it to reduce over-fitting. We improved the You Only Look Once (YOLO) network and made it fully convolutional. Our improved network, which consists of 27 convolution layers, provides an end-to-end solution for surface defect detection of steel strip. We evaluated the six types of defects with our network and reached 97.55% mAP and a 95.86% recall rate. In addition, our network achieves a 99% detection rate at a speed of 83 FPS, which provides methodological support for real-time surface defect detection of steel strip. It can also predict the location and size of defect regions, which is of great significance for evaluating the quality of an entire steel strip production line.
Canny is a classic edge detection algorithm that has been widely applied in various fields of image processing for years. However, the algorithm has some defects. The most serious is that the traditional Canny algorithm cannot set its thresholds adaptively. If the manually set thresholds are not accurate, the quality of edge detection is seriously affected, which limits the adaptability of the algorithm. This paper proposes a method that combines the maximum entropy method with the Otsu method to determine the high and low thresholds of the Canny algorithm. Experiments show that the modified algorithm is more robust than the traditional method. For images with complex grey-level histogram distributions, the modified algorithm performs better.
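The two threshold selectors named above can be sketched on a histogram as follows. The abstract does not say how the two results are assigned to the high and low thresholds, so this example simply assumes that the larger value is used as the high threshold and the smaller as the low one; that pairing, the use of Kapur's formulation of maximum entropy, and the function names are assumptions for illustration.

```python
import math


def otsu_threshold(hist):
    """Otsu's method: pick t maximising the between-class variance of levels <= t versus > t."""
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    count0, sum0 = 0, 0
    for t in range(len(hist) - 1):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue
        mu0 = sum0 / count0
        mu1 = (total_sum - sum0) / count1
        var = (count0 / total) * (count1 / total) * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def max_entropy_threshold(hist):
    """Maximum entropy (Kapur) thresholding: pick t maximising the summed entropies of both classes."""
    total = sum(hist)
    probs = [h / total for h in hist]
    best_t, best_h = 0, -math.inf
    for t in range(len(hist) - 1):
        p0 = sum(probs[: t + 1])
        p1 = sum(probs[t + 1:])
        if p0 <= 0 or p1 <= 0:
            continue
        h0 = -sum((p / p0) * math.log(p / p0) for p in probs[: t + 1] if p > 0)
        h1 = -sum((p / p1) * math.log(p / p1) for p in probs[t + 1:] if p > 0)
        if h0 + h1 > best_h:
            best_t, best_h = t, h0 + h1
    return best_t


def canny_thresholds(hist):
    """Assumed pairing: the smaller selector output is the low threshold, the larger is the high one."""
    low, high = sorted((otsu_threshold(hist), max_entropy_threshold(hist)))
    return low, high


# Hypothetical 8-bin histogram of gradient magnitudes with two well-separated modes.
hist = [10, 10, 0, 0, 0, 0, 10, 10]
low, high = canny_thresholds(hist)
```

In a full pipeline the histogram would come from the gradient magnitudes computed inside Canny, and the resulting pair would replace the manually chosen hysteresis thresholds.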