In Federated Learning (FL), data communication among clients is prohibited. This makes it difficult to learn from decentralized client data, which is often under-sampled, especially for segmentation tasks that must extract rich contextual semantic information. Existing FL studies for segmentation typically average all client models into a single global model, neglecting the diverse knowledge extracted by the individual models. To maintain and exploit this diverse knowledge, we propose a novel training paradigm called Federated Learning with Z-average and Cross-teaching (FedZaCt) for segmentation tasks. From the model-parameter perspective, the Z-average method constructs individual client models that maintain diverse knowledge from multiple clients' data. From the model-distillation perspective, the Cross-teaching method transfers knowledge from the other client models to supervise the local client model. Notably, FedZaCt maintains no global model during training; after training, all client models are aggregated into a global model by averaging their parameters. The proposed methods are applied to two medical image segmentation datasets: our private aortic dataset and the public HAM10000 dataset. Experimental results demonstrate that our methods achieve higher Intersection over Union values and Dice scores.
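The final aggregation step described above (averaging all client model parameters into one global model) can be sketched as follows; the function name and the dictionary-of-arrays representation of a model state are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def average_client_models(client_states):
    """Average the parameter dictionaries of several client models
    into a single global state (post-training aggregation step)."""
    keys = client_states[0].keys()
    return {k: np.mean([s[k] for s in client_states], axis=0) for k in keys}

# toy example: three clients, each holding one 2x2 weight tensor
clients = [{"w": np.full((2, 2), v)} for v in (1.0, 2.0, 3.0)]
global_state = average_client_models(clients)
# global_state["w"] is a 2x2 array filled with 2.0
```

In a real FL system the same element-wise mean would be taken over every tensor in each client's state dict.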
Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, but they rely on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust, highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage over their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
Electric pump feed systems use high-speed, high-performance motors to replace the traditional turbine as the driving machine of the pumps. A new feed system scheme is proposed in which a small-flow electric pump pressurizes hydrogen peroxide and feeds it to the gas generator. The system keeps engine performance at a high level while enabling continuous and deep regulation. The balance calculation method and the reliability allocation method are used. The results show that the powers of the electric pumps for the fuel path and the oxidizer path are 600.0 W and 1732.6 W, respectively. The total mass of the engine is 69.76 kg, and the electric pump system accounts for less than 3.2% of the overall system mass. The electric pump system therefore has little effect on engine mass while improving the engine's regulation capability. According to the reliability assigned to each combined part, the reliability of the engine system is calculated to be 0.9802, which meets the target value of 0.98 with a certain margin.
As loop-closure detection plays a fundamental role in any simultaneous localization and mapping (SLAM) system through its ability to recognize previously visited locations, one of its main objectives is to permit consistent map generation over an extended period. For large-scale SLAM autonomy, scalability in terms of database search time and storage requirements has to be addressed. In this paper, a low-storage visual loop-closure detection technique is proposed. Our system is still based on the incremental bag-of-tracked-words scheme for trajectory mapping; however, the generated visual representations are reduced to lower dimensions through a resampling process. In this way, we shorten the overall database size and search time while preserving high performance. The evaluation, which took place on different well-known datasets, exhibits the system's low storage requirements and high recall scores compared to the baseline version and other state-of-the-art approaches.
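The dimensionality-reduction-by-resampling idea can be illustrated with a generic sketch; the use of linear interpolation here is an assumption for illustration only, not the paper's exact reduction scheme:

```python
import numpy as np

def resample_descriptor(desc, target_dim):
    """Reduce a visual-word descriptor to a lower dimension by
    resampling it on a coarser uniform grid (linear interpolation),
    shrinking the database footprint of each stored word."""
    src = np.linspace(0.0, 1.0, num=len(desc))
    dst = np.linspace(0.0, 1.0, num=target_dim)
    return np.interp(dst, src, desc)

d = np.arange(8, dtype=float)      # an 8-D descriptor
small = resample_descriptor(d, 4)  # reduced to 4-D
```

Storing the 4-D version of every tracked word roughly halves the database size at the cost of some descriptor fidelity.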
In recent years, the robotics community has extensively examined methods concerning the place recognition task within the scope of simultaneous localization and mapping applications. This article proposes an appearance-based loop closure detection pipeline named "Fast and Incremental Loop closure Detection" (FILD++). First, the system is fed consecutive images and, by passing them twice through a single convolutional neural network, extracts global and local deep features. Subsequently, a hierarchical navigable small-world graph incrementally constructs a visual database representing the robot's traversed path based on the computed global features. Finally, a query image, grabbed at each time step, is used to retrieve similar locations along the traversed route. An image-to-image pairing follows, which exploits local features to evaluate the spatial information. Thus, in contrast to our previous work (FILD), we employ a single network for global and local feature extraction, and an exhaustive search over the generated deep local features is adopted for the verification process, avoiding the use of hash codes. Extensive experiments on eleven publicly available datasets exhibit the system's high performance (achieving the highest recall score on eight of them) and low execution times (22.05 ms on average on New College, the largest one, containing 52,480 images) compared to other state-of-the-art approaches.
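The retrieval stage (matching a query's global descriptor against the database of traversed locations) can be sketched as follows. For simplicity this stand-in uses exhaustive cosine similarity over a matrix of descriptors rather than the hierarchical navigable small-world graph the paper uses; the function name and top-k interface are illustrative assumptions:

```python
import numpy as np

def query_database(db, q, top_k=3):
    """Return indices of the top_k database descriptors most similar
    to query descriptor q, ranked by cosine similarity.
    (Brute-force stand-in for an HNSW graph search.)"""
    db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
    q_n = q / np.linalg.norm(q)
    sims = db_n @ q_n
    return np.argsort(-sims)[:top_k]

db = np.eye(4)                           # four orthogonal "locations"
q = np.array([0.9, 0.1, 0.0, 0.0])       # query closest to location 0
candidates = query_database(db, q, top_k=2)
# candidates[0] == 0 (best match)
```

In practice an approximate index (e.g. an HNSW graph) replaces the brute-force scan so that query time stays low as the database grows.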
During simultaneous localization and mapping, a robot must build a map of its surroundings while simultaneously estimating its pose within the generated map. A fundamental task is to detect loops, i.e., previously visited areas, which allows consistent map generation. Moreover, for long-term mapping, every autonomous system needs to address scalability in terms of storage requirements and database search. In this paper, we present a low-complexity, sequence-based visual loop-closure detection pipeline. Our system dynamically segments the traversed route through a feature-matching technique in order to define sub-maps, and visual words are generated incrementally to represent the corresponding sub-maps. Comparisons among these sequences of images are performed via probabilistic scores originating from a voting scheme. When a candidate sub-map is indicated, global descriptors are used for image-to-image pairing. Our evaluation on several publicly available datasets exhibits the system's low complexity and high recall compared to other state-of-the-art approaches.
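The voting scheme over sub-maps can be illustrated with a minimal sketch: each matched visual word casts a vote for the sub-map it belongs to, and vote counts are normalized into scores. The exact probabilistic model the paper uses is not specified here, so this normalization is an illustrative assumption:

```python
from collections import Counter

def vote_submaps(matched_word_submaps):
    """Each matched visual word votes for its owning sub-map;
    return the normalized vote share per sub-map."""
    votes = Counter(matched_word_submaps)
    total = sum(votes.values())
    return {submap: count / total for submap, count in votes.items()}

# five query words matched words owned by sub-maps 0, 0, 1, 2, 0
scores = vote_submaps([0, 0, 1, 2, 0])
# scores == {0: 0.6, 1: 0.2, 2: 0.2} -> sub-map 0 is the candidate
```

The highest-scoring sub-map would then proceed to the image-to-image pairing stage.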