Existing multi-view image generation methods often make invasive modifications to pre-trained text-to-image (T2I) models and require full fine-tuning, leading to (1) high computational costs, especially with large base models and high-resolution images, and (2) degradation in image quality due to optimization difficulties and scarce high-quality 3D data. In this paper, we propose the first adapter-based solution for multi-view image generation: MV-Adapter, a versatile plug-and-play adapter that enhances T2I models and their derivatives without altering the original network structure or feature space. By updating only a small number of parameters, MV-Adapter enables efficient training and preserves the prior knowledge embedded in pre-trained models, mitigating the risk of overfitting. To model 3D geometric knowledge efficiently within the adapter, we introduce innovative designs, including duplicated self-attention layers and a parallel attention architecture, which allow the adapter to inherit the powerful priors of the pre-trained model while learning the new 3D knowledge. Moreover, we present a unified condition encoder that seamlessly integrates camera parameters and geometric information, facilitating applications such as text- and image-based 3D generation and texturing. MV-Adapter achieves multi-view generation at 768 resolution on Stable Diffusion XL (SDXL) and demonstrates strong adaptability and versatility; it can also be extended to arbitrary view generation, enabling broader applications. We demonstrate that MV-Adapter sets a new quality standard for multi-view image generation and opens up new possibilities through its efficiency, adaptability, and versatility.
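The duplicated self-attention and parallel attention design described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation; the class name `ParallelMVAdapterBlock`, the use of `nn.MultiheadAttention`, and the simple residual merge are assumptions made for illustration: a frozen copy of the pre-trained self-attention processes each view as before, a trainable duplicate attends across views, and the two branches run in parallel so the original feature space is left untouched.

```python
# Minimal sketch (not the authors' code) of a parallel-attention adapter block.
import copy
import torch
import torch.nn as nn

class ParallelMVAdapterBlock(nn.Module):
    """Hypothetical parallel-attention adapter block (names are illustrative)."""

    def __init__(self, base_attn: nn.MultiheadAttention, num_views: int):
        super().__init__()
        self.base_attn = base_attn                  # pre-trained spatial self-attention, frozen
        for p in self.base_attn.parameters():
            p.requires_grad_(False)
        self.mv_attn = copy.deepcopy(base_attn)     # duplicated layer, trainable, initialized from base weights
        for p in self.mv_attn.parameters():
            p.requires_grad_(True)
        self.num_views = num_views

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * num_views, tokens, dim); base_attn is assumed to use batch_first=True
        base_out, _ = self.base_attn(x, x, x)       # per-view spatial attention (frozen branch)
        bv, t, d = x.shape
        mv = x.reshape(bv // self.num_views, self.num_views * t, d)
        mv_out, _ = self.mv_attn(mv, mv, mv)        # cross-view attention (trainable branch)
        mv_out = mv_out.reshape(bv, t, d)
        return x + base_out + mv_out                # parallel branches merged residually

# Example: 2 objects x 4 views, 64 tokens per view, 320-dim features
attn = nn.MultiheadAttention(embed_dim=320, num_heads=8, batch_first=True)
block = ParallelMVAdapterBlock(attn, num_views=4)
out = block(torch.randn(8, 64, 320))
```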
Wetland ecosystems are complex: wetlands span wide areas of alternating land and water zones and have complex vegetation composition, making it challenging to achieve dynamic displays of virtual wetland scenes using three-dimensional modeling. This study proposes a game engine-based workflow for the rapid construction of virtual wetland scenes. Using Poyang Lake as the primary research area, the workflow integrates unmanned aerial vehicle data collection and geographic information technology with 3D (three-dimensional) modeling of wetland elements and procedural scene modeling in the game engine to complete the construction and dynamic development of virtual wetland scenes. In addition, various virtual reality technologies were used to display the virtual wetland scene. The virtual scene of Poyang Lake, built from real-world data, is more realistic and achieves higher simulation fidelity. The resulting digital wetland scene of Poyang Lake supports multiple forms of virtual experience and provides users with a deeply immersive virtual experience. The comprehensive virtual scene workflow presented in this study can serve as a technical resource for building 3D scenes and as a technical reference for the digital twin watershed project of Poyang Lake, giving it practical application value.
Large-scale datasets play a vital role in computer vision, but current datasets are annotated blindly, without differentiating among samples, making data collection inefficient and unscalable. The open question is how to build a mega-scale dataset actively. Although advanced active learning algorithms might be the answer, we experimentally found that they perform poorly in the realistic annotation scenario where out-of-distribution data is extensive. This work thus proposes a novel active learning framework for realistic dataset annotation. Equipped with this framework, we build a high-quality vision dataset, Bamboo, which consists of 69M image classification annotations with 119K categories and 28M object bounding box annotations with 809 categories. We organize these categories with a hierarchical taxonomy integrated from several knowledge bases. The classification annotations are four times larger than those of ImageNet22K, and the detection annotations are three times larger than those of Objects365. Compared to ImageNet22K and Objects365, models pre-trained on Bamboo achieve superior performance across various downstream tasks (6.2% gains on classification and 2.1% gains on detection). We believe our active learning framework and Bamboo will be essential for future work.
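As a rough illustration of active sample selection under heavy out-of-distribution (OOD) contamination, the sketch below shows one generic uncertainty-based selection round with a confidence-based OOD filter. It is not the Bamboo pipeline; the function `select_for_annotation`, the confidence threshold, and the entropy criterion are illustrative assumptions.

```python
# Generic active-learning selection round (illustrative only; not the Bamboo framework).
# Candidates are first screened by predictive confidence to drop likely OOD images,
# then the most uncertain in-distribution samples are sent for human labeling.
import numpy as np

def select_for_annotation(probs: np.ndarray, budget: int, ood_threshold: float = 0.2):
    """probs: (N, C) softmax outputs of the current model on the unlabeled pool.
    Returns indices of `budget` samples to annotate next."""
    max_conf = probs.max(axis=1)
    in_dist = np.where(max_conf >= ood_threshold)[0]          # drop likely OOD samples
    eps = 1e-12
    entropy = -(probs[in_dist] * np.log(probs[in_dist] + eps)).sum(axis=1)
    ranked = in_dist[np.argsort(-entropy)]                    # most uncertain first
    return ranked[:budget]

# Example: 10k-image pool, 100 classes, annotate 256 images this round
pool_probs = np.random.dirichlet(np.ones(100), size=10_000)
to_label = select_for_annotation(pool_probs, budget=256)
```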
Point cloud registration aims to estimate the geometric transformation between two point cloud scans, in which point-wise correspondence estimation is the key to success. In addition to previous methods that seek correspondences using hand-crafted or learnt geometric features, recent point cloud registration methods have tried to exploit RGB-D data to achieve more accurate correspondences. However, it is not trivial to effectively fuse the geometric and visual information from these two distinct modalities, especially for the registration problem. In this work, we propose a new Geometry-Aware Visual Feature Extractor (GAVE) that employs multi-scale local linear transformation to progressively fuse the two modalities, where geometric features from the depth data act as geometry-dependent convolution kernels that transform visual features from the RGB data. The resulting visual-geometric features lie in canonical feature spaces with alleviated visual dissimilarity caused by geometric changes, from which more reliable correspondences can be obtained. The proposed GAVE module can be readily plugged into recent RGB-D point cloud registration frameworks. Extensive experiments on 3DMatch and ScanNet demonstrate that our method outperforms state-of-the-art point cloud registration methods even without correspondence or pose supervision. The code is available at: https://github.com/514DNA/LLT.
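The core idea, geometric features acting as geometry-dependent kernels that transform co-located visual features, can be sketched as follows. This is only a conceptual sketch under simplifying assumptions, not the released LLT implementation; the class `GeometryConditionedLinear` and its MLP sizes are hypothetical, and the actual method applies multi-scale local linear transformations rather than a single per-point transform.

```python
# Conceptual sketch: a small MLP maps per-point geometric features to a linear
# transform that is applied to the co-located visual (RGB) features, making the
# visual features less sensitive to geometric changes.
import torch
import torch.nn as nn

class GeometryConditionedLinear(nn.Module):
    def __init__(self, geo_dim: int, vis_dim: int):
        super().__init__()
        # predicts a (vis_dim x vis_dim) matrix plus a bias for every point
        self.kernel_net = nn.Sequential(
            nn.Linear(geo_dim, 128), nn.ReLU(),
            nn.Linear(128, vis_dim * vis_dim + vis_dim),
        )
        self.vis_dim = vis_dim

    def forward(self, geo_feat: torch.Tensor, vis_feat: torch.Tensor) -> torch.Tensor:
        # geo_feat: (B, N, geo_dim), vis_feat: (B, N, vis_dim)
        B, N, _ = vis_feat.shape
        params = self.kernel_net(geo_feat)
        W = params[..., : self.vis_dim ** 2].reshape(B, N, self.vis_dim, self.vis_dim)
        b = params[..., self.vis_dim ** 2 :]
        # per-point linear transform of the visual features
        return torch.einsum("bnij,bnj->bni", W, vis_feat) + b

# Example: 1024 points with 32-d geometric and 64-d visual features
mod = GeometryConditionedLinear(geo_dim=32, vis_dim=64)
fused = mod(torch.randn(2, 1024, 32), torch.randn(2, 1024, 64))
```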
Corneal transplantation constitutes one of the major treatments for severe corneal diseases. The shortage of cornea donors, together with other limitations of corneal transplantation, necessitates the development of artificial corneal substitutes. Biosynthetic cornea models produced by 3D printing are a promising way to generate artificial corneal structures that resemble the native human cornea and are applicable to regenerative medicine. Research on bioprinting artificial corneas has raised interest in the wide range of materials and cells that can be utilized as bioinks for optimal clarity, biocompatibility, and tectonic strength. With continued advances in biomaterials science and printing technology, it is believed that bioprinted corneas will eventually achieve a level of clinical functionality and practicality sufficient to replace donated corneal tissues, which suffer from limitations such as a limited or unsteady supply and possible infectious disease transmission. Here, we review the literature on bioprinting strategies, 3D corneal modelling, material options, and cellularization strategies in relation to keratoprosthesis design. The progress, limitations, and expectations of recent cases of 3D bioprinting of artificial corneas are discussed. An outlook on the rise of 3D bioprinting in corneal reconstruction and regeneration is provided.