Many GAN inversion methods have emerged to embed a given real image into the latent space of a GAN for real-image editing. These methods usually adopt a latent space composed of a series of one-dimensional vectors, such as the W+ latent space, as the optimization space for reconstructing real images. However, the images reconstructed by these methods often fail to preserve the rich details of the real image, and how to better preserve those details remains a challenge. To address this problem, we propose a spatially-adaptive latent space, called the SA latent space, and adopt it as the optimization space in the GAN inversion task. In particular, we form the SA latent space from the affine transformation parameters of each convolutional layer in the generator, changing each set of affine transformation parameters from a one-dimensional vector into a spatially-adaptive three-dimensional tensor. With this more expressive latent space, we can better reconstruct the details of the real image. Extensive experiments suggest that reconstruction quality improves significantly while the semantic disentanglement of the latent code is maintained. The code is available at https://github.com/zhang-lingyun/SalS-GAN.
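The core change described above, replacing a per-channel one-dimensional affine parameter with a spatially-adaptive three-dimensional tensor, can be illustrated with a toy sketch. This is not the paper's implementation; the function names and the tiny nested-list "tensors" are illustrative only.

```python
# Toy contrast between a 1-D style vector and a spatially-adaptive 3-D tensor.
# Feature maps are represented as C x H x W nested lists for simplicity.

def modulate_1d(feature, scale):
    """Conventional affine modulation: one scalar per channel,
    shared across all spatial locations of that channel."""
    return [[[v * scale[c] for v in row] for row in feature[c]]
            for c in range(len(feature))]

def modulate_sa(feature, scale):
    """Spatially-adaptive modulation: `scale` has the same C x H x W
    shape as `feature`, so each spatial location gets its own
    affine parameter, allowing finer detail reconstruction."""
    return [[[feature[c][h][w] * scale[c][h][w]
              for w in range(len(feature[c][h]))]
             for h in range(len(feature[c]))]
            for c in range(len(feature))]
```

In the 1-D case every location in a channel is scaled identically; in the spatially-adaptive case the optimizer has an independent degree of freedom per location, which is what gives the SA latent space its extra expressiveness.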
Query-based models are extensively used in 3D object detection tasks, with a wide range of pre-trained checkpoints readily available online. Despite their popularity, however, these models often require an excessive number of object queries, far surpassing the actual number of objects to detect. These redundant queries incur unnecessary computational and memory costs. In this paper, we find that not all queries contribute equally: a significant portion of queries have a much smaller impact than others. Based on this observation, we propose an embarrassingly simple approach called \bd{G}radually \bd{P}runing \bd{Q}ueries (GPQ), which prunes queries incrementally based on their classification scores. GPQ is straightforward to implement in any query-based method, as it can be seamlessly integrated as a fine-tuning step on an existing checkpoint after training. With GPQ, users can easily generate multiple models with fewer queries from a single checkpoint trained with an excessive number of queries. Experiments on various advanced 3D detectors show that GPQ effectively removes redundant queries while maintaining performance. With our method, model inference on desktop GPUs can be accelerated by up to 1.31x; after deployment on edge devices, it achieves up to a 67.86\% reduction in FLOPs and a 76.38\% decrease in inference time. The code will be available at \url{https://github.com/iseri27/Gpq}.
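The incremental pruning idea above can be sketched in a few lines. This is a simplified illustration under assumed names (`gradually_prune_queries`, `prune_per_step`), not the GPQ code itself; in the actual fine-tuning loop, classification scores would be re-estimated as training proceeds rather than fixed up front.

```python
# Hedged sketch of score-based gradual query pruning: at each step,
# drop a few of the lowest-scoring queries until a target count remains.

def gradually_prune_queries(query_scores, keep_target, prune_per_step):
    """query_scores: per-query classification scores (higher = more useful).
    keep_target: final number of queries to keep.
    prune_per_step: queries removed per fine-tuning step.
    Returns the sorted indices of the surviving queries."""
    # Rank query indices from highest to lowest score.
    kept = sorted(range(len(query_scores)),
                  key=lambda i: query_scores[i], reverse=True)
    while len(kept) > keep_target:
        n = min(prune_per_step, len(kept) - keep_target)
        kept = kept[:len(kept) - n]  # prune the n lowest-scoring survivors
    return sorted(kept)
```

Because pruning is gradual rather than one-shot, the remaining queries get fine-tuning steps to compensate for the removed ones, which is what lets the pruned model retain accuracy.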
Point cloud analysis has developed significantly and performs well on multiple downstream tasks such as point cloud classification and segmentation. Noting the simplicity of the position encoding structure in Transformer-based architectures, we treat the position encoding as a high-dimensional component and pair it with the patch encoder to provide multi-scale information. Together with the sequential Transformer, the module with position encoding constitutes a comprehensive multi-scale feature abstraction module that considers both the local parts from the patches and the global parts from the center points serving as position encoding. With only a few parameters, the position embedding module fits the setting of PEFT (Parameter-Efficient Fine-Tuning) tasks very well, so we unfreeze these parameters for fine-tuning. At the same time, we review existing prompt and adapter tuning methods, proposing a fresh prompt design and combining it with adapters for dynamic adjustment. Our proposed PEFT method, named PPT, trains only 1.05% of the parameters and achieves state-of-the-art results on several mainstream datasets, such as 95.01% accuracy on the ScanObjectNN OBJ_BG dataset. Code will be released at https://github.com/zsc000722/PPT.
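The PEFT setup described above, freezing the backbone and training only a small module such as the position embedding, can be sketched as a simple bookkeeping function. The module names and parameter counts below are made up for illustration; they are not the paper's numbers.

```python
# Minimal sketch of parameter-efficient fine-tuning bookkeeping:
# only the modules listed in `unfrozen` are trained.

def trainable_fraction(param_counts, unfrozen):
    """param_counts: dict mapping module name -> number of parameters.
    unfrozen: set of module names left trainable.
    Returns the fraction of all parameters that will be updated."""
    total = sum(param_counts.values())
    trained = sum(n for name, n in param_counts.items() if name in unfrozen)
    return trained / total
```

For example, with a hypothetical 990-parameter frozen backbone and a 10-parameter position-embedding module left trainable, only 1% of parameters are updated, which mirrors the small trainable budget (1.05%) that PPT reports.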
Journal Article: "Improving the Reliability of the Operating System Inside a VM." Zheng Hao, Dong Xiaoshe, Zhu Zhengdong (corresponding author: zdzhu@mail.xjtu.edu.cn), Chen Baoke, Bai Xiuxiu, and Zhang Xingjun (Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China); Wang Endong (State Key Laboratory of High-End Server & Storage Technology, Ji'nan, China). The Computer Journal, Volume 59, Issue 5, May 2016, Pages 715–740, https://doi.org/10.1093/comjnl/bxv111. Received: 02 June 2015; revision received: 29 August 2015; published: 09 May 2016.