Texturing 3D shapes is of great importance in computer graphics, with applications ranging from game design to augmented reality. However, the process of texture generation is usually tedious, time-consuming and labor-intensive. In this paper, we propose an automatic texture generation algorithm for 3D shapes based on a conditional Generative Adversarial Network (cGAN). The core of our algorithm consists of sampling the model outline and building a cGAN to generate model textures automatically. In particular, we propose a novel edge detection method that uses 3D model information to accurately find the model outline and thereby improve the quality of the generated texture. Owing to its adaptability, our approach is suitable for texture generation for most 3D models. Experimental results demonstrate the efficiency of our algorithm, which readily generates high-quality model textures.
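For intuition, the following is a minimal pix2pix-style sketch of how a cGAN conditioned on a rasterized model outline could generate a texture image; the module names, network sizes, and the use of an edge map as the conditioning signal are illustrative assumptions, not the paper's exact architecture.

# Minimal pix2pix-style cGAN sketch: the generator maps a rasterized model
# outline (edge map) to a texture image; the discriminator judges
# (outline, texture) pairs. Names and sizes are illustrative only.
import torch
import torch.nn as nn

class EdgeToTextureGenerator(nn.Module):
    def __init__(self, in_ch=1, out_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),              # 128 -> 64
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),           # 64 -> 32
            nn.BatchNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),  # 32 -> 64
            nn.BatchNorm2d(base),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, out_ch, 4, stride=2, padding=1),    # 64 -> 128
            nn.Tanh(),
        )

    def forward(self, edge_map):
        return self.net(edge_map)

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=4, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.BatchNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, padding=1),   # per-patch real/fake logits
        )

    def forward(self, edge_map, texture):
        return self.net(torch.cat([edge_map, texture], dim=1))

# Usage: condition on a 1-channel outline, generate a 3-channel texture.
edges = torch.randn(2, 1, 128, 128)
fake_texture = EdgeToTextureGenerator()(edges)
logits = PatchDiscriminator()(edges, fake_texture)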
The recently developed pure Transformer architectures have attained promising accuracy on point cloud learning benchmarks compared to convolutional neural networks. However, existing point cloud Transformers are computationally expensive because they waste a significant amount of time structuring the irregular data. To address this shortcoming, we present a Sparse Window Attention (SWA) module that gathers coarse-grained local features from non-empty voxels, which not only bypasses expensive irregular data structuring and computation on invalid empty voxels, but also achieves linear computational complexity with respect to voxel resolution. Meanwhile, to gather fine-grained features about the global shape, we introduce a relative attention (RA) module, a self-attention variant that is more robust to rigid transformations of objects. Equipped with SWA and RA, we construct our neural architecture, PVT, which integrates both modules into a joint framework for point cloud learning. Compared with previous Transformer-based and attention-based models, our method attains a top accuracy of 94.0% on the classification benchmark and a 10x inference speedup on average. Extensive experiments also validate the effectiveness of PVT on part and semantic segmentation benchmarks (86.6% and 69.2% mIoU, respectively).
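As a rough illustration of window attention restricted to non-empty voxels, the sketch below voxelizes a point cloud, groups the occupied voxels into fixed-size windows, and runs standard multi-head attention within each window. The naive per-window loop, voxel/window sizes, and helper names are assumptions for clarity; the actual SWA module avoids such unoptimized gathering.

# Simplified sketch of window attention over non-empty voxels only: points are
# voxelized, occupied voxels are grouped into fixed-size windows, and standard
# multi-head attention runs within each window. Illustrative only.
import torch
import torch.nn as nn

def voxelize(points, voxel_size):
    """Average point features per non-empty voxel.
    points: (N, 3) coordinates; returns voxel coords (M, 3) and features (M, 3)."""
    coords = torch.floor(points / voxel_size).long()
    uniq, inverse = torch.unique(coords, dim=0, return_inverse=True)
    feats = torch.zeros(uniq.size(0), points.size(1))
    feats.index_add_(0, inverse, points)
    counts = torch.bincount(inverse, minlength=uniq.size(0)).clamp(min=1)
    return uniq, feats / counts.unsqueeze(1)

class SparseWindowAttention(nn.Module):
    def __init__(self, dim, heads=4, window=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, voxel_coords, voxel_feats):
        # Assign each non-empty voxel to a window; empty voxels never appear.
        win_id = torch.div(voxel_coords, self.window, rounding_mode='floor')
        uniq_win, win_inverse = torch.unique(win_id, dim=0, return_inverse=True)
        out = torch.empty_like(voxel_feats)
        for w in range(uniq_win.size(0)):            # naive per-window loop
            idx = (win_inverse == w).nonzero(as_tuple=True)[0]
            x = voxel_feats[idx].unsqueeze(0)        # (1, n_w, C)
            y, _ = self.attn(x, x, x)
            out[idx] = y.squeeze(0)
        return out

points = torch.rand(1024, 3)
coords, feats = voxelize(points, voxel_size=0.1)     # feats: (M, 3)
proj = nn.Linear(3, 64)
swa = SparseWindowAttention(64, heads=4, window=4)
local_feats = swa(coords, proj(feats))               # coarse-grained local features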
Point cloud upsampling has been extensively studied; however, existing approaches suffer from a loss of structural information because they neglect the spatial dependencies between points. In this work, we propose PU-GAT, a novel 3D point cloud upsampling method that leverages graph attention networks to enhance the learning of structural information over baseline methods. Specifically, we first design a local-global feature extraction unit to learn multi-scale features of the points, while reconstructing the point cloud based on the learned multi-scale features. Then, we construct an up-down-up feature expansion unit, which effectively improves the feature expansion capability. Furthermore, we incorporate a self-attention mechanism in both units to capture global semantic information. Extensive experiments on synthetic and real data show that our method outperforms previous methods both quantitatively and qualitatively.
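A graph attention layer over a k-NN point graph, such as the one sketched below, is the kind of building block a GAT-based upsampler can use to capture spatial dependencies between points; the k value, feature dimensions, and helper names are illustrative assumptions rather than PU-GAT's exact design.

# Minimal sketch of graph attention over a k-NN point graph, a building block
# for learning per-point structural features before upsampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

def knn(points, k):
    """Indices of the k nearest neighbors for each point. points: (N, 3)."""
    dist = torch.cdist(points, points)                       # (N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]    # drop the point itself

class PointGraphAttention(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)
        self.score = nn.Linear(2 * out_dim, 1)                # attention logit per edge

    def forward(self, feats, neighbor_idx):
        h = self.proj(feats)                                  # (N, D)
        nbrs = h[neighbor_idx]                                # (N, k, D)
        center = h.unsqueeze(1).expand_as(nbrs)               # (N, k, D)
        logits = self.score(torch.cat([center, nbrs], dim=-1)).squeeze(-1)
        alpha = F.softmax(F.leaky_relu(logits, 0.2), dim=-1)  # (N, k) edge weights
        return F.elu((alpha.unsqueeze(-1) * nbrs).sum(dim=1)) # weighted neighbor aggregation

points = torch.rand(256, 3)
idx = knn(points, k=16)
gat = PointGraphAttention(3, 64)
point_feats = gat(points, idx)                                # (256, 64) structural features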
Multi-person motion prediction remains a challenging problem, especially in the joint representation learning of individual motion and social interactions. Most prior methods learn only local pose dynamics for individual motion (without global body trajectories) and struggle to capture the complex interaction dependencies of social interactions. In this paper, we propose a novel Social-Aware Motion Transformer (SoMoFormer) to model individual motion and social interactions jointly and effectively. Specifically, SoMoFormer extracts motion features from sub-sequences in displacement trajectory space to learn both local and global pose dynamics for each individual. In addition, we devise a novel social-aware motion attention mechanism in SoMoFormer that further optimizes dynamics representations and captures interaction dependencies simultaneously by calculating motion similarity across the time and social dimensions. We empirically evaluate our framework on multi-person motion datasets over both short- and long-term horizons and demonstrate that our method greatly outperforms state-of-the-art methods for single- and multi-person motion prediction. Code will be made publicly available upon acceptance.
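To make the idea of attention across both time and social dimensions concrete, the sketch below turns per-person displacement sub-sequences into tokens and applies one self-attention over all (person, window) tokens; the windowing scheme, token sizes, and module names are assumptions, not SoMoFormer's exact formulation.

# Illustrative sketch: per-person displacement sub-sequences become tokens, and
# a single self-attention over all (person, window) tokens lets attention
# weights reflect motion similarity across both time and the social dimension.
import torch
import torch.nn as nn

class SocialMotionAttention(nn.Module):
    def __init__(self, joints=15, window=5, dim=128, heads=8):
        super().__init__()
        self.window = window
        self.embed = nn.Linear(joints * 3 * window, dim)      # one token per sub-sequence
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, poses):
        """poses: (P, T, J, 3) global joint positions for P people over T frames."""
        P, T, J, _ = poses.shape
        disp = poses[:, 1:] - poses[:, :-1]                   # displacement trajectory space
        W = disp.size(1) // self.window
        disp = disp[:, :W * self.window]                      # drop remainder frames
        tokens = disp.reshape(P, W, self.window * J * 3)      # (P, W, window*J*3)
        tokens = self.embed(tokens).reshape(1, P * W, -1)     # flatten people and windows
        out, weights = self.attn(tokens, tokens, tokens)      # joint temporal/social attention
        return out.reshape(P, W, -1), weights

poses = torch.rand(3, 51, 15, 3)                              # 3 people, 51 frames, 15 joints
feats, attn_weights = SocialMotionAttention()(poses)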
We present a novel active learning approach for shape cosegmentation based on graph convolutional networks (GCNs). The premise of our approach is to represent a collection of three-dimensional shapes as graph-structured data, where each node in the graph corresponds to a primitive patch of an oversegmented shape and is associated with a representation initialized from extracted features. The GCN then operates directly on the graph to update the representation of each node using a layer-wise propagation rule that aggregates information from its neighbors, and predicts labels for the unlabeled nodes. In addition, we propose an active learning strategy that queries the most informative samples to extend the initial training set of the GCN, yielding more accurate predictions. Our experimental results on the Shape COSEG dataset demonstrate the effectiveness of our approach.
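The sketch below illustrates the two ingredients in simplified form: a layer-wise GCN propagation over the patch graph and an uncertainty-driven query step that selects unlabeled nodes with the highest prediction entropy. Entropy-based selection is used here only as a generic stand-in for the paper's informativeness criterion, and all names and sizes are illustrative.

# Sketch of a GCN propagation step over the patch graph plus an
# entropy-based active learning query over unlabeled nodes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).pow(-0.5)
        a_norm = d.unsqueeze(1) * a * d.unsqueeze(0)
        return F.relu(self.lin(a_norm @ x))                   # aggregate neighbors, then transform

def query_most_uncertain(logits, unlabeled_idx, budget):
    """Return the `budget` unlabeled nodes whose predictions have the highest entropy."""
    probs = F.softmax(logits[unlabeled_idx], dim=-1)
    entropy = -(probs * probs.clamp(min=1e-9).log()).sum(dim=-1)
    return unlabeled_idx[entropy.topk(budget).indices]

# Toy patch graph: 100 oversegmented patches, 32-d features, 4 part labels.
x = torch.rand(100, 32)
adj = (torch.rand(100, 100) > 0.9).float()
adj = ((adj + adj.t()) > 0).float()                           # symmetrize the adjacency
h = GCNLayer(32, 64)(x, adj)
logits = nn.Linear(64, 4)(h)
unlabeled = torch.arange(100)
to_label = query_most_uncertain(logits, unlabeled, budget=5)  # nodes to send for annotation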