Simiao Zuo, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang, Tuo Zhao. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track). 2023.
Traditional text-to-image generation methods focus on extracting the information passed down from the preceding layer while ignoring the feature information of the current layer that is lost during subsequent propagation, which causes problems such as poor semantic consistency and loss of detail features. To address these issues, we propose CAGAN, a text-to-image model that combines an information-blending structure with attention and information-fusion capabilities. Word-level and sentence-level features of the text are encoded with a pre-trained BiLSTM model. The network structure includes a feature-fusion module that receives information transmitted from the upper layer while preserving feature information at the same layer. An affine transformation maps the visual features according to the natural-language description, which improves the semantic coherence of the generated image and refines its features. Experimental results show that CAGAN achieves an FID score of 15.93 and an IS of 4.84±0.04 on the CUB dataset, a substantial improvement over the standard AttnGAN and DM-GAN models, demonstrating the effectiveness of the proposed network.
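As a rough illustration of the text-conditioned affine transformation described above, the sketch below modulates visual feature maps with per-channel scale and shift parameters predicted from a sentence embedding. The class name and dimensions are hypothetical; this is a generic conditioning pattern, not the authors' implementation of CAGAN's fusion module.

```python
import torch
import torch.nn as nn

class ConditionalAffine(nn.Module):
    """Modulates visual feature maps with scale/shift predicted from a text embedding.

    Illustrative sketch of text-conditioned affine modulation; not CAGAN's exact module.
    """
    def __init__(self, text_dim: int, num_channels: int):
        super().__init__()
        self.gamma = nn.Linear(text_dim, num_channels)  # per-channel scale from text
        self.beta = nn.Linear(text_dim, num_channels)   # per-channel shift from text

    def forward(self, feat: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); text: (B, text_dim)
        gamma = self.gamma(text).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.beta(text).unsqueeze(-1).unsqueeze(-1)
        return gamma * feat + beta

# Usage with made-up sizes: a 256-d sentence embedding modulating 64-channel features.
affine = ConditionalAffine(text_dim=256, num_channels=64)
feat = torch.randn(2, 64, 16, 16)
sent = torch.randn(2, 256)
out = affine(feat, sent)  # (2, 64, 16, 16)
```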
Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding. To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with a convergence guarantee based on Riemannian optimization. Our model enjoys high efficiency and achieves state-of-the-art performance on various text embedding tasks including word similarity and document clustering.
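The Riemannian optimization the abstract mentions can be illustrated with the standard projection-and-retraction update on the unit sphere: project the Euclidean gradient onto the tangent space at the current point, step against it, and renormalize. This is a generic sketch of that update, not the paper's exact optimizer.

```python
import numpy as np

def riemannian_sgd_step(x: np.ndarray, euclidean_grad: np.ndarray, lr: float) -> np.ndarray:
    """One Riemannian SGD step on the unit sphere (generic sketch).

    Projects the Euclidean gradient onto the tangent space at x, moves
    against it, then retracts back onto the sphere by normalization.
    """
    # Tangent-space projection: remove the gradient component along x.
    riem_grad = euclidean_grad - np.dot(euclidean_grad, x) * x
    x_new = x - lr * riem_grad
    return x_new / np.linalg.norm(x_new)  # retraction onto the sphere

# Usage: keep a word vector on the unit sphere while optimizing some loss.
x = np.random.randn(100)
x /= np.linalg.norm(x)
g = np.random.randn(100)  # stand-in for the gradient of a loss w.r.t. x
x = riemannian_sgd_step(x, g, lr=0.05)
assert abs(np.linalg.norm(x) - 1.0) < 1e-9  # the iterate stays on the sphere
```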
Probabilistic time-series forecasting enables reliable decision making across many domains. Most forecasting problems have diverse sources of data containing multiple modalities and structures. Leveraging information from these data sources for accurate and well-calibrated forecasts is an important but challenging problem. Most previous works on multi-view time-series forecasting aggregate features from each data view by simple summation or concatenation and do not explicitly model uncertainty for each data view. We propose CAMul, a general probabilistic multi-view forecasting framework that can learn representations and uncertainty from diverse data sources. It integrates the information and uncertainty from each data view in a dynamic, context-specific manner, assigning more importance to useful views to model a well-calibrated forecast distribution. We apply CAMul to multiple domains with varied sources and modalities and show that it outperforms other state-of-the-art probabilistic forecasting models by over 25% in accuracy and calibration.
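One way to picture the context-specific fusion of per-view uncertainty described above is a softmax weighting over views driven by a context vector, combining per-view Gaussian forecasts as a mixture. The module and dimension names below are hypothetical, and this is a simplified stand-in for CAMul's actual mechanism.

```python
import torch
import torch.nn as nn

class ViewFusion(nn.Module):
    """Context-specific fusion of per-view Gaussian forecasts (illustrative sketch)."""
    def __init__(self, num_views: int, ctx_dim: int):
        super().__init__()
        self.score = nn.Linear(ctx_dim, num_views)  # context -> per-view importance

    def forward(self, means, variances, context):
        # means, variances: (B, V) per-view forecast parameters; context: (B, ctx_dim)
        w = torch.softmax(self.score(context), dim=-1)      # (B, V) view weights
        mean = (w * means).sum(-1)                          # mixture mean
        # Mixture-of-Gaussians variance: E[var] + Var[mean].
        var = (w * (variances + means ** 2)).sum(-1) - mean ** 2
        return mean, var

# Usage: fuse 3 data views into one calibrated forecast distribution per example.
fusion = ViewFusion(num_views=3, ctx_dim=16)
m, v, c = torch.randn(4, 3), torch.rand(4, 3), torch.randn(4, 16)
mu, sigma2 = fusion(m, v, c)  # each of shape (4,)
```

Weighting views through the context lets an unreliable view be down-weighted per example rather than globally, which is the intuition behind the calibration gains the abstract reports.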
Sequential recommendation requires understanding the dynamic patterns of users' behaviors, contexts, and preferences from their historical interactions. Most existing works model user-item interactions only at the item level, ignoring that they are driven by latent shopping intentions (e.g., ballpoint pens or miniatures). Detecting the underlying shopping intentions of users from their historical interactions is crucial for e-commerce platforms such as Amazon to improve the convenience and efficiency of their customers' shopping experience. Despite its significance, main shopping intention detection remains under-investigated in the academic literature. To fill this gap, we propose G-STO, a graph-regularized stochastic Transformer method. By considering intentions as sets of products and user preferences as compositions of intentions, we model both as stochastic Gaussian embeddings in the latent representation space. Instead of training the stochastic representations from scratch, we develop a global intention relational graph as prior knowledge for regularization, allowing relevant shopping intentions to be distributionally close. Finally, we feed the regularized stochastic embeddings into Transformer-based models to encode sequential information from the intention transitions. On three real-world datasets, G-STO significantly outperforms the baselines, by 18.08% in Hit@1, 7.01% in Hit@10, and 6.11% in NDCG@10 on average.
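To make the graph regularization of Gaussian embeddings concrete, the sketch below penalizes the 2-Wasserstein distance between the diagonal Gaussian embeddings of intentions connected in a relational graph, pulling related intentions distributionally close. The function name, distance choice, and shapes are assumptions for illustration, not G-STO's exact regularizer.

```python
import torch

def graph_wasserstein_reg(mu, logvar, edges):
    """Graph regularizer: squared 2-Wasserstein distance between diagonal
    Gaussian embeddings of intentions joined by a graph edge (illustrative).

    For diagonal Gaussians, W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2.
    """
    src, dst = edges  # LongTensors of endpoint node indices
    sigma_src = logvar[src].exp().sqrt()
    sigma_dst = logvar[dst].exp().sqrt()
    w2 = ((mu[src] - mu[dst]) ** 2).sum(-1) + ((sigma_src - sigma_dst) ** 2).sum(-1)
    return w2.mean()

# Usage: 5 intentions with 8-dim stochastic embeddings and 3 relational edges.
mu = torch.randn(5, 8, requires_grad=True)
logvar = torch.zeros(5, 8, requires_grad=True)
edges = (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3]))  # related intention pairs
loss = graph_wasserstein_reg(mu, logvar, edges)
loss.backward()  # gradients pull connected distributions together
```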
Trajectory similarity computation is a fundamental problem for various applications in trajectory data analysis. However, the high computation cost of existing trajectory similarity measures has become the key bottleneck for trajectory analysis at scale. While there have been many research efforts to reduce this complexity, they are specific to one similarity measure and often yield limited speedups. We propose NeuTraj to accelerate trajectory similarity computation. NeuTraj is generic, accommodating any existing trajectory measure; fast, computing the similarity of a given trajectory pair in linear time; and elastic, collaborating with any spatial-based trajectory indexing method to reduce the search space. NeuTraj samples a number of seed trajectories from the given database and then uses their pairwise similarities as guidance to approximate the similarity function with a neural metric learning framework. NeuTraj features two novel modules to achieve an accurate approximation of the similarity function: (1) a spatial attention memory module that augments existing recurrent neural networks for trajectory encoding; and (2) a distance-weighted ranking loss that effectively transcribes information from the seed-based guidance. With these two modules, NeuTraj yields high accuracy and fast convergence even when the training data is small. Our experiments on two real-life datasets show that NeuTraj achieves over 80% accuracy on the Fréchet, Hausdorff, ERP, and DTW measures, consistently and significantly outperforming state-of-the-art baselines. It obtains 50x-1000x speedup over brute-force methods and 3x-500x speedup over existing approximate algorithms, while yielding more accurate approximations of the similarity functions.
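The core neural metric learning idea can be sketched as follows: encode each trajectory with a recurrent network and train the embedding so that embedding distance regresses toward the seed-pair similarities, with more weight on similar pairs. This omits NeuTraj's spatial attention memory and is a simplified stand-in for its distance-weighted ranking loss, with hypothetical names and sizes.

```python
import torch
import torch.nn as nn

class TrajEncoder(nn.Module):
    """GRU encoder mapping a trajectory (sequence of 2-D points) to an embedding,
    so that embedding distance approximates a target trajectory measure (sketch)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)

    def forward(self, traj):            # traj: (B, T, 2)
        _, h = self.gru(traj)           # h: (1, B, hidden)
        return h.squeeze(0)             # (B, hidden)

def weighted_ranking_loss(emb_a, emb_b, target_sim):
    """Weighted regression toward seed-pair similarities; similar pairs get
    larger weight. A simplified stand-in for NeuTraj's ranking loss."""
    pred = torch.exp(-torch.norm(emb_a - emb_b, dim=-1))  # map distance to (0, 1]
    weight = target_sim                                    # emphasize similar pairs
    return (weight * (pred - target_sim) ** 2).mean()

# Usage: 8 trajectory pairs of 20 points each, with ground-truth seed similarities.
enc = TrajEncoder()
a, b = torch.randn(8, 20, 2), torch.randn(8, 20, 2)
loss = weighted_ranking_loss(enc(a), enc(b), torch.rand(8))
loss.backward()
```

Once trained, a pair's similarity costs only two encoder passes and one vector distance, which is where the linear-time claim in the abstract comes from.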
Large language models (LLMs) have recently been leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models on generated data, these approaches generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit the systematic biases of the LLM. We therefore investigate training data generation with diversely attributed prompts (e.g., specifying attributes like length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, where we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance. Additionally, we present a comprehensive empirical study of data generation covering vital aspects like bias, diversity, and efficiency, and highlight three key observations: first, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; second, attribute diversity plays a pivotal role in enhancing model performance; and third, attributed prompts achieve the performance of simple class-conditional prompts at only 5% of ChatGPT's querying cost. The data and code are available at https://github.com/yueyu1030/AttrPrompt.
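A minimal sketch of the attributed-prompt idea: sample one value per attribute and compose the combination into the generation prompt, rather than reusing a single fixed class-conditional prompt. The attribute pools and prompt template here are invented for illustration; the paper's attributes are curated per dataset.

```python
import random

# Hypothetical attribute pools for a sentiment-classification dataset.
ATTRIBUTES = {
    "length": ["short", "medium-length", "long"],
    "style": ["formal", "casual", "technical"],
    "subtopic": ["pricing", "reliability", "customer service"],
}

def attributed_prompt(label: str) -> str:
    """Sample one value per attribute and compose a diverse generation prompt."""
    attrs = {name: random.choice(values) for name, values in ATTRIBUTES.items()}
    return (
        f"Write a {attrs['length']}, {attrs['style']} product review about "
        f"{attrs['subtopic']} expressing {label} sentiment."
    )

# Each call yields a differently attributed prompt for the same class label.
for _ in range(3):
    print(attributed_prompt("positive"))
```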