A knowledge graph (KG) consists of a set of interconnected typed entities and their attributes. KGs have recently become popular as auxiliary information for more accurate, explainable, and diverse user preference recommendations. Specifically, existing KG-based recommendation methods aim to model high-order relations and dependencies from the long-connectivity user-item interactions hidden in the KG. However, most of them ignore the cold-start problems of recommendation analytics (i.e., user cold-start and item cold-start), which restricts their performance in scenarios involving new users or new items. Inspired by the success of meta-learning on scarce training samples, we propose a novel meta-learning based framework called MetaKG, which encompasses a collaborative-aware meta learner and a knowledge-aware meta learner, to capture meta users' preference and entities' knowledge for cold-start recommendations. The collaborative-aware meta learner aims to locally aggregate user preferences for each user preference learning task. In contrast, the knowledge-aware meta learner aims to globally generalize knowledge representation across different user preference learning tasks. Guided by the two meta learners, MetaKG can effectively capture high-order collaborative relations and semantic representations, which can be easily adapted to cold-start scenarios. In addition, we devise a novel adaptive task scheduler that adaptively selects informative tasks for meta learning in order to prevent the model from being corrupted by noisy tasks. Extensive experiments on various cold-start scenarios using three real data sets demonstrate that MetaKG outperforms all the existing state-of-the-art competitors in terms of effectiveness, efficiency, and scalability.
Collaborative Filtering (CF) is a widely used and effective technique for recommender systems. In recent decades, there have been significant advancements in latent embedding-based CF methods for improved accuracy, such as matrix factorization, neural collaborative filtering, and LightGCN. However, the explainability of these models has not been fully explored. Adding explainability to recommendation models can not only increase trust in the decision-making process, but also have multiple benefits such as providing persuasive explanations for item recommendations, creating explicit profiles for users and items, and assisting item producers in design improvements.
Trajectory data has the potential to greatly benefit a wide range of real-world applications, such as tracking the spread of disease through people's movement patterns and providing personalized location-based services based on travel preferences. However, privacy concerns and data protection regulations have limited the extent to which this data is shared and utilized. To overcome this challenge, local differential privacy allows people to share a perturbed version of their data, ensuring privacy because only the data owners have access to the original information. Despite this potential, existing point-based perturbation mechanisms are not suitable for real-world scenarios due to poor utility, dependence on external knowledge, high computational overhead, and vulnerability to attacks. To address these limitations, we introduce LDPTrace, a novel locally differentially private trajectory synthesis framework. Our framework takes into account three crucial patterns inferred from users' trajectories in the local setting, allowing us to synthesize trajectories that closely resemble real ones with minimal computational cost. Additionally, we present a new method for selecting a proper grid granularity without compromising privacy. Our extensive experiments using real-world as well as synthetic data, various utility metrics, and attacks demonstrate the efficacy and efficiency of LDPTrace.
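To make the idea of sharing a perturbed value under local differential privacy concrete, the following is a minimal sketch of k-ary randomized response over a discretized grid. It is a generic textbook mechanism chosen for illustration and is not claimed to be the mechanism LDPTrace uses; the function names `krr_perturb` and `krr_estimate_counts`, the grid of `num_cells` cells, and the privacy budget `epsilon` are all assumptions introduced here.

```python
import math
import random


def krr_perturb(true_cell, num_cells, epsilon, rng):
    """Perturb one grid-cell index on the user's device (hypothetical helper).

    The true cell is reported with probability e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 cells is reported uniformly at random.
    """
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + num_cells - 1)
    if rng.random() < p_keep:
        return true_cell
    other = rng.randrange(num_cells - 1)
    # Skip over the true cell so that it is never chosen in this branch.
    return other if other < true_cell else other + 1


def krr_estimate_counts(reports, num_cells, epsilon):
    """Unbiased per-cell count estimates computed by the collector."""
    n = len(reports)
    denom = math.exp(epsilon) + num_cells - 1
    p = math.exp(epsilon) / denom  # probability of reporting the true cell
    q = 1.0 / denom                # probability of reporting one specific other cell
    observed = [0] * num_cells
    for r in reports:
        observed[r] += 1
    return [(c - n * q) / (p - q) for c in observed]
```

In this sketch the collector only ever sees the perturbed reports, and the estimated counts sum to the number of reports, so aggregate statistics can be recovered without access to any individual's true cell.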
Knowledge graphs (KGs) play an increasingly important role in improving recommendation performance and interpretability. A recent technical trend is to design end-to-end models based on information propagation schemes. However, existing propagation-based methods fail to (1) model the underlying hierarchical structures and relations, and (2) capture the high-order collaborative signals of items for learning high-quality user and item representations. In this paper, we propose a new model, called Hierarchy-Aware Knowledge Gated Network (HAKG), to tackle the aforementioned problems. Technically, we model users and items (captured by a user-item graph), as well as entities and relations (captured in a KG), in hyperbolic space, and design a hyperbolic aggregation scheme to gather relational contexts over the KG. Meanwhile, we introduce a novel angle constraint to preserve the characteristics of items in the embedding space. Furthermore, we propose a dual item embedding design to represent and propagate collaborative signals and knowledge associations separately, and leverage gated aggregation to distill discriminative information for better capturing user behavior patterns. Experimental results on three benchmark datasets show that HAKG achieves significant improvement over state-of-the-art methods such as CKAN, Hyper-Know, and KGIN. Further analyses of the learned hyperbolic embeddings confirm that HAKG offers meaningful insights into the hierarchies of the data.
Federated learning (FL) has emerged as a paradigm to train a global machine learning model in a distributed manner while taking privacy concerns and data protection regulations into consideration. Although a variety of FL algorithms have been proposed, the training efficiency of FL remains challenging due to massive mathematical computations and expensive client-server communication costs. However, existing FL-acceleration studies are limited in that they address the computation and communication overheads only separately, which is suboptimal and constrains their acceleration ability. Moreover, previous studies are typically designed for specific FL scenarios and can support only one or two FL models, thus exhibiting poor generality. To fill these critical voids, we propose FLBooster, which provides unified and efficient acceleration capacity for a broad range of FL models. This is the first proposal to address the computation and communication overheads simultaneously. Specifically, we utilize GPUs to boost the computation-intensive homomorphic encryption (HE) operations in a parallel manner, which significantly reduces the computation costs. In addition, a simple but efficient compression method is designed to reduce the volume of data exchanged between client and server. Extensive experiments using four standard FL models on three datasets show that FLBooster achieves superior speed-up gains (i.e., 14.3×–138×) over state-of-the-art acceleration systems. Finally, we integrate FLBooster into the open-source FL benchmark FATE and offer user-friendly APIs for development.
Deep-learning-based trajectory similarity computation holds the potential for improved efficiency and adaptability over traditional similarity computation. However, existing learning-based trajectory similarity solutions prioritize spatial similarity over temporal similarity, making them suboptimal for time-aware analyses. To this end, we propose ST2Vec, a representation learning based solution that considers fine-grained spatial and temporal relations between trajectories to enable spatio-temporal similarity computation in road networks. Specifically, ST2Vec encompasses two steps: (i) spatial and temporal modeling that encodes the spatial and temporal information of trajectories, where a generic temporal modeling module is proposed for the first time; and (ii) spatio-temporal co-attention fusion, where two fusion strategies are designed to enable the generation of unified spatio-temporal embeddings of trajectories. Further, under the guidance of triplet loss, ST2Vec employs curriculum learning in model optimization to improve convergence and effectiveness. An experimental study offers evidence that ST2Vec outperforms state-of-the-art competitors substantially in terms of effectiveness and efficiency, while showing low parameter sensitivity and good model robustness. Moreover, similarity-based case studies including top-k querying and DBSCAN clustering offer further insight into the capabilities of ST2Vec.
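The triplet loss mentioned above can be illustrated with a short sketch. This is the standard margin-based formulation over embedding vectors, given here under the assumption of Euclidean distance and a margin of 1.0; the abstract does not state the distance or margin used by ST2Vec, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (assumed form).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is; otherwise it grows linearly with the violation.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

For trajectory similarity, the anchor, positive, and negative would be embeddings of a query trajectory, a similar trajectory, and a dissimilar one, so minimizing the loss pulls similar trajectories together in the embedding space.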
Traffic prediction has drawn increasing attention for its ubiquitous real-life applications in traffic management, urban computing, public safety, and so on. Recently, the availability of massive trajectory data and the success of deep learning have motivated a plethora of deep traffic prediction studies. However, the existing neural-network-based approaches tend to ignore the correlations between multiple types of moving objects located in the same spatio-temporal traffic area, which is suboptimal for traffic prediction analytics. In this paper, we propose a multi-source deep traffic prediction framework over spatio-temporal trajectory data, termed MDTP. The framework includes two phases: spatio-temporal feature modeling and multi-source bridging. In the feature modeling phase, we present an enhanced graph convolutional network (GCN) model combined with a long short-term memory network (LSTM) to capture the spatial dependencies and temporal dynamics of traffic. In the multi-source bridging phase, we propose two methods, Sum and Concat, to connect the learned features from different trajectory data sources. Extensive experiments on two real-life datasets show that MDTP i) has superior efficiency, compared with classical time-series methods, machine learning methods, and state-of-the-art neural-network-based approaches; ii) offers a significant performance improvement over the single-source traffic prediction approach; and iii) performs traffic predictions in seconds even on tens of millions of trajectory data. We also develop MDTP+, a user-friendly interactive system to demonstrate traffic prediction analysis.
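The two bridging methods named above, Sum and Concat, can be sketched as follows. The abstract gives only their names, so this sketch assumes that Sum means element-wise addition of two equal-length feature vectors and that Concat means joining them end to end; `fuse_sum` and `fuse_concat` are hypothetical names, not MDTP's API.

```python
def fuse_sum(features_a, features_b):
    """Element-wise sum of two learned feature vectors (assumed meaning of Sum).

    Both sources must produce vectors of the same length.
    """
    if len(features_a) != len(features_b):
        raise ValueError("Sum fusion needs feature vectors of equal length")
    return [a + b for a, b in zip(features_a, features_b)]


def fuse_concat(features_a, features_b):
    """Concatenation of two learned feature vectors (assumed meaning of Concat).

    The result has length len(features_a) + len(features_b).
    """
    return list(features_a) + list(features_b)
```

Under these assumptions, Sum keeps the fused dimensionality unchanged but requires aligned feature spaces, while Concat preserves each source's features separately at the cost of a wider input to the downstream predictor.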
Trajectory clustering has played an essential role in trajectory mining tasks. It serves a wide range of real-life applications, including transportation, location-based services, behavioral study, and so on. To support trajectory clustering analytics, a plethora of trajectory clustering methods have been proposed, which mainly extend traditional clustering algorithms by using spatio-temporal characteristics of trajectories. However, existing traditional trajectory clustering approaches based on raw trajectory representation rely heavily on hand-crafted similarity metrics and cannot capture hidden spatial dependencies in trajectory data, which makes them inefficient and inflexible for clustering analysis. To this end, we propose an end-to-end deep trajectory clustering framework via self-training, termed E²DTC, inspired by the data-driven capabilities of deep neural networks. E²DTC does not require any additional manual feature extraction operations, and can be easily adapted for trajectory clustering analytics on any trajectory dataset. Extensive experimental evaluations on three real-life datasets show that our framework E²DTC achieves superior accuracy and efficiency, compared with classical clustering methods (i.e., K-Medoids) and state-of-the-art neural-network based approaches (i.e., t2vec).
Materialized model query aims to find the most appropriate materialized model as the initial model for model reuse. It is the precondition of model reuse, and has recently attracted much attention. Nonetheless, the existing methods require source data, have a limited range of applications, and are inefficient, since they do not construct a suitable metric to measure the target-related knowledge of materialized models. To address this, we present MMQ, a source-data free, general, efficient, and effective materialized model query framework. It uses a Gaussian mixture-based metric called separation degree to rank materialized models. For each materialized model, MMQ first vectorizes the samples in the target dataset into probability vectors by directly applying this model, then fits a Gaussian distribution to each class of probability vectors, and finally uses the separation degree on the Gaussian distributions to measure the target-related knowledge of the materialized model. Moreover, we propose an improved MMQ (I-MMQ), which significantly reduces the query time while retaining the query performance of MMQ. Extensive experiments on a range of practical model reuse workloads demonstrate the effectiveness and efficiency of MMQ.
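The three steps described above (apply each model to obtain probability vectors, fit a Gaussian per class, score the separation) can be sketched as follows. The abstract does not define the separation degree, so `separation_score` below is a stand-in invented for illustration: the squared distance between class means divided by the summed per-class variances, averaged over class pairs. The diagonal-covariance Gaussians, the `predict_proba` callables, and all function names are likewise assumptions and not the paper's definitions.

```python
from itertools import combinations


def fit_diagonal_gaussian(vectors):
    """Fit a per-dimension mean and variance (a diagonal Gaussian, assumed here)."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dim)]
    return mean, var


def separation_score(prob_vectors, labels, eps=1e-9):
    """Hypothetical stand-in for the separation degree: higher means the
    per-class Gaussians of the probability vectors lie farther apart."""
    by_class = {}
    for vec, label in zip(prob_vectors, labels):
        by_class.setdefault(label, []).append(vec)
    if len(by_class) < 2:
        raise ValueError("need at least two classes in the target dataset")
    gaussians = {c: fit_diagonal_gaussian(vs) for c, vs in by_class.items()}
    scores = []
    for a, b in combinations(sorted(gaussians), 2):
        (mean_a, var_a), (mean_b, var_b) = gaussians[a], gaussians[b]
        gap = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
        spread = sum(var_a) + sum(var_b) + eps  # eps avoids division by zero
        scores.append(gap / spread)
    return sum(scores) / len(scores)


def rank_models(models, target_samples, target_labels):
    """Rank candidate models (name -> predict_proba callable) by the score,
    best first. No source data is needed, only the target dataset."""
    scored = []
    for name, predict_proba in models.items():
        probs = [predict_proba(x) for x in target_samples]
        scored.append((separation_score(probs, target_labels), name))
    return [name for _, name in sorted(scored, reverse=True)]
```

A model whose outputs already separate the target classes receives a high score and is ranked first, which reflects the intuition that such a model carries more target-related knowledge and is a better starting point for reuse.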