Yuntao Du

Harbin Institute of Technology

MetaKG: Meta-Learning on Knowledge Graph for Cold-Start Recommendation

IEEE Transactions on Knowledge and Data Engineering (2022)

Yuntao Du, Xinjun Zhu, Lu Chen, Ziquan Fang, Yunjun Gao

A knowledge graph (KG) consists of a set of interconnected typed entities and their attributes. KGs have recently become popular as auxiliary information for producing more accurate, explainable, and diverse recommendations of user preferences. Specifically, existing KG-based recommendation methods model high-order relations and dependencies from the long-range connectivity of user-item interactions hidden in the KG. However, most of them ignore the cold-start problems of recommendation analytics (i.e., user cold-start and item cold-start), which limits their performance in scenarios involving new users or new items. Inspired by the success of meta-learning with scarce training samples, we propose a novel meta-learning based framework called MetaKG, which encompasses a collaborative-aware meta learner and a knowledge-aware meta learner, to capture users' preferences and entities' knowledge for cold-start recommendations. The collaborative-aware meta learner locally aggregates user preferences within each user preference learning task, whereas the knowledge-aware meta learner globally generalizes knowledge representations across different user preference learning tasks. Guided by the two meta learners, MetaKG effectively captures high-order collaborative relations and semantic representations, which can easily be adapted to cold-start scenarios. In addition, we devise a novel adaptive task scheduler that adaptively selects informative tasks for meta learning, preventing the model from being corrupted by noisy tasks. Extensive experiments on various cold-start scenarios using three real datasets demonstrate that MetaKG outperforms all existing state-of-the-art competitors in terms of effectiveness, efficiency, and scalability.

Cold start (recommender systems)

Preference learning

Feature Learning

10.1109/tkde.2022.3168775

Citations (46)
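
The abstract describes local adaptation within each task paired with generalization across tasks. As a hedged illustration of that inner/outer loop structure only, the sketch below is generic first-order MAML (model-agnostic meta-learning) on toy one-dimensional regression tasks. It is not MetaKG's collaborative-aware or knowledge-aware learner, and the helper names, task data, and learning rates are hypothetical.

```python
def loss_and_grad(w, b, data):
    """Mean squared error of the model y = w*x + b on `data`, plus its gradient."""
    n = len(data)
    loss = grad_w = grad_b = 0.0
    for x, y in data:
        err = w * x + b - y
        loss += err * err / n
        grad_w += 2 * err * x / n
        grad_b += 2 * err / n
    return loss, grad_w, grad_b


def adapt(w, b, support, inner_lr=0.1, steps=1):
    """Inner loop: local adaptation to one task using only its support set."""
    for _ in range(steps):
        _, gw, gb = loss_and_grad(w, b, support)
        w, b = w - inner_lr * gw, b - inner_lr * gb
    return w, b


def meta_train(tasks, meta_lr=0.05, epochs=200):
    """Outer loop (first-order MAML): move the shared initialization so that each
    task's query loss *after* adaptation goes down, averaged across tasks."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        meta_gw = meta_gb = 0.0
        for support, query in tasks:
            aw, ab = adapt(w, b, support)
            _, gw, gb = loss_and_grad(aw, ab, query)
            meta_gw += gw / len(tasks)
            meta_gb += gb / len(tasks)
        w, b = w - meta_lr * meta_gw, b - meta_lr * meta_gb
    return w, b


def make_task(slope, intercept=1.0):
    """Hypothetical toy task: noiseless points on y = slope*x + intercept."""
    support_xs = [-1.0, -0.5, 0.5, 1.0]
    query_xs = [-0.75, -0.25, 0.25, 0.75]
    support = [(x, slope * x + intercept) for x in support_xs]
    query = [(x, slope * x + intercept) for x in query_xs]
    return support, query


w0, b0 = meta_train([make_task(s) for s in (1.0, 2.0, 3.0)])
new_support, new_query = make_task(2.5)   # an unseen, "cold-start" task
meta_loss = loss_and_grad(*adapt(w0, b0, new_support), new_query)[0]
zero_loss = loss_and_grad(*adapt(0.0, 0.0, new_support), new_query)[0]
print(f"query loss from meta init: {meta_loss:.4f}, from zero init: {zero_loss:.4f}")
```

With a single inner step, the meta-learned initialization reaches a much lower query loss on the unseen task than a zero initialization does, which is the few-sample benefit that meta-learning targets.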

Real-Time Trajectory Synthesis with Local Differential Privacy

2024 IEEE 40th International Conference on Data Engineering (ICDE) (2024)

Yujia Hu, Yuntao Du, Zhikun Zhang, Ziquan Fang, Lu Chen

Differential Privacy

10.1109/icde60146.2024.00137

Citations (0)

Towards Explainable Collaborative Filtering with Taste Clusters Learning

Proceedings of the ACM Web Conference 2023 (2023)

Yuntao Du, Jianxun Lian, Jing Yao, Xiting Wang, Mingqi Wu

Collaborative Filtering (CF) is a widely used and effective technique for recommender systems. In recent decades, there have been significant advancements in latent embedding-based CF methods for improved accuracy, such as matrix factorization, neural collaborative filtering, and LightGCN. However, the explainability of these models has not been fully explored. Adding explainability to recommendation models can not only increase trust in the decision-making process but also bring multiple benefits, such as providing persuasive explanations for item recommendations, creating explicit profiles for users and items, and assisting item producers in improving their designs.

10.1145/3543507.3583303

Citations (3)

LDPTrace: Locally Differentially Private Trajectory Synthesis

Proceedings of the VLDB Endowment (2023)

Yuntao Du, Yujia Hu, Zhikun Zhang, Ziquan Fang, Lu Chen

Trajectory data has the potential to greatly benefit a wide range of real-world applications, such as tracking the spread of disease through people's movement patterns and providing personalized location-based services based on travel preferences. However, privacy concerns and data protection regulations have limited the extent to which such data is shared and utilized. Local differential privacy offers a solution by allowing people to share a perturbed version of their data, ensuring privacy because only the data owners have access to the original information. Despite this potential, existing point-based perturbation mechanisms are unsuitable for real-world scenarios due to poor utility, dependence on external knowledge, high computational overhead, and vulnerability to attacks. To address these limitations, we introduce LDPTrace, a novel locally differentially private trajectory synthesis framework. Our framework takes into account three crucial patterns inferred from users' trajectories in the local setting, allowing us to synthesize trajectories that closely resemble real ones at minimal computational cost. Additionally, we present a new method for selecting a proper grid granularity without compromising privacy. Extensive experiments using real-world and synthetic data, various utility metrics, and attacks demonstrate the efficacy and efficiency of LDPTrace.

Differential Privacy

Granularity

Synthetic data

10.14778/3594512.3594520

Citations (19)
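
LDPTrace works over a spatial grid and has each user perturb data locally before sharing it. The abstract does not spell out the perturbation mechanism, so the sketch below assumes generalized (k-ary) randomized response, a standard local differential privacy frequency oracle, applied to grid-cell indices, together with the matching unbiased frequency estimator on the server side. The function names and parameters are hypothetical, and this is not LDPTrace's actual mechanism for its three trajectory patterns.

```python
import math
import random


def grr_perturb(cell, k, epsilon, rng):
    """k-ary randomized response on one grid-cell index in [0, k).

    Keeps the true cell with probability p = e^eps / (e^eps + k - 1); otherwise
    reports one of the other k - 1 cells uniformly at random (eps-LDP).
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return cell
    other = rng.randrange(k - 1)           # index among the k - 1 other cells
    return other if other < cell else other + 1


def grr_estimate(reports, k, epsilon):
    """Unbiased cell-frequency estimates computed from the perturbed reports."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)                    # P[report = true cell]
    q = 1 / (e + k - 1)                    # P[report = one specific other cell]
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    n = len(reports)
    return [(c / n - q) / (p - q) for c in counts]


rng = random.Random(0)
true_cells = [0] * 10000 + [1] * 6000 + [2] * 4000   # cell 3 is never visited
reports = [grr_perturb(c, 4, 2.0, rng) for c in true_cells]
estimates = grr_estimate(reports, 4, 2.0)
print([round(f, 3) for f in estimates])   # close to [0.5, 0.3, 0.2, 0.0]
```

The estimates always sum to 1, but individual entries can be slightly negative for rarely visited cells, so clipping and renormalizing is a common post-processing step.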

HAKG: Hierarchy-Aware Knowledge Gated Network for Recommendation

arXiv (Cornell University) (2022)

Yuntao Du, Xinjun Zhu, Lu Chen, Baihua Zheng, Yunjun Gao

Knowledge graphs (KGs) play an increasingly important role in improving recommendation performance and interpretability. A recent technical trend is to design end-to-end models based on information propagation schemes. However, existing propagation-based methods fail to (1) model the underlying hierarchical structures and relations, and (2) capture the high-order collaborative signals of items for learning high-quality user and item representations. In this paper, we propose a new model, called Hierarchy-Aware Knowledge Gated Network (HAKG), to tackle these problems. Technically, we model users and items (captured by a user-item graph), as well as entities and relations (captured in a KG), in hyperbolic space, and design a hyperbolic aggregation scheme to gather relational contexts over the KG. Meanwhile, we introduce a novel angle constraint to preserve the characteristics of items in the embedding space. Furthermore, we propose a dual item embeddings design to represent and propagate collaborative signals and knowledge associations separately, and leverage gated aggregation to distill discriminative information for better capturing user behavior patterns. Experimental results on three benchmark datasets show that HAKG achieves significant improvements over state-of-the-art methods such as CKAN, Hyper-Know, and KGIN. Further analyses of the learned hyperbolic embeddings confirm that HAKG offers meaningful insights into the hierarchies of data.

Interpretability

Discriminative model

Knowledge graph

Benchmark (computing)

10.48550/arxiv.2204.04959

Citations (4)

FLBooster: A Unified and Efficient Platform for Federated Learning Acceleration

2023 IEEE 39th International Conference on Data Engineering (ICDE) (2023)

Zhihao Zeng, Yuntao Du, Ziquan Fang, Lu Chen, Shiliang Pu

Federated learning (FL) has emerged as a paradigm for training a global machine learning model in a distributed manner while taking privacy concerns and data protection regulations into consideration. Although a variety of FL algorithms have been proposed, FL training efficiency remains challenging due to massive mathematical computations and expensive client-server communication. Existing FL-acceleration studies are limited because they address computation and communication overheads separately, which is suboptimal and constrains their acceleration ability. Moreover, previous studies are typically designed for specific FL scenarios and support only one or two FL models, resulting in poor generality. To fill these critical gaps, we propose FLBooster, which provides unified and efficient acceleration for a broad range of FL models. It is the first proposal to reduce computation and communication overheads simultaneously. Specifically, we use GPUs to accelerate computation-intensive homomorphic encryption (HE) operations in parallel, which significantly reduces computation costs. In addition, a simple but efficient compression method is designed to reduce the volume of data exchanged between client and server. Extensive experiments using four standard FL models on three datasets show that FLBooster achieves superior speedups (14.3× to 138×) over state-of-the-art acceleration systems. Finally, we integrate FLBooster into the open-source FL framework FATE and offer user-friendly APIs for development.

Homomorphic Encryption

Benchmark (computing)

Generality

10.1109/icde55515.2023.00241

Citations (3)
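
FLBooster accelerates homomorphic encryption (HE) operations on GPUs. To show what such an operation computes, here is a toy, pure-Python Paillier cryptosystem, an additively homomorphic scheme chosen as an assumption because the abstract does not name the scheme. The primes are deliberately tiny and insecure, nothing runs on a GPU, and the function names are hypothetical. The sketch only illustrates why HE arithmetic (large modular exponentiations modulo n²) is expensive enough to be worth accelerating.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two distinct primes (insecure sizes!)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)      # math.lcm requires Python 3.9+
    mu = pow(lam, -1, n)              # with generator g = n + 1, mu = lam^-1 mod n
    return n, (lam, mu)


def encrypt(n, m, rng):
    """c = g^m * r^n mod n^2, with g = n + 1 and a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts mod n."""
    return (c1 * c2) % (n * n)


n, priv = keygen(17, 19)
rng = random.Random(1)
# e.g. a server summing two clients' quantized update values without decrypting them
total = add_encrypted(n, encrypt(n, 42, rng), encrypt(n, 100, rng))
print(decrypt(n, priv, total))   # 142
```

Real deployments use moduli of 2048 bits or more, so every encryption involves exponentiations on very large integers; batching many such independent operations is what makes them a natural fit for GPU parallelism.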

Spatio-Temporal Trajectory Similarity Learning in Road Networks

Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)

Ziquan Fang, Yuntao Du, Xinjun Zhu, Danlei Hu, Lu Chen

Deep learning based trajectory similarity computation promises improved efficiency and adaptability over traditional similarity computation. However, existing learning-based trajectory similarity solutions prioritize spatial similarity over temporal similarity, making them suboptimal for time-aware analyses. To this end, we propose ST2Vec, a representation learning based solution that considers fine-grained spatial and temporal relations between trajectories to enable spatio-temporal similarity computation in road networks. Specifically, ST2Vec encompasses two steps: (i) spatial and temporal modeling, which encodes the spatial and temporal information of trajectories and includes a generic temporal modeling module proposed for the first time; and (ii) spatio-temporal co-attention fusion, where two fusion strategies are designed to generate unified spatio-temporal embeddings of trajectories. Further, under the guidance of triplet loss, ST2Vec employs curriculum learning in model optimization to improve convergence and effectiveness. An experimental study offers evidence that ST2Vec substantially outperforms state-of-the-art competitors in terms of effectiveness and efficiency, while showing low parameter sensitivity and good model robustness. Moreover, similarity-based case studies, including top-k querying and DBSCAN clustering, offer further insight into the capabilities of ST2Vec.

Robustness

Similarity measure

Feature Learning

10.1145/3534678.3539375

Citations (24)
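
ST2Vec is optimized under the guidance of a triplet loss with curriculum learning. Below is a minimal sketch of the standard hinge-style triplet loss on embedding vectors, plus one possible easy-to-hard ordering of triplets. The curriculum criterion, the margin value, and the function names are assumptions for illustration, not ST2Vec's actual training schedule.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def curriculum_order(triplets, margin=1.0):
    """Hypothetical curriculum: present low-loss (easy) triplets before high-loss (hard) ones."""
    return sorted(triplets, key=lambda t: triplet_loss(*t, margin=margin))


easy = ([0.0, 0.0], [0.0, 1.0], [0.0, 3.0])   # negative already far from the anchor
hard = ([0.0, 0.0], [0.0, 2.0], [0.0, 1.0])   # negative closer than the positive
print(triplet_loss(*easy), triplet_loss(*hard))   # 0.0 2.0
print(curriculum_order([hard, easy])[0] is easy)  # True
```

In a trajectory setting, the anchor, positive, and negative would be embeddings of three trajectories, where the positive is closer to the anchor than the negative under a ground-truth spatio-temporal distance.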

MDTP: A Multi-source Deep Traffic Prediction Framework over Spatio-Temporal Trajectory Data

Proceedings of the VLDB Endowment (2021)

Ziquan Fang, Lu Pan, Lu Chen, Yuntao Du, Yunjun Gao

Traffic prediction has drawn increasing attention for its ubiquitous real-life applications in traffic management, urban computing, public safety, and so on. Recently, the availability of massive trajectory data and the success of deep learning have motivated a plethora of deep traffic prediction studies. However, existing neural-network-based approaches tend to ignore the correlations between multiple types of moving objects located in the same spatio-temporal traffic area, which is suboptimal for traffic prediction analytics. In this paper, we propose a multi-source deep traffic prediction framework over spatio-temporal trajectory data, termed MDTP. The framework includes two phases: spatio-temporal feature modeling and multi-source bridging. In the feature modeling phase, we present an enhanced graph convolutional network (GCN) model combined with a long short-term memory network (LSTM) to capture the spatial dependencies and temporal dynamics of traffic. In the multi-source bridging phase, we propose two methods, Sum and Concat, to connect the learned features from different trajectory data sources. Extensive experiments on two real-life datasets show that MDTP (i) has superior efficiency compared with classical time-series methods, machine learning methods, and state-of-the-art neural-network-based approaches; (ii) offers a significant performance improvement over the single-source traffic prediction approach; and (iii) performs traffic predictions in seconds even on tens of millions of trajectory data. We also develop MDTP+, a user-friendly interactive system that demonstrates traffic prediction analysis.

Feature (machine learning)

10.14778/3457390.3457394

Citations (30)
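
MDTP models spatial dependencies with a GCN and then bridges per-source features with Sum or Concat. The sketch below shows only a plain, weight-free GCN propagation step with the usual symmetric normalization, followed by the two fusion operators. It omits MDTP's enhanced GCN and the LSTM, and it assumes (our assumption, not a detail from the paper) that both trajectory sources share one region graph with aligned rows. The function names and the taxi/bus example data are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_propagate(adj, features):
    """One weight-free propagation step: each region mixes in its neighbours' features."""
    norm = normalized_adjacency(adj)
    n, dim = len(features), len(features[0])
    return [[sum(norm[i][k] * features[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)]


def fuse_sum(h_a, h_b):
    """'Sum' bridging: element-wise addition; both sources need the same feature size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(h_a, h_b)]


def fuse_concat(h_a, h_b):
    """'Concat' bridging: per-region concatenation; feature sizes may differ."""
    return [ra + rb for ra, rb in zip(h_a, h_b)]


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]            # three regions in a line
taxi = gcn_propagate(adj, [[2.0], [4.0], [6.0]])   # e.g. taxi flow per region
bus = gcn_propagate(adj, [[1.0], [1.0], [1.0]])    # e.g. bus flow per region
print(fuse_sum(taxi, bus))
print(fuse_concat(taxi, bus))
```

Sum keeps the fused width equal to a single source's width, while Concat preserves each source's features separately at the cost of a wider input to the next layer.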

E²DTC: An End to End Deep Trajectory Clustering Framework via Self-Training

2021 IEEE 37th International Conference on Data Engineering (ICDE) (2021)

Ziquan Fang, Yuntao Du, Lu Chen, Yujia Hu, Yunjun Gao

Trajectory clustering plays an essential role in trajectory mining tasks and serves a wide range of real-life applications, including transportation, location-based services, behavioral studies, and so on. To support trajectory clustering analytics, a plethora of trajectory clustering methods have been proposed, mainly extending traditional clustering algorithms with the spatio-temporal characteristics of trajectories. However, existing traditional trajectory clustering approaches based on raw trajectory representations rely heavily on hand-crafted similarity metrics and cannot capture hidden spatial dependencies in trajectory data, which makes them inefficient and inflexible for clustering analysis. To this end, inspired by the data-driven capabilities of deep neural networks, we propose an end-to-end deep trajectory clustering framework via self-training, termed E²DTC. E²DTC does not require any additional manual feature extraction operations and can be easily adapted for trajectory clustering analytics on any trajectory dataset. Extensive experimental evaluations on three real-life datasets show that E²DTC achieves superior accuracy and efficiency compared with classical clustering methods (e.g., K-Medoids) and state-of-the-art neural-network-based approaches (e.g., t2vec).

10.1109/icde51399.2021.00066

Citations (18)
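
E²DTC clusters trajectory embeddings end to end via self-training. The abstract does not give the training objective, so as an assumption the sketch below uses the widely known DEC-style (Deep Embedded Clustering) scheme: Student's t soft assignments of embeddings to centroids, a sharpened target distribution derived from them, and a KL-divergence loss between the two. The trajectory encoder is omitted, embeddings are plain lists, and all names are hypothetical.

```python
import math


def soft_assign(embeddings, centroids, alpha=1.0):
    """q_ij: Student's t similarity between embedding i and centroid j, normalized per row."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1 + d2 / alpha) ** (-(alpha + 1) / 2))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """p_ij proportional to q_ij^2 / f_j (f_j = soft cluster size): sharpens confident assignments."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_divergence(p, q):
    """Self-training loss KL(P || Q), summed over all points and clusters."""
    return sum(pi * math.log(pi / qi)
               for prow, qrow in zip(p, q)
               for pi, qi in zip(prow, qrow) if pi > 0)


embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(embeddings, centroids)
p = target_distribution(q)
print(round(q[0][0], 3), round(p[0][0], 3))   # the target is more confident than q
print(kl_divergence(p, q))                     # the loss a training step would reduce
```

In DEC, the target P is held fixed for a number of iterations while the encoder and centroids are updated to reduce KL(P || Q), so high-confidence assignments gradually pull the clusters apart.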

Finding Materialized Models for Model Reuse

IEEE Transactions on Knowledge and Data Engineering (2023)

Minjun Zhao, Lu Chen, Keyu Yang, Yuntao Du, Yunjun Gao

Materialized model query aims to find the most appropriate materialized model to serve as the initial model for model reuse. It is a precondition of model reuse and has recently attracted much attention. Nonetheless, existing methods suffer from the need to provide source data, a limited range of applications, and inefficiency, since they do not construct a suitable metric to measure the target-related knowledge of materialized models. To address this, we present MMQ, a source-data-free, general, efficient, and effective materialized model query framework. It uses a Gaussian mixture-based metric called separation degree to rank materialized models. For each materialized model, MMQ first vectorizes the samples in the target dataset into probability vectors by directly applying this model, then fits a Gaussian distribution to each class of probability vectors, and finally applies separation degree to the Gaussian distributions to measure the target-related knowledge of the materialized model. Moreover, we propose an improved MMQ (I-MMQ), which significantly reduces query time while retaining the query performance of MMQ. Extensive experiments on a range of practical model reuse workloads demonstrate the effectiveness and efficiency of MMQ.

10.1109/tkde.2023.3270923

Citations (1)
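
MMQ ranks candidate models by applying each one to the target samples, fitting a Gaussian to each class's probability vectors, and measuring how well those Gaussians separate. The abstract does not define separation degree precisely, so the sketch below substitutes a simple stand-in: fit a diagonal Gaussian per class and score a model by the mean pairwise Bhattacharyya distance between those Gaussians. The metric choice, the variance floor, the function names, and the example outputs are all hypothetical; this is not MMQ or I-MMQ.

```python
import math
from itertools import combinations


def fit_diag_gaussian(vectors, eps=1e-6):
    """Per-dimension mean and variance (floored by eps) of one class's probability vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n + eps for d in range(dim)]
    return mean, var


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        s = (s1 + s2) / 2
        dist += (a - b) ** 2 / (8 * s) + 0.5 * math.log(s / math.sqrt(s1 * s2))
    return dist


def separation_score(prob_vectors, labels):
    """Stand-in score: mean pairwise distance between per-class Gaussians (higher = better)."""
    by_class = {}
    for vec, label in zip(prob_vectors, labels):
        by_class.setdefault(label, []).append(vec)
    pairs = list(combinations([fit_diag_gaussian(vs) for vs in by_class.values()], 2))
    if not pairs:
        raise ValueError("need samples from at least two classes")
    return sum(bhattacharyya(a, b) for a, b in pairs) / len(pairs)


def rank_models(outputs, labels):
    """Order candidate models (name -> probability vectors on target samples), best first."""
    return sorted(outputs, key=lambda name: separation_score(outputs[name], labels), reverse=True)


labels = [0, 0, 1, 1]
outputs = {
    "model_b": [[0.55, 0.45], [0.45, 0.55], [0.50, 0.50], [0.52, 0.48]],  # classes overlap
    "model_a": [[0.90, 0.10], [0.80, 0.20], [0.20, 0.80], [0.10, 0.90]],  # classes separate
}
print(rank_models(outputs, labels))   # ['model_a', 'model_b']
```

Like MMQ as described, this scoring needs only the candidate model and labeled target samples, not the model's original source data.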