Search, recommendation, and online advertising are the three most important information-providing mechanisms on the web. These information-seeking techniques satisfy users' information needs by suggesting personalized objects (information or services) at the appropriate time and place, and they play a crucial role in mitigating the information overload problem. With the recent advances in deep reinforcement learning (DRL), there has been increasing interest in developing DRL-based information-seeking techniques. These techniques have two key advantages: (1) they can continuously update information-seeking strategies according to users' real-time feedback, and (2) they can maximize the expected cumulative long-term reward from users, where the reward is defined differently depending on the application, for example as click-through rate, revenue, user satisfaction, or engagement. In this paper, we give an overview of deep reinforcement learning for search, recommendation, and online advertising from methodologies to applications, review representative algorithms, and discuss some appealing research directions.
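To make the notion of cumulative long-term reward concrete, the following minimal sketch computes a discounted return over one user session. The discount factor `gamma`, the function name, and the example reward values are illustrative assumptions; the abstract does not fix a particular reward definition or discounting scheme.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Return sum_t gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical session: no click, a click (1.0), no click, then a purchase (5.0).
session_rewards = [0.0, 1.0, 0.0, 5.0]
print(discounted_return(session_rewards, gamma=0.5))  # 1.125
```

A policy that looks only at the immediate reward would ignore the late purchase; the discounted sum lets later feedback still count, with a weight that shrinks over time.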
Online recommendation and advertising are two major income channels for online recommendation platforms (e.g., e-commerce and news feed sites). However, on most platforms separate teams optimize the recommending and advertising strategies with different techniques, which may lead to suboptimal overall performance. To this end, we propose a novel two-level reinforcement learning framework to jointly optimize the recommending and advertising strategies: the first level generates a list of recommendations to optimize user experience in the long run, and the second level inserts ads into the recommendation list so as to balance the immediate advertising revenue from advertisers against the negative influence of ads on long-term user experience. To be specific, the first level tackles the highly combinatorial action space problem of selecting a subset of items from the large item space, while the second level handles three internally related tasks: (i) whether to insert an ad and, if so, (ii) the optimal ad and (iii) the optimal location at which to insert it. Experimental results on real-world data demonstrate the effectiveness of the proposed framework. We have released the implementation code to ease reproducibility.
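The three second-level tasks can be viewed as one joint decision over (ad, location) pairs plus a "no ad" option. The sketch below illustrates that view only; it assumes value estimates are already available, and the dictionary layout, the function name `choose_ad_action`, and the numbers are hypothetical rather than taken from the released implementation.

```python
from typing import Dict, Optional, Tuple

# An action is either None ("insert no ad") or an (ad_id, slot_index) pair.
Action = Optional[Tuple[str, int]]


def choose_ad_action(q_values: Dict[Action, float]) -> Action:
    """Pick the action with the highest estimated long-term value."""
    return max(q_values, key=q_values.get)


# Hypothetical value estimates for one recommendation list.
q_values: Dict[Action, float] = {
    None: 0.40,           # show the list without any ad
    ("ad_a", 0): 0.35,    # ad_a at the top of the list
    ("ad_a", 2): 0.55,    # ad_a after the second item
    ("ad_b", 1): 0.50,    # ad_b after the first item
}
print(choose_ad_action(q_values))  # ('ad_a', 2)
```

Treating "no ad" as an ordinary action lets a single maximization answer all three questions at once: if `None` wins, no ad is inserted; otherwise the winning pair names both the ad and its position.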
Self-attention models have achieved state-of-the-art performance in sequential recommender systems by capturing the sequential dependencies among user-item interactions. However, they rely on positional embeddings to retain the sequential information, which may break the semantics of item embeddings. In addition, most existing works assume that such sequential dependencies exist solely in the item embeddings and neglect their existence among the item features. In this work, we propose a novel sequential recommender system (MLP4Rec) based on recent advances in MLP-based architectures, which are naturally sensitive to the order of items in a sequence. To be specific, we develop a tri-directional fusion scheme to coherently capture sequential, cross-channel, and cross-feature correlations. Extensive experiments on two benchmark datasets demonstrate the effectiveness of MLP4Rec over various representative baselines. The simple architecture of MLP4Rec also yields linear computational complexity and far fewer model parameters than existing self-attention methods.
Feature selection plays an important role in deep recommender systems: it selects a subset of the most predictive features so as to boost recommendation performance and accelerate model optimization. The majority of existing feature selection methods, however, aim to select only a fixed subset of features. This setting cannot fit the dynamic and complex environments of practical recommender systems, where the contribution of a specific feature varies significantly across user-item interactions. In this paper, we propose an adaptive feature selection framework, AdaFS, for deep recommender systems. To be specific, we develop a novel controller network that automatically selects the most relevant features from the whole feature space, which fits the dynamic recommendation environment better. Moreover, unlike classic feature selection approaches, the proposed controller adaptively scores each user-item interaction example and identifies the most informative features for that example for subsequent recommendation tasks. We conduct extensive experiments on two public benchmark datasets from a real-world recommender system. Experimental results demonstrate the effectiveness of AdaFS and its excellent transferability to the most popular deep recommendation models.
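The difference between a fixed feature subset and per-example selection can be illustrated with a small sketch. It assumes that some controller has already produced one raw score per feature field for a given interaction; the softmax normalization, the top-k rule, and all names and numbers are illustrative assumptions, not the AdaFS architecture itself.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_features(controller_scores: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k highest-weighted feature fields and renormalize their weights."""
    names = list(controller_scores)
    weights = softmax([controller_scores[name] for name in names])
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    kept = ranked[:k]
    norm = sum(weight for _, weight in kept)
    return {name: weight / norm for name, weight in kept}


# Hypothetical controller outputs for two different user-item interactions.
interaction_a = {"user_id": 2.0, "item_category": 0.5, "hour_of_day": -1.0}
interaction_b = {"user_id": -1.0, "item_category": 0.5, "hour_of_day": 2.0}
print(sorted(select_features(interaction_a, k=2)))  # ['item_category', 'user_id']
print(sorted(select_features(interaction_b, k=2)))  # ['hour_of_day', 'item_category']
```

The two interactions keep different feature fields, which is exactly what a fixed, globally selected subset cannot express.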
Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing accuracy and optimizing storage efficiency to meet deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core challenges. Firstly, varying experimental setups across research papers often yield unfair comparisons, obscuring practical insights. Secondly, the existing literature lacks a detailed analysis of selection attributes based on large-scale datasets, as well as a thorough comparison among selection techniques and DRS backbones, which restricts the generalizability of findings and impedes deployment in DRS. Lastly, research often focuses on comparing the peak performance achievable by feature selection methods, an approach that typically makes identifying the optimal hyperparameters computationally infeasible and overlooks the robustness and stability of these methods. To bridge these gaps, this paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for DRS. ERASE comprises a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches, across four public datasets, private industrial datasets, and a real-world commercial platform, achieving significant enhancement. Our code is available online for ease of reproduction.
Deep learning has been widely applied in recommender systems and has recently achieved revolutionary progress. However, most existing learning-based methods assume that the user and item distributions remain unchanged between the training phase and the test phase. In real-world scenarios, the distribution of user and item features can naturally shift, potentially resulting in a substantial decrease in recommendation performance. This phenomenon can be formulated as an Out-Of-Distribution (OOD) recommendation problem. To address this challenge, we propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR. In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model, allowing it to adapt specifically to the shifting user and item features. To be specific, we propose a self-distillation task and a contrastive task to help the model learn both the user's invariant interest preferences and the variant user/item characteristics during the test-time phase, thus facilitating a smooth adaptation to the shifting features. Furthermore, we provide theoretical analysis to support the rationale behind our dual test-time training framework. To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy. We conduct experiments on three datasets with various backbones. Comprehensive experimental results demonstrate the effectiveness of DT3OR compared to other state-of-the-art baselines.
Knowledge graphs (KGs), which consist of triples, are inherently incomplete and require a completion procedure to predict missing triples. In real-world scenarios, KGs are distributed across clients, and privacy restrictions complicate completion tasks. Many frameworks have been proposed to address federated knowledge graph completion. However, the existing frameworks, including FedE, FedR, and FKGE, have certain limitations: FedE poses a risk of information leakage, FedR's optimization efficacy diminishes when there is minimal overlap among relations, and FKGE suffers from high computational costs and mode collapse issues. To address these issues, we propose Federated Latent Embedding Sharing Tensor factorization (FLEST), a novel approach that uses federated tensor factorization for KG completion. FLEST decomposes the embedding matrix and enables sharing of latent dictionary embeddings to lower privacy risks. Empirical results demonstrate FLEST's effectiveness and efficiency, offering a balanced solution between performance and privacy. FLEST expands the application of federated tensor factorization in KG completion tasks.
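One way to picture sharing latent dictionary embeddings is the sketch below: each client keeps a private coefficient matrix and reconstructs its entity embeddings as coefficients times a dictionary, and only the dictionary is exchanged. The factorization form, the server-side averaging step, and all names and values are assumptions made for illustration; the abstract does not specify how FLEST decomposes or aggregates.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x d)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def average_dictionaries(dictionaries: List[Matrix]) -> Matrix:
    """Element-wise mean of the clients' dictionaries (assumed aggregation rule)."""
    count = len(dictionaries)
    rows, cols = len(dictionaries[0]), len(dictionaries[0][0])
    return [[sum(d[i][j] for d in dictionaries) / count for j in range(cols)]
            for i in range(rows)]


# Hypothetical setup: 2 dictionary atoms, embedding dimension 2.
client_1_dictionary = [[2.0, 0.0], [0.0, 4.0]]
client_2_dictionary = [[4.0, 2.0], [2.0, 0.0]]
# Private coefficients for three entities on client 1; these never leave the client.
client_1_coefficients = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

shared = average_dictionaries([client_1_dictionary, client_2_dictionary])
print(shared)                                 # [[3.0, 1.0], [1.0, 2.0]]
print(matmul(client_1_coefficients, shared))  # [[3.0, 1.0], [1.0, 2.0], [2.0, 1.5]]
```

Under these assumptions the server sees only the small dictionary, not the per-entity coefficients or the reconstructed embeddings, which is the intuition behind the lowered privacy risk.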
Intelligent recommendation computing, which balances revenue against resource consumption in industrial recommender systems, has been emerging recently. Existing solutions deploy the same recommendation model to serve all users indiscriminately, which is sub-optimal for total revenue maximization. We propose a multi-model service solution that deploys models of different complexity to serve users of different value. We develop AutoGen, an automated dynamic model generation framework, to efficiently derive multiple parameter-sharing models with diverse complexities and adequate predictive capabilities. A mixed search space is designed, and an importance-aware progressive training scheme is proposed to prevent interference between different architectures; this avoids model retraining and improves search efficiency, so that multiple models can be derived efficiently. Extensive experiments on two public datasets demonstrate the effectiveness and efficiency of AutoGen.
CD19-directed chimeric antigen receptor-T (CAR-T) cells with a 4-1BB or CD28 co-stimulatory domain have shown impressive antitumor activity against relapsed or refractory B cell acute lymphoblastic leukemia (r/r B-ALL). However, a parallel comparison of their performances in r/r B-ALL therapy has not been sufficiently reported. Here, we manufactured 4-1BB- and CD28-based CD19 CAR-T cells using the same process technology and evaluated their efficacy and safety in r/r B-ALL therapy based on pre-clinical and exploratory clinical investigations. In B-ALL-bearing mice, a similar antitumor effect and CAR-T kinetics in peripheral blood were observed at the CAR-T dose of 1 × 10^7/mouse. However, when the dose was decreased to 1 × 10^6/mouse, 4-1BB CAR-T cells were more potent in eradicating tumor cells and showed longer persistence than CD28 CAR-T cells. Retrospective analysis of an exploratory clinical study that used 4-1BB- or CD28-based CAR-T cells to treat r/r B-ALL was performed. Compared with CD28 CAR-T cells, 4-1BB CAR-T cells resulted in higher antitumor efficacy and less severe adverse events. This study demonstrated that the performance of 4-1BB CAR-T cells was superior to that of CD28 CAR-T cells in suppressing CD19+ B-ALL, at least under our manufacturing process.