GPU Accelerated Boosted Trees and Deep Neural Networks for Better Recommender Systems

2021 
In this paper we present our 1st place solution to the ACM RecSys 2021 challenge. Twitter provided a dataset of around 1 billion tweet-user pairs to develop models predicting user engagements. The challenge simulates a realistic production environment by introducing a latency constraint of 6ms per example on average, on a single CPU core with 64GB memory. Our solution ranked first, achieving the highest score in each of the eight performance metrics used to calculate the final ranking. Our final submission is an ensemble of stacked models, using in total 5 XGBoost models and 3 neural networks. Although ensembles are rarely used in production environments, we demonstrate their superiority and the feasibility of running inference on the test dataset within the given time limit and computational resources. As an alternative to retraining, continuously training, or fine-tuning models when new data becomes available, we stacked trained models to leverage the additional data; stacking requires less data to calibrate the trained models. Finally, we analyze the benefits of a GPU-accelerated production environment. Using open source libraries, such as the Forest Inference Library, NVTabular, RAPIDS cuDF, PyTorch and TensorFlow, we are able to accelerate our final solution by 257x, from 23 hours and 40 minutes to 5.5 minutes, and to reduce total cost by 88% using a single NVIDIA A100 GPU with 40GB memory, enabling opportunities for significant cost savings and larger models with higher accuracy. We published our solution on GitHub.
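The abstract does not show code, but the stacking idea it describes can be sketched roughly as follows: predictions from already-trained first-stage models are appended as features for a small second-stage model, so only that cheap second stage needs to be fit when new data arrives. All names here (stack_features, fit_second_stage, the model file layout) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of model stacking with XGBoost first-stage models.
import numpy as np
import xgboost as xgb

def stack_features(base_models, X):
    """Concatenate raw features with first-stage model predictions."""
    preds = [m.predict(xgb.DMatrix(X)) for m in base_models]
    return np.column_stack([X] + preds)

def fit_second_stage(base_models, X_new, y_new):
    """Calibrate a second-stage model on newly available data.

    base_models: boosters trained earlier and kept frozen.
    X_new, y_new: the fresh data; typically far less is needed than
    for retraining the first stage from scratch.
    """
    X_stacked = stack_features(base_models, X_new)
    dtrain = xgb.DMatrix(X_stacked, label=y_new)
    params = {"objective": "binary:logistic", "tree_method": "gpu_hist"}
    return xgb.train(params, dtrain, num_boost_round=100)
```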
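For the GPU-accelerated inference path, a minimal sketch of serving a saved XGBoost model with RAPIDS' Forest Inference Library (FIL) on a cuDF DataFrame might look like the following; the file names are assumptions, and exact FIL API details vary across cuML releases.

```python
# Minimal sketch: GPU tree inference with FIL, keeping data on the GPU.
import cudf
from cuml import ForestInference

# Load a previously saved XGBoost model onto the GPU (path is hypothetical).
fil_model = ForestInference.load(
    "xgb.model",
    output_class=True,      # binary engagement prediction
    model_type="xgboost",
)

# Features read as a cuDF DataFrame, so preprocessing (e.g. with
# NVTabular/cuDF) and inference both stay on the GPU.
X = cudf.read_parquet("test_features.parquet")  # hypothetical file
probs = fil_model.predict_proba(X)
```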