CuLDA_CGS: solving large-scale LDA problems on GPUs

Xiaolong Xie, Yun Liang, Xiuhong Li, Wei Tan

2019

GPUs have benefited many ML algorithms. However, we observe that the performance of existing Latent Dirichlet Allocation (LDA) solutions on GPUs is not satisfactory. We present CuLDA_CGS, an efficient approach to accelerating large-scale LDA problems. We carefully design the workload partitioning and synchronization mechanisms to exploit multiple GPUs. We also optimize the algorithm from the sampling-algorithm, parallelization, and data-compression perspectives. Experimental evaluations show that CuLDA_CGS outperforms state-of-the-art LDA solutions by a large margin (up to 7.3X) on a single GPU.
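
The abstract does not spell out the sampler being accelerated. Assuming the "CGS" in the name stands for collapsed Gibbs sampling, the baseline computation can be sketched on the CPU as below. This is a minimal reference sketch, not the authors' GPU implementation: the function name `lda_cgs`, the hyperparameters `alpha` and `beta`, and the plain-list count tables are all illustrative assumptions, and none of the partitioning, synchronization, or compression optimizations are shown.

```python
import random


def lda_cgs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical CPU reference sketch).

    docs is a list of documents, each a list of integer word ids in
    [0, vocab_size). Returns the doc-topic, topic-word, and per-topic counts.
    """
    rng = random.Random(seed)
    # Count tables: doc-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Draw a new topic and add the token back under it.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, topic_total
```

Every token update reads and writes shared count tables, which gives a rough sense of why workload partitioning and synchronization matter once the tokens are spread across parallel workers.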

Keywords:

Computer science
Synchronization
Parallel computing
CUDA
Latent Dirichlet allocation
Workload
Exploit
Sampling (statistics)
Data compression
Topic model
Scalability
Source code
Speedup
Caffè
Throughput
Distributed computing
