CuLDA_CGS: solving large-scale LDA problems on GPUs

2019 
GPUs have benefited many ML algorithms. However, we observe that the performance of existing Latent Dirichlet Allocation(LDA) solutions on GPUs are not satisfying. We present CuLDA_CGS, an efficient approach to accelerate large-scale LDA problems. We delicately design workload partition and synchronization mechanism to exploit multiple GPUs. We also optimize the algorithm from the sampling algorithm, parallelization, and data compression perspectives. Experiment evaluations show that compared with the state-of-the-art LDA solutions, CuLDA_CGS outperforms them by a large margin (up to 7.3X) on a single GPU.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    35
    References
    3
    Citations
    NaN
    KQI
    []