Digesting Multilingual Reader Comments via Latent Discussion Topics with Commonality and Specificity

2016 
Many news websites from different regions of the world allow readers to comment on an event in their own languages. Digesting such an enormous number of comments written in different languages is difficult. One elegant way to digest and organize these comments is to detect latent discussion topics while taking language attributes into account. Some discussion topics are shared across languages, whereas others are dominated by a particular language. To discover discussion topics that exhibit such commonality or specificity in news reader comments written in different languages, we propose TDCS, a new graphical model that copes with the language gap and detects language-common and language-specific latent discussion topics simultaneously. Our TDCS model also exploits comment-oriented clues via a scalable Dirichlet Multinomial Regression method. To learn the model parameters, we develop an inference method that alternates between EM and Gibbs sampling. Experimental results show that our proposed TDCS model can provide an effective way to digest multilingual news reader comments.
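The abstract names Dirichlet Multinomial Regression (DMR) as the way TDCS exploits comment-oriented clues. In the standard DMR topic model, each document's Dirichlet prior over topics is computed from its observed features as alpha_{d,k} = exp(x_d . lambda_k), and that prior then enters the collapsed Gibbs conditional used to resample each token's topic. The Python sketch below shows only this generic DMR-plus-Gibbs step under those assumptions; it is not TDCS itself, which additionally separates language-common from language-specific topics and alternates Gibbs sampling with an EM step in ways the abstract does not detail. The function names, feature vectors, and weights are hypothetical.

```python
import math
import random
from typing import List, Sequence


def dmr_prior(features: Sequence[float],
              lambdas: Sequence[Sequence[float]]) -> List[float]:
    """Generic DMR prior (assumption, not the TDCS formula):
    alpha_k = exp(x_d . lambda_k) for each topic k."""
    return [math.exp(sum(x * l for x, l in zip(features, lam_k)))
            for lam_k in lambdas]


def topic_conditional(doc_topic_counts: Sequence[int],
                      topic_word_counts: Sequence[int],
                      topic_totals: Sequence[int],
                      alpha: Sequence[float],
                      beta: float,
                      vocab_size: int) -> List[float]:
    """Normalized collapsed-Gibbs conditional p(z = k | rest) for one token.

    All counts must already EXCLUDE the token being resampled.
    doc_topic_counts[k]:  tokens in this comment assigned to topic k
    topic_word_counts[k]: occurrences of this token's word type in topic k
    topic_totals[k]:      total tokens assigned to topic k
    """
    weights = [
        (doc_topic_counts[k] + alpha[k])
        * (topic_word_counts[k] + beta)
        / (topic_totals[k] + vocab_size * beta)
        for k in range(len(alpha))
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_topic(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one topic index from the conditional distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical usage: one comment with a bias feature and one clue feature,
# three topics, and toy counts.
alpha = dmr_prior([1.0, 1.0], [[0.0, 0.5], [0.0, -0.5], [0.0, 0.0]])
probs = topic_conditional([2, 0, 1], [3, 1, 0], [10, 8, 5],
                          alpha, beta=0.01, vocab_size=1000)
new_topic = sample_topic(probs, random.Random(0))
```

Because the prior depends on the comment's own features, comments sharing a clue are nudged toward the same topics before any words are counted; in standard DMR the lambda weights are re-estimated between sampling sweeps, which is one plausible role for an EM-style step, though the abstract does not confirm this.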