A universal topic framework (UniZ) and its application in online search

2015 
Probabilistic topic models, such as PLSA and LDA, are gaining popularity in many fields due to their high-quality results. Unfortunately, existing topic models suffer from two drawbacks: (1) model complexity and (2) disjoint topic groups. That is, when a topic model involves multiple entities (such as authors, papers, conferences, and institutions) connected through multiple relationships, the model becomes too difficult to analyze and often leads to intractable solutions. Also, different entity types are classified into disjoint topic groups that are not directly comparable, so it is difficult to see whether heterogeneous entities (such as authors and conferences) relate to the same topic (e.g., are Rakesh Agrawal and KDD related to the same topic?). In this paper, we propose a novel universal topic framework (UniZ) that addresses these two drawbacks using "prior topic incorporation." Since our framework represents heterogeneous entities in a single universal topic space, all entities can be directly compared within the same topic space. In addition, UniZ breaks complex models into much smaller units, learns the topic group of each entity from the smaller units, and then propagates the learned topics to others. This way, it leverages all the available signals without introducing significant computational complexity, enabling a richer representation of entities and highly accurate results. On a widely used prediction problem over the DBLP dataset, our approach achieves the best prediction performance among many state-of-the-art methods. We also demonstrate the practical potential of our approach with search logs from a commercial search engine.
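The decompose-then-propagate idea in the abstract can be illustrated with a minimal sketch: learn topic distributions on a small unit (here, papers with known distributions), then propagate them along entity links so that heterogeneous entities (an author, a conference) land in the same universal topic space and become directly comparable. All names and the Dirichlet-style smoothing below are illustrative assumptions, not the paper's actual algorithm.

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def propagate_topics(source_topics, links, alpha=0.1, n_topics=3):
    # Aggregate linked entities' topic vectors onto symmetric
    # pseudo-counts (a hypothetical simplification of UniZ's
    # "prior topic incorporation").
    agg = [alpha] * n_topics
    for s in links:
        agg = [a + t for a, t in zip(agg, source_topics[s])]
    return normalize(agg)

# Papers with topic distributions in one shared "universal" topic space
paper_topics = {
    "p1": [0.9, 0.05, 0.05],  # mostly topic 0 (e.g., data mining)
    "p2": [0.8, 0.10, 0.10],
    "p3": [0.1, 0.10, 0.80],  # mostly topic 2
}

# Heterogeneous entities get distributions in the SAME space by
# propagating topics along their paper links.
author_topics = propagate_topics(paper_topics, ["p1", "p2"])
conf_topics = propagate_topics(paper_topics, ["p1", "p2", "p3"])

# Because both live in one topic space, a direct comparison is meaningful.
dot = sum(a * c for a, c in zip(author_topics, conf_topics))
```

In this toy setup, both the author and the conference come out dominated by topic 0, so a simple dot product between their distributions answers the "are they on the same topic?" question that disjoint topic groups cannot.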