Wikipedia-based semantic smoothing for the language modeling approach to information retrieval

2010 
Semantic smoothing for the language modeling approach to information retrieval is significant and effective to improve retrieval performance. In previous methods such as the translation model, individual terms or phrases are used to do semantic mapping. These models are not very efficient when faced with ambiguous words and phrases because they are unable to incorporate contextual information. To overcome this limitation, we propose a novel Wikipedia-based semantic smoothing method that decomposes a document into a set of weighted Wikipedia concepts and then maps those unambiguous Wikipedia concepts into query terms. The mapping probabilities from each Wikipedia concept to individual terms are estimated through the EM algorithm. Document models based on Wikipedia concept mapping are then derived. The new smoothing method is evaluated on the TREC Ad Hoc Track (Disks 1, 2, and 3) collections. Experiments show significant improvements over the two-stage language model, as well as the language model with translation-based semantic smoothing.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    8
    Citations
    NaN
    KQI
    []