BBM: Bayesian Browsing Model from Petabyte-scale Data

Chao Liu, Fan Guo, Christos Faloutsos

2009

Given a quarter of a petabyte of click-log data, how can we estimate the relevance of each URL for a given query? In this paper, we propose the Bayesian Browsing Model (BBM), a new modeling technique with the following advantages: (a) it does exact inference; (b) it is single-pass and parallelizable; (c) it is effective. We present two sets of experiments to test model effectiveness and efficiency. On the first set, comprising over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click-log set, spanning a quarter of a petabyte of data, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.
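To illustrate why a single-pass, parallelizable estimator maps naturally onto MapReduce, the sketch below accumulates additive per-(query, URL) counts on each log shard and then merges them. It is not the BBM inference itself: the abstract does not give the model's update rule, so a plain click-through rate stands in as a hypothetical relevance statistic. The record layout `(query, url, clicked)` and the names `shard_counts`, `merge`, and `click_rate` are assumptions made for this example.

```python
from collections import Counter
from functools import reduce


def shard_counts(records):
    """Map step: one pass over a shard, counting impressions and clicks."""
    stats = Counter()
    for query, url, clicked in records:
        stats[(query, url, "shown")] += 1
        if clicked:
            stats[(query, url, "clicked")] += 1
    return stats


def merge(a, b):
    """Reduce step: counts are additive, so shards merge in any order."""
    return a + b


def click_rate(stats, query, url):
    """Stand-in relevance statistic (assumption, not the BBM posterior)."""
    shown = stats[(query, url, "shown")]
    return stats[(query, url, "clicked")] / shown if shown else 0.0


# Two hypothetical log shards, processed independently and then merged.
shards = [
    [("bbm", "a.example", True), ("bbm", "b.example", False)],
    [("bbm", "a.example", False), ("bbm", "a.example", True)],
]
merged = reduce(merge, (shard_counts(s) for s in shards), Counter())

# Processing all records in a single pass yields the same statistics.
single_pass = shard_counts([r for s in shards for r in s])
```

Because each record touches only its own counters and the merge is a plain sum, every shard is read once and the result does not depend on how the log is partitioned. That property is what lets this style of computation scale across a cluster.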

Keywords:

Machine learning
Computer science
Parallelizable manifold
Data mining
Artificial intelligence
Petabyte
Inference
Scalability
Bayesian probability
Bayesian inference
click model
log data
