Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks
2018
Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR of t with respect to s is the probability that a random walk starting from s terminates at t. The average of the personalized PageRank score of t with respect to each source node υ∈ V is exactly the PageRank score π( t ) of node t , which denotes the overall importance of node t in the graph. A heavy hitter of node t is a node whose contribution to π( t ) is above a φ fraction, where φ is a value between 0 and 1. Finding heavy hitters has important applications in link spam detection, classification of web pages, and friend recommendations. In this paper, we propose BLOG, an efficient framework for three types of heavy hitter queries: the pairwise approximate heavy hitter (AHH), the reverse AHH, and the multi-source reverse AHH queries. For pairwise AHH queries, our algorithm combines the Monte-Carlo approach and the backward propagation approach to reduce the cost of both methods, and incorporates new techniques to deal with high in-degree nodes. For reverse AHH and multi-source reverse AHH queries, our algorithm extends the ideas behind the pairwise AHH algorithm with a new "logarithmic bucketing'' technique to improve the query efficiency. Extensive experiments demonstrate that our BLOG is far more efficient than alternative solutions on the three queries.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
35
References
12
Citations
NaN
KQI