Profile Diversity for Query Processing using User Recommendations

2015 
More than 90% of the queries submitted to content sharing platforms, such as Flickr, are vague, i.e. they contain only a few keywords, which complicates the task of effectively returning interesting results. To overcome this limitation, many platforms use recommendation strategies to filter the results. However, recommendations tend to return highly redundant items. Content diversification has been studied as a solution to this problem, but it may suffer from at least two limitations: poor content description and semantic ambiguity.

In this paper, we investigate profile diversity for searching web items. Profile diversification addresses the problem of returning redundant items and enhances the quality of diversification. We propose a threshold-based approach to return the most relevant and most popular documents while satisfying content and profile diversity constraints. Our approach includes a family of techniques for efficiently retrieving the desired documents. To evaluate our solution, we ran intensive experiments, including a user survey, on three datasets; in more than 75% of the cases, users judged profile diversity to be similar to, or preferable to, other approaches. Additionally, our optimization techniques reduce the response time by a factor of up to 12 compared to a baseline greedy diversification algorithm.

Highlights
• We propose a specific scoring function for content and profile diversification using a probabilistic model.
• We propose a greedy threshold-based top-k algorithm that processes queries with our profile diversity score, using the concept of a candidate list.
• We propose various techniques for optimizing the computation of top-k diversified profiles.
• To evaluate the benefits of our scoring function and optimization techniques, we ran our algorithms on three datasets: two from Del.icio.us and one from Flickr. The results show that our approach increases the overall quality of recommendations and that our optimization strategies significantly reduce the response time of the diversified top-k computation.
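For readers unfamiliar with greedy diversification, the following is a minimal sketch of a generic greedy baseline of the kind the abstract compares against. It is not the authors' scoring function or their threshold-based algorithm: the Jaccard overlap between the sets of users associated with each document, the linear `trade_off` weighting, and all names (`greedy_diversify`, `relevance`, `profiles`) are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_diversify(
    relevance: Dict[str, float],
    profiles: Dict[str, Set[str]],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Pick up to k documents, one at a time, by greedy marginal gain.

    relevance: hypothetical relevance score per document id.
    profiles:  hypothetical set of user ids associated with each document.
    trade_off: 1.0 ranks by relevance only; lower values penalise documents
               whose user profiles overlap with already selected ones.
    """
    selected: List[str] = []
    candidates = set(relevance)

    def marginal(doc: str) -> float:
        # Redundancy is the highest profile overlap with any selected document.
        redundancy = max(
            (jaccard(profiles[doc], profiles[s]) for s in selected),
            default=0.0,
        )
        return trade_off * relevance[doc] - (1.0 - trade_off) * redundancy

    while candidates and len(selected) < k:
        # Sorting first makes tie-breaking deterministic (by document id).
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
    profiles = {"d1": {"u1", "u2"}, "d2": {"u1", "u2"}, "d3": {"u3"}}
    print(greedy_diversify(relevance, profiles, k=2))
```

In this toy input, `d2` is nearly as relevant as `d1` but is associated with exactly the same users, so a diversity-aware selection passes over it in favour of `d3`. Each greedy step rescans every remaining candidate against every selected document, which is the kind of cost that threshold-based pruning aims to avoid.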