Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback

2018 
We investigate the use of logged user interaction data (queries and clicks) for offline evaluation of new search systems in the context of counterfactual analysis. The challenge in evaluating a new ranker against log data collected from a static production ranker is that the new ranker may retrieve documents that never appeared in the logs and therefore carry no logged user feedback. Additionally, the production ranker itself can bias user actions, so that even documents that do appear in the logs would have exhibited different interaction patterns had they been retrieved and ranked by the new ranker. We present a methodology for incrementally logging interactions on previously-unseen documents and using them to compute an unbiased estimator of a new ranker's effectiveness. Our method is only lightly invasive with respect to the production ranker's results, guarding against user dissatisfaction if the new ranker is poor. We demonstrate how well our methods work in a simulation environment designed to be challenging for such methods, arguing that they are therefore likely to work in a wide variety of scenarios.
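
To make the counterfactual setup concrete, the sketch below shows one common way to build an unbiased click-based estimate of a new ranker's utility from production logs: inverse propensity scoring (IPS), where each logged click is reweighted by the probability that the production ranker exposed the document to the user. The abstract does not name the paper's specific estimator, so the IPS form, the DCG-style discount, the propensity model, and all data structures here are illustrative assumptions rather than the authors' method.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoggedClick:
    """One logged interaction from the production ranker (illustrative schema)."""
    query: str
    doc_id: str
    logged_rank: int   # rank at which the production ranker showed the document
    clicked: bool
    propensity: float  # estimated probability the user examined this rank


def rank_of(doc_id: str, ranking: List[str]) -> int:
    """1-based rank of doc_id in the new ranker's list, or -1 if absent."""
    try:
        return ranking.index(doc_id) + 1
    except ValueError:
        return -1


def ips_utility(logs: List[LoggedClick],
                new_rankings: Dict[str, List[str]],
                max_rank: int = 10) -> float:
    """IPS-style counterfactual estimate of a new ranker's click utility.

    Each logged click is reweighted by 1/propensity to correct for the
    position bias induced by the production ranker, and credited with a
    rank discount at the position the *new* ranker would have placed the
    clicked document.
    """
    n_queries = len({log.query for log in logs})
    total = 0.0
    for log in logs:
        if not log.clicked:
            continue
        new_rank = rank_of(log.doc_id, new_rankings.get(log.query, []))
        if new_rank < 1 or new_rank > max_rank:
            continue  # document not surfaced by the new ranker within the cutoff
        discount = 1.0 / math.log2(new_rank + 1)
        total += discount / max(log.propensity, 1e-6)  # clip to bound variance
    return total / max(n_queries, 1)
```

Note the gap this kind of estimator leaves: documents the new ranker retrieves that never appear in the logs have no observed feedback or propensity at all, which is precisely what the paper's incremental, minimally-invasive online logging is designed to supply.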