Evaluating the Predictivity of IR Experiments

2021 
Experimental evaluation is regarded as a critical element of any research activity in Information Retrieval, and is typically used to support assertions of the form "Technique A provides better retrieval effectiveness than does Technique B". Implicit in such claims are the characteristics of the data to which the results apply, in terms of both the queries used and the documents to which they were applied. Here we explore the role of evaluation on one collection as a predictor of relative performance on collections with different characteristics. In particular, by synthesizing new collections that vary from each other in a controlled way, we show that it is possible to explore the reliability of an IR evaluation pipeline, and to better understand the complex interrelationship between documents, queries, and metrics that is an important part of any experimental validation. Our results show that predictivity declines as the collection is varied, even in simple ways such as shifting focus from one document source to another, similar source.
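
The abstract does not specify how predictivity is quantified, but a natural operationalization is the rank correlation between system orderings on the original collection and on a synthesized variant. The sketch below is a minimal illustration of that idea, not the authors' actual method: the system names, score values, and the use of scipy.stats.kendalltau are all illustrative assumptions.

```python
# Minimal sketch: treat "predictivity" as how well the system ordering on a
# base collection predicts the ordering on a synthesized variant collection,
# measured with Kendall's tau. All names and numbers here are hypothetical.

from scipy.stats import kendalltau

# Mean effectiveness (e.g., AP or NDCG) of each system on two collections.
scores_base    = {"BM25": 0.31, "QL": 0.29, "RM3": 0.34, "BERT": 0.41}
scores_variant = {"BM25": 0.27, "QL": 0.28, "RM3": 0.30, "BERT": 0.38}

systems = sorted(scores_base)  # fixed system order for a paired comparison
tau, p_value = kendalltau(
    [scores_base[s] for s in systems],
    [scores_variant[s] for s in systems],
)

print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
# A tau near 1.0 means the base collection predicted the variant's system
# ordering well; lower values indicate declining predictivity as the
# synthesized collection drifts further from the original.
```

Repeating this comparison across a family of variant collections that differ from the base in a controlled way (e.g., progressively shifting the document source) would trace out how quickly predictivity decays, which is the pattern the abstract reports.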