Size and Source Matter: Understanding Inconsistencies in Test Collection-Based Evaluation

Timothy Jones, Andrew Turpin, Stefano Mizzaro, Falk Scholer, Mark Sanderson

2014

Past work showed that retrieval results can be significantly inconsistent across test collections, even when one of the test collections contains only a subset of the documents in the other. However, the experimental methodology of that work made it hard to determine the cause of the inconsistencies. Using a novel methodology that eliminates the problems caused by the uneven distribution of relevant documents, we confirm that whether a statistically significant improvement is observed between two IR systems can be strongly influenced by the choice of documents in the test collection. We investigate two possible causes of this problem. Our results show that collection size and document source have a strong influence on the way that a test collection will rank one retrieval system relative to another. This is of particular interest when constructing test collections, as we show that using different subsets of a collection produces differing evaluation results.
