Correlation, Prediction and Ranking of Evaluation Metrics in Information Retrieval

2019 
Given limited time and space, IR studies often report only a few evaluation metrics, which must therefore be chosen carefully. To inform such selection, we first quantify the correlation between 23 popular IR metrics on 8 TREC test collections. Next, we investigate the prediction of unreported metrics: given 1–3 reported metrics, we assess the best predictors for 10 others. We show that accurate prediction of MAP, P@10, and RBP can be achieved using only 2–3 other metrics. We further explore whether high-cost evaluation measures can be predicted from low-cost ones, showing that RBP(p = 0.95) at evaluation depth 1000 can be accurately predicted from measures computed at depth 30. Lastly, we present a novel model for ranking evaluation metrics based on covariance, enabling selection of a set of metrics that are most informative and distinctive. While a greedy-forward approach is guaranteed to yield sub-modular results, an iterative-backward method is empirically found to achieve the best results.
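
To make the correlation study concrete, the sketch below computes pairwise correlation between per-run metric scores. Everything here is an illustrative assumption rather than the paper's exact setup: the synthetic scores, the four metric names, and the choice of Kendall's tau as the correlation coefficient (the paper covers 23 metrics over 8 TREC collections).

```python
import numpy as np
from scipy.stats import kendalltau

# Hypothetical input: rows are systems/runs, columns are metrics,
# values are per-run scores on one test collection.
scores = np.random.default_rng(0).random((50, 4))
metric_names = ["MAP", "P@10", "RBP(0.95)", "nDCG@20"]

# Pairwise Kendall's tau between metric columns: a high tau means the
# two metrics rank the runs similarly and are largely redundant.
for i in range(len(metric_names)):
    for j in range(i + 1, len(metric_names)):
        tau, _ = kendalltau(scores[:, i], scores[:, j])
        print(f"{metric_names[i]} vs {metric_names[j]}: tau = {tau:.2f}")
```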
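
The prediction of unreported metrics can be sketched as a regression problem: fit a model on runs where both cheap and expensive metrics are available, then predict the expensive metric elsewhere. The plain linear regression, the synthetic data, and the particular predictor/target metrics below are assumptions for illustration; the paper's actual predictive models and feature choices are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical per-run scores: two shallow/cheap metrics as predictors,
# a costlier metric (MAP) as the target, correlated by construction.
p_at_10 = rng.random(100)
ndcg_20 = 0.6 * p_at_10 + 0.4 * rng.random(100)
map_scores = 0.5 * p_at_10 + 0.3 * ndcg_20 + 0.1 * rng.random(100)

# Train on 80 runs, evaluate prediction quality on the held-out 20.
X = np.column_stack([p_at_10, ndcg_20])
model = LinearRegression().fit(X[:80], map_scores[:80])
pred = model.predict(X[80:])
rmse = np.sqrt(np.mean((pred - map_scores[80:]) ** 2))
print(f"held-out RMSE: {rmse:.3f}")
```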
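
The covariance-based ranking can likewise be sketched as greedy-forward subset selection over the metrics' covariance matrix. The log-determinant objective below is a stand-in chosen only because it is sub-modular for positive-definite matrices, which mirrors the abstract's sub-modularity guarantee; it is not the paper's actual objective, and the function name and data are hypothetical.

```python
import numpy as np

def greedy_forward_select(cov: np.ndarray, k: int) -> list[int]:
    """Greedily add the metric whose inclusion maximizes the
    log-determinant of the selected covariance submatrix."""
    chosen: list[int] = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(cov.shape[0]):
            if j in chosen:
                continue
            idx = chosen + [j]
            sign, logdet = np.linalg.slogdet(cov[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best_j, best_val = j, logdet
        if best_j is None:  # no candidate keeps the submatrix positive-definite
            break
        chosen.append(best_j)
    return chosen

# Hypothetical per-run scores for 6 metrics over 200 runs.
scores = np.random.default_rng(2).random((200, 6))
cov = np.cov(scores, rowvar=False)
print(greedy_forward_select(cov, k=3))
```

Because each greedy step re-scores every remaining metric against the already-selected set, the selected metrics tend to be informative yet mutually non-redundant, which is the selection behavior the abstract describes.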