Future directions in learning to rank
Abstract:
The results of the Learning to Rank Challenge showed that the predictions from the top competitors are very close to one another in quality. This raises a question: is learning to rank a solved problem? On the one hand, it is likely that only small incremental progress can be made on the core, traditional problems of learning to rank. The challenge was set in this standard learning to rank scenario: optimize a ranking measure on a test set. But on the other hand, there are many related questions and settings in learning to rank that have not yet been fully explored. We review some of them in this paper and hope that researchers interested in learning to rank will try to answer these challenging and exciting research questions.
Keywords: Rank (graph theory), Learning to Rank, Mean reciprocal rank, Competitor analysis, Margin (machine learning), Ordinal optimization
References:

Learning to Rank with Nonsmooth Cost Functions
The quality measures used in information retrieval are particularly difficult to optimize directly, since they depend on the model scores only through the sorted order of the documents returned for a given query. Thus, the derivatives of the cost with respect to the model parameters are either zero, or are undefined. In this paper, we propose a class of simple, flexible algorithms, called LambdaRank, which avoids these difficulties by working with implicit cost functions. We describe LambdaRank using neural network models, although the idea applies to any differentiable function class. We give necessary and sufficient conditions for the resulting implicit cost function to be convex, and we show that the general method has a simple mechanical interpretation. We demonstrate significantly improved accuracy, over a state-of-the-art ranking algorithm, on several datasets. We also show that LambdaRank provides a method for significantly speeding up the training phase of that ranking algorithm. Although this paper is directed towards ranking, the proposed method can be extended to any non-smooth and multivariate cost functions.
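To make the "implicit cost" idea concrete, here is a minimal sketch (not the paper's reference implementation) of the lambda-gradient computation for a single query, assuming the NDCG-weighted pairwise force popularized in the later LambdaRank literature; the function name and the gradient-ascent convention (add the lambdas to the scores) are illustrative choices.

```python
import numpy as np

def lambda_gradients(scores, gains, sigma=1.0):
    """Accumulate LambdaRank-style forces for one query: each pair where
    document i is more relevant than document j pushes i up and j down,
    scaled by the |NDCG change| that swapping the two would cause."""
    n = len(scores)
    order = np.argsort(-scores)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)              # 1-based rank of each doc
    idcg = np.sum(np.sort(gains)[::-1] / np.log2(np.arange(2, n + 2)))
    lambdas = np.zeros(n)
    if idcg == 0:                                  # no relevant documents
        return lambdas
    for i in range(n):
        for j in range(n):
            if gains[i] <= gains[j]:
                continue                           # only pairs where i beats j
            # |delta NDCG| if documents i and j traded positions
            delta = abs(
                (gains[i] - gains[j])
                * (1 / np.log2(1 + rank[i]) - 1 / np.log2(1 + rank[j]))
            ) / idcg
            force = sigma / (1 + np.exp(sigma * (scores[i] - scores[j]))) * delta
            lambdas[i] += force                    # push the better doc up
            lambdas[j] -= force                    # and the worse one down
    return lambdas
```

Under these conventions a training step is simply scores += learning_rate * lambda_gradients(scores, gains).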
From RankNet to LambdaRank to LambdaMART: An Overview
LambdaMART is the boosted tree version of LambdaRank, which is based on RankNet. RankNet, LambdaRank, and LambdaMART have proven to be very successful algorithms for solving real world ranking problems: for example, an ensemble of LambdaMART rankers won Track 1 of the 2010 Yahoo! Learning To Rank Challenge. The details of these algorithms are spread across several papers and reports, and so here we give a self-contained, detailed and complete description of them.
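Building on the lambda sketch above, one hypothetical LambdaMART-style boosting round could look like the following; real implementations accumulate lambdas over all queries and use second-order (Newton) estimates in the tree leaves, so this is only the skeleton.

```python
from sklearn.tree import DecisionTreeRegressor

def lambdamart_round(X, scores, gains, learning_rate=0.1):
    """One boosting round: fit a small regression tree to the lambda
    'gradients' and add a shrunken copy of its output to the model scores."""
    lambdas = lambda_gradients(scores, gains)      # sketch above
    tree = DecisionTreeRegressor(max_depth=4).fit(X, lambdas)
    return scores + learning_rate * tree.predict(X), tree
```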
Expected Reciprocal Rank for Graded Relevance
While numerous metrics for information retrieval are available in the case of binary relevance, there is only one commonly used metric for graded relevance, namely the Discounted Cumulative Gain (DCG). A drawback of DCG is its additive nature and the underlying independence assumption: a document in a given position always has the same gain and discount, independently of the documents shown above it. Inspired by the "cascade" user model, we present a new editorial metric for graded relevance which overcomes this difficulty and implicitly discounts documents which are shown below very relevant documents. More precisely, this new metric is defined as the expected reciprocal length of time that the user will take to find a relevant document. It can be seen as an extension of the classical reciprocal rank to the graded relevance case, and we call it Expected Reciprocal Rank (ERR). We conduct an extensive evaluation on the query logs of a commercial search engine and show that ERR correlates better with click metrics than other editorial metrics.
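The definition is compact enough to state directly in code. A sketch following the paper's formula, with satisfaction probability R = (2^g - 1) / 2^g_max and the cascade assumption that the user stops at the first satisfying document:

```python
def expected_reciprocal_rank(grades, max_grade=4):
    """ERR under the cascade model: the user scans down the list and stops
    at the first satisfying document; R = (2**g - 1) / 2**max_grade is the
    probability that a document with grade g satisfies the user."""
    not_satisfied_yet = 1.0
    err = 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade
        err += not_satisfied_yet * r / rank
        not_satisfied_yet *= 1.0 - r
    return err

print(expected_reciprocal_rank([4, 0, 1]))   # top doc nearly perfect: ~0.94
```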
Adapting Ranking SVM to Document Retrieval
The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must consider when applying Ranking SVM, or more generally any "learning to rank" method, to document retrieval. First, correctly ranking documents at the top of the result list is crucial for an information retrieval system. One must conduct training in a way that such ranked results are accurate. Second, the number of relevant documents can vary from query to query. One must avoid training a model biased toward queries with a large number of relevant documents. Previously, when existing methods, including Ranking SVM, were applied to document retrieval, neither of these two factors was taken into consideration. We show that it is possible to modify the conventional Ranking SVM so it can be better used for document retrieval. Specifically, we modify the "hinge loss" function in Ranking SVM to deal with the problems described above. We employ two methods to optimize the loss function: gradient descent and quadratic programming. Experimental results show that our method, referred to as Ranking SVM for IR, can outperform the conventional Ranking SVM and other existing methods for document retrieval on two datasets.
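A sketch of what such a cost-sensitive pairwise objective looks like; the weight names tau (per-query normalization) and mu (rank-position importance) are illustrative stand-ins for the two modifications the paper argues for, not its exact parameterization.

```python
import numpy as np

def weighted_pairwise_hinge(w, pairs, tau, mu, C=1.0):
    """Cost-sensitive Ranking SVM objective sketch: pairs is a list of
    (x_hi, x_lo) feature vectors where x_hi should rank above x_lo;
    tau[k] downweights pairs from queries with many relevant documents,
    mu[k] upweights pairs that involve top rank positions."""
    loss = 0.5 * np.dot(w, w)                      # margin / regularization term
    for k, (x_hi, x_lo) in enumerate(pairs):
        margin = np.dot(w, x_hi - x_lo)
        loss += C * tau[k] * mu[k] * max(0.0, 1.0 - margin)
    return loss
```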
Learning to Rank Using Gradient Descent
We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data from a commercial internet search engine.
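The probabilistic cost is easy to write down. A minimal sketch, with p_bar denoting the target probability that document i should be ranked above document j (1.0 when i is known to be more relevant):

```python
import numpy as np

def ranknet_pair_cost(s_i, s_j, p_bar=1.0, sigma=1.0):
    """RankNet's pairwise cost: the score difference s_i - s_j is mapped to
    P(i ranked above j) by a logistic function, and the model is trained
    with cross entropy against the target probability p_bar."""
    p_ij = 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))
    return -p_bar * np.log(p_ij) - (1.0 - p_bar) * np.log(1.0 - p_ij)
```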
McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
We cast the ranking problem as (1) multiple classification ("Mc") and (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve upon LambdaRank [5] and the regression-based ranker [6] in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
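A sketch of the Expected Relevance conversion, assuming the integer class labels themselves serve as relevance values (the paper also considers other score mappings):

```python
import numpy as np

def expected_relevance(class_probs):
    """Convert per-document probabilities over relevance classes 0..K-1
    (e.g. from a boosted tree classifier) into ranking scores by taking
    the expected class label."""
    class_probs = np.asarray(class_probs)          # shape (n_docs, K)
    labels = np.arange(class_probs.shape[1])
    return class_probs @ labels                    # one score per document

# e.g. two documents over classes {0, 1, 2, 3, 4}:
probs = [[0.1, 0.1, 0.2, 0.3, 0.3],   # expected relevance 2.6
         [0.5, 0.3, 0.1, 0.1, 0.0]]   # expected relevance 0.8
print(expected_relevance(probs))
```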
Greedy Function Approximation: A Gradient Boosting Machine
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and of Friedman, Hastie and Tibshirani are discussed.
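A minimal sketch of the least-squares instance of this paradigm, using scikit-learn trees as the additive components; for squared error the negative gradient at the current fit is simply the residual, and nu is the usual shrinkage factor.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_ls(X, y, n_rounds=100, nu=0.1, depth=3):
    """Gradient boosting for least-squares loss: each stage fits a tree to
    the negative gradient of the loss at the current fit F (which for
    squared error is just the residual y - F), then takes a shrunken step."""
    y = np.asarray(y, dtype=float)
    F = np.full(len(y), y.mean())      # initial constant model
    trees = []
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, y - F)
        F = F + nu * tree.predict(X)
        trees.append(tree)
    return trees, F
```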
Cumulated Gain-Based Evaluation of IR Techniques
Modern large retrieval environments tend to overwhelm their users by their large output. Since not all documents are of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, that is, recall and precision based on binary relevance judgments, to graded relevance judgments. Alternatively, novel measures based on graded relevance judgments may be developed. This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor to the relevance scores in order to devalue late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. These novel measures are defined and discussed, and their use is demonstrated in a case study using TREC data: sample system run results for 20 queries in TREC-7. As a relevance base we used novel graded relevance judgments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, for example, from the user's point of view.
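In code, the discounted and the relative-to-the-ideal measures look like this; note the sketch uses the now-common rel_i / log2(i + 1) discount at every rank, whereas the article's original formulation discounts by log_b(i) only beyond rank b.

```python
import numpy as np

def dcg(rels):
    """Discounted cumulative gain with the common rel_i / log2(i + 1)
    discount, so that late-retrieved documents contribute less gain."""
    rels = np.asarray(rels, dtype=float)
    return float(np.sum(rels / np.log2(np.arange(2, len(rels) + 2))))

def ndcg(rels):
    """The relative-to-the-ideal measure: DCG normalized by the DCG of the
    best possible (descending-relevance) ordering of the same documents."""
    best = dcg(sorted(rels, reverse=True))
    return dcg(rels) / best if best > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1, 2]))    # graded judgments on a four-point 0-3 scale
```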
A Support Vector Method for Optimizing Average Precision
Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimizing MAP either do not find a globally optimal solution, or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show our method to produce statistically significant improvements in MAP scores.
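For reference, the quantity being relaxed and optimized here, average precision for a single query and its mean over queries (binary relevance), in a short sketch:

```python
def average_precision(ranked_rels):
    """AP for one query: the mean of precision@k over the positions k that
    hold relevant documents, given binary relevance labels in ranked order."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

# MAP is the mean of AP over queries:
queries = [[1, 0, 1, 0], [0, 1, 1]]
print(sum(average_precision(q) for q in queries) / len(queries))
```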