
2011 | Book

Learning to Rank for Information Retrieval


About this Book

Due to the fast growth of the Web and the difficulties in finding desired information, efficient and effective information retrieval systems have become more important than ever, and the search engine has become an essential tool for many people.

The ranker, a central component in every search engine, is responsible for the matching between processed queries and indexed documents. Because of its central role, great attention has been paid to the research and development of ranking technologies. In addition, ranking is also pivotal for many other information retrieval applications, such as collaborative filtering, definition ranking, question answering, multimedia retrieval, text summarization, and online advertisement. Leveraging machine learning technologies in the ranking process has led to innovative and more effective ranking models, and eventually to a completely new research area called “learning to rank”.

Liu first gives a comprehensive review of the three major approaches to learning to rank: the pointwise, pairwise, and listwise approaches. For each approach he presents the basic framework with example algorithms, and he discusses its advantages and disadvantages. He continues with some recent advances in learning to rank that cannot be simply categorized into the three major approaches, including relational ranking, query-dependent ranking, transfer ranking, and semi-supervised ranking. His presentation is completed by several examples that apply these technologies to solve real information retrieval problems, and by theoretical discussions of guarantees on ranking performance.

This book is written for researchers and graduate students in both information retrieval and machine learning. They will find here the only comprehensive description of the state of the art in a field that has driven the recent advances in search engine development.

Table of Contents

Frontmatter

Overview of Learning to Rank

Chapter 1. Introduction
Abstract
In this chapter, we give a brief introduction to learning to rank for information retrieval. Specifically, we first introduce the ranking problem, taking document retrieval as an example. Second, we review conventional ranking models proposed in the information retrieval literature and introduce widely used evaluation measures for ranking. Third, we give the motivation for using machine learning technology to solve the ranking problem, and categorize and briefly describe existing learning-to-rank algorithms.
Tie-Yan Liu
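To make the evaluation measures mentioned in this chapter concrete, here is a minimal sketch of NDCG@k, one of the most widely used ranking measures. It uses the common 2^rel - 1 gain and log2 discount convention; this toy implementation is for illustration and is not code from the book:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels,
    in ranked order: gain 2^rel - 1, discount 1/log2(rank + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG: DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

An ideal ranking (labels already in decreasing order) scores exactly 1.0; any swap that places a less relevant document higher lowers the score.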

Major Approaches to Learning to Rank

Chapter 2. The Pointwise Approach
Abstract
In this chapter, we introduce the pointwise approach to learning to rank. Specifically, we cover regression-based, classification-based, and ordinal regression-based algorithms, and then discuss their advantages and disadvantages.
Tie-Yan Liu
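As a minimal illustration of the regression flavor of the pointwise approach, the sketch below fits a linear scorer to per-document relevance labels and then ranks documents by predicted score. The toy feature vectors, labels, and hyperparameters are illustrative assumptions, not material from the book:

```python
# Pointwise approach sketch: treat each (query, document) feature vector
# independently, regress its relevance label, and rank by predicted score.

def train_linear_regression(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b minimizing mean squared error
    via batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - target
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank(docs, w, b):
    """Sort documents by the learned score, highest first."""
    score = lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b
    return sorted(docs, key=score, reverse=True)
```

For example, with three documents whose labels roughly track their feature magnitudes, the highest-labeled document comes out on top after training.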
Chapter 3. The Pairwise Approach
Abstract
In this chapter, we introduce the pairwise approach to learning to rank. Specifically, we first introduce several example algorithms, whose major differences lie in their loss functions. Then we discuss the limitations of these algorithms and present some improvements that enable better ranking performance.
Tie-Yan Liu
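A minimal sketch in the spirit of the pairwise approach: a linear scorer trained to minimize a logistic loss on score differences over preference pairs, as in RankNet-style algorithms. The toy pairs and hyperparameters are assumptions for illustration only:

```python
import math

def score(w, x):
    """Linear scoring function."""
    return sum(wj * xj for wj, xj in zip(w, x))

def train_pairwise(pairs, dim, lr=0.1, epochs=200):
    """pairs: list of (x_pos, x_neg) where x_pos should outrank x_neg.
    Minimizes sum over pairs of log(1 + exp(-(s_pos - s_neg)))
    by stochastic gradient descent on a linear scorer."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            diff = score(w, x_pos) - score(w, x_neg)
            # d/d(diff) of log(1 + exp(-diff)) is -1 / (1 + exp(diff))
            g = -1.0 / (1.0 + math.exp(diff))
            for j in range(dim):
                w[j] -= lr * g * (x_pos[j] - x_neg[j])
    return w
```

After training on pairs where the first document is preferred, the learned scorer assigns it a higher score than its counterpart.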
Chapter 4. The Listwise Approach
Abstract
In this chapter, we introduce the listwise approach to learning to rank. Specifically, we first introduce those listwise ranking algorithms that minimize measure-specific loss functions (also referred to as direct optimization methods), and then introduce some other algorithms whose loss functions are not directly related to evaluation measures for information retrieval.
Tie-Yan Liu
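As a small illustration of a listwise loss that is not directly tied to an evaluation measure, the sketch below computes a ListNet-style cross entropy between the top-one probability distributions induced by the ground-truth labels and by the model scores. The values are toy inputs, not code from the book:

```python
import math

def top_one_probs(values):
    """Softmax over a list of scores or labels: the probability that
    each document is ranked first (top-one probability)."""
    m = max(values)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def listnet_loss(labels, scores):
    """Cross entropy between the label-induced and score-induced
    top-one probability distributions."""
    p_label = top_one_probs(labels)
    p_score = top_one_probs(scores)
    return -sum(pl * math.log(ps) for pl, ps in zip(p_label, p_score))
```

Scores that order documents the same way as the labels yield a lower loss than scores that reverse the order.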
Chapter 5. Analysis of the Approaches
Abstract
In this chapter, we analyze the major approaches to learning to rank, examining the relationship between their loss functions and widely used evaluation measures. For many state-of-the-art ranking methods, it has been proven that their loss functions upper bound measure-based ranking errors; therefore, minimizing these loss functions leads to optimizing the evaluation measures in certain situations.
Tie-Yan Liu

Advanced Topics in Learning to Rank

Chapter 6. Relational Ranking
Abstract
In this chapter, we introduce a novel task for learning to rank, which considers not only the properties of each individual document in the ranking process but also the inter-relationships between documents. Depending on the relationship (e.g., similarity, preference, or dissimilarity), the task corresponds to different real applications (e.g., pseudo relevance feedback, topic distillation, and search result diversification). We review several approaches to this new task and discuss future research directions along this line.
Tie-Yan Liu
Chapter 7. Query-Dependent Ranking
Abstract
In this chapter, we introduce query-dependent ranking. Given the large differences between queries, a single ranking function may not be the best choice for handling all kinds of queries; instead, one may achieve performance gains by leveraging these differences. To account for query differences in training, one can use a query-dependent loss function; to further account for them at test time, a query-dependent ranking function is needed. We review several ways of learning a query-dependent ranking function, including the query classification-based, query clustering-based, nearest neighbor-based, and two-layer learning-based approaches, and discuss future research directions along this line.
Tie-Yan Liu
Chapter 8. Semi-supervised Ranking
Abstract
In this chapter, we introduce semi-supervised learning for ranking. This topic is motivated by the fact that a large number of unlabeled documents or queries can always be collected at low cost; it would be very helpful if one could leverage such unlabeled data in the learning-to-rank process. We mainly review a transductive approach and an inductive approach to this task, and discuss how to improve these approaches by taking the unique properties of ranking into consideration.
Tie-Yan Liu
Chapter 9. Transfer Ranking
Abstract
In this chapter, we introduce transfer learning for ranking, or transfer ranking for short. Transfer ranking is the task of transferring the knowledge contained in one learning-to-rank dataset or problem to another. This is a typical task in real ranking applications, e.g., transferring from an old training set to a new one, or from one market to another. We briefly review the feature-level and instance-level transfer ranking approaches, and discuss future research directions along this line.
Tie-Yan Liu

Benchmark Datasets for Learning to Rank

Chapter 10. The LETOR Datasets
Abstract
In this chapter, we introduce the LETOR benchmark datasets, including the following aspects: document corpora (together with query sets), document sampling, feature extraction, meta information, cross validation, and major ranking tasks supported.
Tie-Yan Liu
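LETOR feature files use an SVM-light-like line format, roughly `label qid:Q fid:value ... # comment`. The sketch below parses one such line; it assumes this common layout and is a toy helper, not an official LETOR tool:

```python
def parse_letor_line(line):
    """Parse one LETOR-style line: 'label qid:Q 1:v1 2:v2 ... # comment'.
    Returns the relevance label, the query id, and a feature dict."""
    body = line.split('#')[0].strip()  # drop the trailing comment
    parts = body.split()
    label = int(parts[0])
    qid = parts[1].split(':')[1]
    features = {int(fid): float(val)
                for fid, val in (p.split(':') for p in parts[2:])}
    return label, qid, features
```

Grouping parsed lines by query id then yields the per-query document lists needed for training and cross validation.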
Chapter 11. Experimental Results on LETOR
Abstract
In this chapter, we use the official evaluation results published on the LETOR website to discuss the performance of different learning-to-rank methods.
Tie-Yan Liu
Chapter 12. Other Datasets
Abstract
In this chapter, we introduce two new benchmark datasets, released by Yahoo and Microsoft. These datasets originate from the training data used in commercial search engines and are much larger than the LETOR datasets in terms of both the number of queries and the number of documents per query.
Tie-Yan Liu

Practical Issues in Learning to Rank

Chapter 13. Data Preprocessing for Learning to Rank
Abstract
This chapter is concerned with data preprocessing for learning to rank. To learn an effective ranking model, the first step is to prepare high-quality training data, which raises several important issues. First, one should consider how to get the data labeled on a large scale but at low cost; click-through log mining is one feasible approach for this purpose. Second, since labeled data are not always correct and effective, the selection of queries and documents, as well as their features, should also be considered. In this chapter, we review several pieces of previous work on these topics and discuss directions for future work.
Tie-Yan Liu
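As a small illustration of click-through log mining for training-data labeling, the sketch below derives preference pairs from one search session using the well-known "Click > Skip Above" heuristic: a clicked result is preferred over the unclicked results ranked above it. The session data here are hypothetical:

```python
def skip_above_pairs(ranked_docs, clicked):
    """Derive preference pairs from one session: each clicked document
    is preferred over every unclicked document ranked above it."""
    pairs = []
    for i, doc in enumerate(ranked_docs):
        if doc in clicked:
            pairs.extend((doc, skipped) for skipped in ranked_docs[:i]
                         if skipped not in clicked)
    return pairs
```

For a session where only the third result was clicked, the heuristic yields two pairs preferring it over the two skipped results above it; such pairs can feed a pairwise learning-to-rank algorithm directly.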
Chapter 14. Applications of Learning to Rank
Abstract
In this chapter, we introduce some applications of learning to rank. The major purpose is to demonstrate how to use an existing learning-to-rank algorithm to solve a real ranking problem. In particular, we take question answering, multimedia retrieval, text summarization, and online advertising, among others, as examples. One will see from these examples that the key steps are to extract effective features for the objects to be ranked, by considering the unique properties of the application, and to prepare a set of training data. It then becomes straightforward to train a ranking model from the data and use it to rank new objects.
Tie-Yan Liu

Theories in Learning to Rank

Chapter 15. Statistical Learning Theory for Ranking
Abstract
In this chapter, we introduce statistical learning theory for ranking. To better understand existing learning-to-rank algorithms, and to design better ones, it is very helpful to understand their theoretical properties. We give the big picture of theoretical analysis for ranking and point out several important issues to be investigated: the statistical ranking framework, generalization ability, and statistical consistency of ranking methods.
Tie-Yan Liu
Chapter 16. Statistical Ranking Framework
Abstract
In this chapter, we introduce the statistical ranking framework. To analyze the theoretical properties of learning-to-rank methods, the very first step is to establish the right probabilistic context for the analysis; this is exactly what the statistical ranking framework provides. We present three ranking frameworks used in the learning-to-rank literature: the document ranking framework, the subset ranking framework, and the two-layer ranking framework. The discussions in this chapter set the stage for further discussions of generalization ability and statistical consistency in the following chapters.
Tie-Yan Liu
Chapter 17. Generalization Analysis for Ranking
Abstract
In this chapter, we introduce the generalization analysis of learning-to-rank methods. In particular, we first introduce uniform generalization bounds and then algorithm-dependent generalization bounds. The uniform bounds hold for any ranking function in a given function class. The algorithm-dependent bounds instead consider the specific ranking function learned by the given algorithm, and thus can usually be tighter. The bounds introduced in this chapter are derived under different ranking frameworks and can explain the behaviors of different learning-to-rank algorithms. We also show the limitations of existing analyses and discuss how to improve them in future work.
Tie-Yan Liu
Chapter 18. Statistical Consistency for Ranking
Abstract
In this chapter, we introduce the statistical consistency of learning-to-rank methods. In particular, we introduce existing results on statistical consistency under different ranking frameworks and with respect to different true losses, e.g., the pairwise 0–1 loss, the permutation-level 0–1 loss, the top-k loss, and the weighted Kendall's τ loss. We then discuss these results and point out ways to further improve them.
Tie-Yan Liu

Summary and Outlook

Chapter 19. Summary
Abstract
In this chapter, we summarize the entire book. In particular, we collect the example algorithms introduced in this book in a single figure. We then provide answers to several important questions regarding learning to rank that were raised at the beginning of the book.
Tie-Yan Liu
Chapter 20. Future Work
Abstract
In this chapter, we discuss possible future work on learning to rank. In particular, we outline potential research topics along the following directions: sample selection bias, direct learning from logs, feature engineering, advanced ranking models, large-scale learning to rank, online complexity, robust learning to rank, and online learning to rank. At the end of the chapter, we briefly discuss new scenarios beyond ranking, which seem to be the future trend of search; algorithmic and theoretical work on these new scenarios may lead to another promising research direction.
Tie-Yan Liu

Appendix

Chapter 21. Mathematical Background
Abstract
In this chapter, we introduce some mathematical background that can help readers better understand the book. In particular, we first introduce probability theory, including probability spaces, random variables, probability distributions, expectations, and variances. Then we provide some basics of linear algebra and matrix computation, such as matrix operations and properties, eigenvalues, and eigenvectors. After that, we briefly review convex optimization, covering the concepts of convex sets and convex functions, conditions for convexity, Lagrangian duality, and the KKT conditions.
Tie-Yan Liu
Chapter 22. Machine Learning
Abstract
In this chapter, we introduce some basics of machine learning. Specifically, we first give a brief introduction to regression and classification, which are typical machine learning tasks studied in the literature. As examples, we take linear regression, neural networks, support vector machines, boosting, and k-nearest neighbors, introduce their basic ideas, and discuss their properties. Many of these algorithms have been extended to handle the problem of learning to rank. Then we move on to statistical learning theory, which is concerned with theoretical guarantees on the generalization of a machine learning algorithm from the training set to the unseen test set. Familiarity with the notation and theorems mentioned in this chapter will be very helpful for understanding the different learning-to-rank algorithms and the statistical learning theory for ranking.
Tie-Yan Liu
Backmatter
Metadata
Title
Learning to Rank for Information Retrieval
Author
Tie-Yan Liu
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-14267-3
Print ISBN
978-3-642-14266-6
DOI
https://doi.org/10.1007/978-3-642-14267-3
