Abstract
Learning to rank, which learns a ranking function from training data, has become an active research area at the intersection of information retrieval and machine learning. Most existing work on learning to rank assumes that the training data are clean; however, this assumption does not always hold. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to assign reliable relevance labels to some documents. As a result, the relevance labels in learning-to-rank training data usually contain noise. Ignoring this fact damages the performance of learning-to-rank algorithms.
In this article, we propose accounting for labeling noise in the process of learning to rank, using a two-step approach that extends existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of those documents. The parameters of the graphical model are learned by maximum likelihood estimation. We then compute the conditional probability of the relevance label given the feature vector of a document: if the probability is large, we regard the degree of labeling noise for this document as small; otherwise, we regard it as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet. Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
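The abstract does not spell out the generative model, so the following is only a minimal sketch of the two-step idea: a Gaussian naive Bayes estimate of P(label | features) stands in for the paper's graphical model (the class parameters are the maximum-likelihood estimates), and the resulting per-document confidence is used to reweight a RankSVM-style pairwise hinge loss. The function names and the choice of naive Bayes are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def label_confidence(X, y, eps=1e-9):
    """Estimate P(y_i | x_i) with a Gaussian naive Bayes generative model.

    A high conditional probability of the observed label given the features
    is read as a low degree of labeling noise (illustrative stand-in for the
    paper's graphical model).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    # Maximum-likelihood class priors and per-class Gaussian parameters.
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    vars_ = np.array([X[y == c].var(axis=0) + eps for c in classes])
    # Log joint log p(x, c) for every class, then normalize to P(c | x).
    log_joint = np.stack([
        np.log(priors[k])
        - 0.5 * np.sum(np.log(2 * np.pi * vars_[k]))
        - 0.5 * np.sum((X - means[k]) ** 2 / vars_[k], axis=1)
        for k in range(len(classes))
    ], axis=1)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    # Probability assigned to the label each document actually received.
    idx = np.searchsorted(classes, y)
    return post[np.arange(len(y)), idx]

def weighted_pairwise_hinge(scores, y, w):
    """RankSVM-style pairwise hinge loss in which each document pair is
    down-weighted by the product of the two documents' confidences."""
    scores, y, w = map(np.asarray, (scores, y, w))
    loss, n = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:  # document i should rank above document j
                loss += w[i] * w[j] * max(0.0, 1.0 - (scores[i] - scores[j]))
                n += 1
    return loss / max(n, 1)
```

On a toy set where one document near the relevance-0 cluster is labeled 1, `label_confidence` assigns that document the smallest confidence, so its pairwise terms contribute little to the loss, which is exactly the reweighting effect the second step describes.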
References
- D. Angluin and P. Laird. 1988. Learning from noisy examples. Machine Learning 2, 4, 343--370.
- R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. Addison-Wesley, Reading, MA.
- C. E. Brodley and M. A. Friedl. 1999. Identifying mislabeled training data. Journal of Artificial Intelligence Research 11, 131--167.
- C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning. ACM, New York, NY, 89--96.
- Y. Cao, J. Xu, T. Y. Liu, H. Li, Y. Huang, and H. W. Hon. 2006. Adapting ranking SVM to document retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 193.
- Z. Cao, T. Qin, T. Y. Liu, M. F. Tsai, and H. Li. 2007. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning. ACM, New York, NY, 129--136.
- V. R. Carvalho, J. L. Elsas, W. W. Cohen, and J. G. Carbonell. 2008. A meta-learning approach for robust rank learning. In SIGIR 2008 Workshop on Learning to Rank for Information Retrieval, Vol. 1. ACM, New York, NY.
- O. Chapelle and Y. Chang. 2011. Yahoo! learning to rank challenge overview. Journal of Machine Learning Research: Workshop and Conference Proceedings 14, 1--24.
- C. L. Clarke, N. Craswell, and I. Soboroff. 2009. Overview of the TREC 2009 web track. Technical Report. DTIC Document.
- S. Cronen-Townsend and W. B. Croft. 2002. Quantifying query ambiguity. In Proceedings of the 2nd International Conference on Human Language Technology Research. Morgan Kaufmann Publishers, Burlington, MA, 109.
- K. Duh and K. Kirchhoff. 2008. Learning to rank with partially-labeled data. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 251--258.
- Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4, 933--969.
- X. Geng, T. Qin, T.-Y. Liu, and X.-Q. Cheng. 2012. A noise-tolerant graphical model for ranking. Information Processing & Management 48, 2, 374--383.
- E. F. Harrington. 2003. Online ranking/collaborative filtering using the perceptron algorithm. In Proceedings of the 20th International Conference on Machine Learning, Vol. 20. 250.
- L. Hong, R. Bekkerman, J. Adler, and B. D. Davison. 2012. Learning to rank social update streams. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 651--660.
- K. Järvelin and J. Kekäläinen. 2000. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 41--48.
- K. Järvelin and J. Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4, 422--446.
- F. Jelinek. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice. Amsterdam, The Netherlands.
- T. Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 133--142.
- T. Joachims. 2003. Transductive learning via spectral graph partitioning. In Proceedings of the 20th International Conference on Machine Learning (ICML'03), Vol. 3. 290--297.
- N. D. Lawrence and B. Schölkopf. 2001. Estimating a kernel Fisher discriminant in the presence of label noise. In Proceedings of the 18th International Conference on Machine Learning. Morgan Kaufmann Publishers, Burlington, MA, 306--313.
- P. Li, C. Burges, and Q. Wu. 2007a. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems.
- Y. Li, L. F. A. Wessels, D. de Ridder, and M. J. T. Reinders. 2007b. Classification in the presence of class noise using a probabilistic kernel Fisher method. Pattern Recognition 40, 12, 3349--3357.
- T. Y. Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3, 3, 225--331.
- T. Y. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. 2007. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In Proceedings of the SIGIR 2007 Workshop on Learning to Rank for Information Retrieval. 3--10.
- S. Niu, J. Guo, Y. Lan, and X. Cheng. 2012. Top-k learning to rank: Labeling, ranking and evaluation. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 751--760.
- U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. 2012. Learning to suggest: A machine learning framework for ranking query suggestions. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 25--34.
- J. C. Platt. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14. Microsoft Research.
- T. Qin, T. Y. Liu, W. Ding, J. Xu, and H. Li. 2010. Microsoft Learning to Rank Datasets. Retrieved September 7, 2015 from http://research.microsoft.com/en-us/projects/mslr/.
- S. E. Robertson and S. Walker. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer-Verlag, New York, NY, 232--241.
- A. Severyn and A. Moschitti. 2012. Structural relationships for large-scale learning of answer re-ranking. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 741--750.
- R. J. Vanderbei. 1999. LOQO: An interior point code for quadratic programming. Optimization Methods and Software 11, 1--4, 451--484.
- L. Wang, P. N. Bennett, and K. Collins-Thompson. 2012. Robust ranking models via risk-sensitive optimization. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 761--770.
- J. Xu, C. Chen, G. Xu, and H. Li. 2010. Improving quality of training data for learning to rank using click-through data. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, New York, NY.
- Y. Yue, T. Finley, F. Radlinski, and T. Joachims. 2007. A support vector method for optimizing average precision. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 271--278.
- C. Zhai and J. Lafferty. 2004. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems 22, 2, 214.
- H. Zhang. 2004. The optimality of naive Bayes. In Proceedings of the 17th Florida Artificial Intelligence Research Society Conference. 562--567.
Index Terms
- Learning to Rank from Noisy Data