Abstract
Learning to rank, which learns a ranking function from training data, has become an active research area at the intersection of information retrieval and machine learning. Most existing work on learning to rank assumes that the training data are clean; however, this assumption does not always hold. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to assign reliable relevance labels to some documents. As a result, the relevance labels in learning-to-rank training data usually contain noise. Ignoring this fact damages the performance of learning-to-rank algorithms.
In this article, we propose accounting for labeling noise in the process of learning to rank, using a two-step approach that extends existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of those documents. The parameters of the graphical model are learned by maximum likelihood estimation. We then compute the conditional probability of the relevance label given the feature vector of a document: if the probability is large, we regard the degree of labeling noise for this document as small; otherwise, we regard it as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet. Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
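The abstract does not spell out the generative model, so the following is only a minimal sketch of the two-step idea: a Gaussian naive Bayes estimate of P(label | features) stands in for the paper's graphical model (the class parameters are the maximum-likelihood estimates), and the resulting per-document confidence is used to reweight a RankSVM-style pairwise hinge loss. The function names and the choice of naive Bayes are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def label_confidence(X, y, eps=1e-9):
    """Estimate P(y_i | x_i) with a Gaussian naive Bayes generative model.

    A high conditional probability of the observed label given the features
    is read as a low degree of labeling noise (illustrative stand-in for the
    paper's graphical model).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    # Maximum-likelihood class priors and per-class Gaussian parameters.
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    vars_ = np.array([X[y == c].var(axis=0) + eps for c in classes])
    # Log joint log p(x, c) for every class, then normalize to P(c | x).
    log_joint = np.stack([
        np.log(priors[k])
        - 0.5 * np.sum(np.log(2 * np.pi * vars_[k]))
        - 0.5 * np.sum((X - means[k]) ** 2 / vars_[k], axis=1)
        for k in range(len(classes))
    ], axis=1)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    # Probability assigned to the label each document actually received.
    idx = np.searchsorted(classes, y)
    return post[np.arange(len(y)), idx]

def weighted_pairwise_hinge(scores, y, w):
    """RankSVM-style pairwise hinge loss in which each document pair is
    down-weighted by the product of the two documents' confidences."""
    scores, y, w = map(np.asarray, (scores, y, w))
    loss, n = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:  # document i should rank above document j
                loss += w[i] * w[j] * max(0.0, 1.0 - (scores[i] - scores[j]))
                n += 1
    return loss / max(n, 1)
```

On a toy set where one document near the relevance-0 cluster is labeled 1, `label_confidence` assigns that document the smallest confidence, so its pairwise terms contribute little to the loss, which is exactly the reweighting effect the second step describes.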
References
- D. Angluin and P. Laird. 1988. Learning from noisy examples. Machine Learning 2, 4, 343--370.
- R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. Addison-Wesley, Reading, MA.
- C. E. Brodley and M. A. Friedl. 1999. Identifying mislabeled training data. Journal of Artificial Intelligence Research 11, 131--167.
- C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning. ACM, New York, NY, 89--96.
- Y. Cao, J. Xu, T. Y. Liu, H. Li, Y. Huang, and H. W. Hon. 2006. Adapting ranking SVM to document retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 193.
- Z. Cao, T. Qin, T. Y. Liu, M. F. Tsai, and H. Li. 2007. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning. ACM, New York, NY, 129--136.
- V. R. Carvalho, J. L. Elsas, W. W. Cohen, and J. G. Carbonell. 2008. A meta-learning approach for robust rank learning. In SIGIR 2008 Workshop on Learning to Rank for Information Retrieval, Vol. 1. ACM, New York, NY.
- O. Chapelle and Y. Chang. 2011. Yahoo! learning to rank challenge overview. Journal of Machine Learning Research: Workshop and Conference Proceedings 14, 1--24.
- C. L. Clarke, N. Craswell, and I. Soboroff. 2009. Overview of the TREC 2009 web track. Technical Report. DTIC Document.
- S. Cronen-Townsend and W. B. Croft. 2002. Quantifying query ambiguity. In Proceedings of the 2nd International Conference on Human Language Technology Research. Morgan Kaufmann Publishers, Burlington, MA, 109.
- K. Duh and K. Kirchhoff. 2008. Learning to rank with partially-labeled data. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 251--258.
- Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4, 933--969.
- X. Geng, T. Qin, T.-Y. Liu, and X.-Q. Cheng. 2012. A noise-tolerant graphical model for ranking. Information Processing & Management 48, 2, 374--383.
- E. F. Harrington. 2003. Online ranking/collaborative filtering using the perceptron algorithm. In Proceedings of the 20th International Conference on Machine Learning, Vol. 20. 250.
- L. Hong, R. Bekkerman, J. Adler, and B. D. Davison. 2012. Learning to rank social update streams. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 651--660.
- K. Järvelin and J. Kekäläinen. 2000. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 41--48.
- K. Järvelin and J. Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4, 422--446.
- F. Jelinek. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice. Amsterdam, The Netherlands.
- T. Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 133--142.
- T. Joachims. 2003. Transductive learning via spectral graph partitioning. In Proceedings of the 20th International Conference on Machine Learning (ICML'03), Vol. 3. 290--297.
- N. D. Lawrence and B. Schölkopf. 2001. Estimating a kernel Fisher discriminant in the presence of label noise. In Proceedings of the 18th International Conference on Machine Learning. Morgan Kaufmann Publishers, Burlington, MA, 306--313.
- P. Li, C. Burges, and Q. Wu. 2007a. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems.
- Y. Li, L. F. A. Wessels, D. de Ridder, and M. J. T. Reinders. 2007b. Classification in the presence of class noise using a probabilistic kernel Fisher method. Pattern Recognition 40, 12, 3349--3357.
- T. Y. Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3, 3, 225--331.
- T. Y. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. 2007. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In Proceedings of the SIGIR 2007 Workshop on Learning to Rank for Information Retrieval. 3--10.
- S. Niu, J. Guo, Y. Lan, and X. Cheng. 2012. Top-k learning to rank: Labeling, ranking and evaluation. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 751--760.
- U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. 2012. Learning to suggest: A machine learning framework for ranking query suggestions. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 25--34.
- J. C. Platt. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14. Microsoft Research.
- T. Qin, T. Y. Liu, W. Ding, J. Xu, and H. Li. 2010. Microsoft Learning to Rank Datasets. Retrieved September 7, 2015 from http://research.microsoft.com/en-us/projects/mslr/.
- S. E. Robertson and S. Walker. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer-Verlag, New York, NY, 232--241.
- A. Severyn and A. Moschitti. 2012. Structural relationships for large-scale learning of answer re-ranking. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 741--750.
- R. J. Vanderbei. 1999. LOQO: An interior point code for quadratic programming. Optimization Methods and Software 11, 1--4, 451--484.
- L. Wang, P. N. Bennett, and K. Collins-Thompson. 2012. Robust ranking models via risk-sensitive optimization. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 761--770.
- J. Xu, C. Chen, G. Xu, and H. Li. 2010. Improving quality of training data for learning to rank using click-through data. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, New York, NY.
- Y. Yue, T. Finley, F. Radlinski, and T. Joachims. 2007. A support vector method for optimizing average precision. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 271--278.
- C. Zhai and J. Lafferty. 2004. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems 22, 2, 214.
- H. Zhang. 2004. The optimality of naive Bayes. In Proceedings of the 17th Florida Artificial Intelligence Research Society Conference. 562--567.
Index Terms
- Learning to Rank from Noisy Data