
Learning to Rank from Noisy Data

Published: 07 October 2015

Abstract

Learning to rank, which learns a ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean; however, this assumption does not always hold in practice. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to give reliable relevance labels to some documents. As a result, the relevance labels in the training data of learning to rank usually contain noise. If we ignore this fact, the performance of learning-to-rank algorithms will degrade.

In this article, we propose accounting for labeling noise in the process of learning to rank, using a two-step approach to extend existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters of the graphical model are learned by maximum likelihood estimation. Then the conditional probability of the relevance label given the feature vector of a document is computed. If the probability is large, we regard the degree of labeling noise for this document as small; otherwise, we regard the degree as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to those training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees of labeling noise. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet. Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and the extended learning-to-rank algorithms can achieve better performance than the baselines.
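The two-step idea described above can be sketched in code. The sketch below is an illustration under simplifying assumptions, not the paper's actual graphical model: step 1 uses a simple naive-Bayes-style generative model over discretized features to estimate P(label | features) by maximum likelihood (with smoothing), treating a high posterior as a low degree of labeling noise; step 2 shows how such per-document weights could enter a RankSVM-style pairwise hinge loss. All function names and parameters here are hypothetical.

```python
import numpy as np

def estimate_noise_weights(X, y, n_bins=4, smoothing=1.0):
    """Step 1 (sketch): per-document cleanness weight in [0, 1].
    Discretizes features, fits class priors and per-feature conditional
    tables by smoothed MLE, and returns P(observed label | features).
    A large posterior is read as a small degree of labeling noise."""
    # Discretize each feature at its quantiles so probabilities are countable.
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bins = np.quantile(X, qs, axis=0)                     # (n_bins-1, n_features)
    Xb = np.stack([np.digitize(X[:, j], bins[:, j])
                   for j in range(X.shape[1])], axis=1)   # values in 0..n_bins-1
    labels = np.unique(y)
    log_post = np.zeros((X.shape[0], len(labels)))
    for k, c in enumerate(labels):
        mask = (y == c)
        log_post[:, k] = np.log(mask.mean())              # class prior (MLE)
        for j in range(Xb.shape[1]):
            counts = np.bincount(Xb[mask, j], minlength=n_bins) + smoothing
            probs = counts / counts.sum()                 # smoothed conditional table
            log_post[:, k] += np.log(probs[Xb[:, j]])
    # Normalize to get P(label | features) for every candidate label.
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    idx = np.searchsorted(labels, y)
    return post[np.arange(len(y)), idx]                   # posterior of the observed label

def weighted_pairwise_hinge(scores, y, w):
    """Step 2 (sketch): a pairwise hinge loss where each pair is weighted
    by the product of the two documents' cleanness weights, so pairs
    involving likely-noisy labels contribute less."""
    loss, n = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                loss += w[i] * w[j] * max(0.0, 1.0 - (scores[i] - scores[j]))
                n += 1
    return loss / max(n, 1)
```

On well-separated synthetic data with one flipped label, the flipped document receives a much smaller weight than the clean ones, so its pairwise losses are down-weighted in training.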

References

  1. D. Angluin and P. Laird. 1988. Learning from noisy examples. Machine Learning 2, 4, 343--370.
  2. R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. Addison-Wesley, New York, NY.
  3. R. Baeza-Yates, B. Ribeiro-Neto, et al. 1999. Modern Information Retrieval. Addison-Wesley, Reading, MA.
  4. C. E. Brodley and M. A. Friedl. 1999. Identifying mislabeled training data. Journal of Artificial Intelligence Research 11, 131--167.
  5. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning. ACM, New York, NY, 89--96.
  6. Y. Cao, J. Xu, T. Y. Liu, H. Li, Y. Huang, and H. W. Hon. 2006. Adapting ranking SVM to document retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 193.
  7. Z. Cao, T. Qin, T. Y. Liu, M. F. Tsai, and H. Li. 2007. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning. ACM, New York, NY, 129--136.
  8. V. R. Carvalho, J. L. Elsas, W. W. Cohen, and J. G. Carbonell. 2008. A meta-learning approach for robust rank learning. In SIGIR 2008 Workshop on Learning to Rank for Information Retrieval, Vol. 1. ACM, New York, NY.
  9. O. Chapelle and Y. Chang. 2011. Yahoo! learning to rank challenge overview. Journal of Machine Learning Research: Proceedings Track 14, 1--24.
  10. C. L. Clarke, N. Craswell, and I. Soboroff. 2009. Overview of the TREC 2009 Web Track. Technical Report. DTIC Document.
  11. S. Cronen-Townsend and W. B. Croft. 2002. Quantifying query ambiguity. In Proceedings of the 2nd International Conference on Human Language Technology Research. Morgan Kaufmann, Burlington, MA, 109.
  12. K. Duh and K. Kirchhoff. 2008. Learning to rank with partially-labeled data. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 251--258.
  13. Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4, 933--969.
  14. X. Geng, T. Qin, T.-Y. Liu, and X.-Q. Cheng. 2012. A noise-tolerant graphical model for ranking. Information Processing & Management 48, 2, 374--383.
  15. E. F. Harrington. 2003. Online ranking/collaborative filtering using the perceptron algorithm. In Proceedings of the 20th International Conference on Machine Learning (ICML'03), Vol. 20. 250.
  16. L. Hong, R. Bekkerman, J. Adler, and B. D. Davison. 2012. Learning to rank social update streams. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 651--660.
  17. K. Järvelin and J. Kekäläinen. 2000. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 41--48.
  18. K. Järvelin and J. Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4, 422--446.
  19. F. Jelinek. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice. Amsterdam, The Netherlands.
  20. T. Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 133--142.
  21. T. Joachims. 2003. Transductive learning via spectral graph partitioning. In Proceedings of the 20th International Conference on Machine Learning (ICML'03), Vol. 3. 290--297.
  22. N. D. Lawrence and B. Schölkopf. 2001. Estimating a kernel Fisher discriminant in the presence of label noise. In Proceedings of the 18th International Conference on Machine Learning. Morgan Kaufmann, Burlington, MA, 306--313.
  23. P. Li, C. Burges, and Q. Wu. 2007a. McRank: Learning to rank using multiple classification and gradient boosting. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis (Eds.), Advances in Neural Information Processing Systems 20.
  24. Y. Li, L. F. A. Wessels, D. de Ridder, and M. J. T. Reinders. 2007b. Classification in the presence of class noise using a probabilistic kernel Fisher method. Pattern Recognition 40, 12, 3349--3357.
  25. T. Y. Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3, 3, 225--331.
  26. T. Y. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. 2007. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In Proceedings of the SIGIR 2007 Workshop on Learning to Rank for Information Retrieval. 3--10.
  27. S. Niu, J. Guo, Y. Lan, and X. Cheng. 2012. Top-k learning to rank: Labeling, ranking and evaluation. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 751--760.
  28. U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. 2012. Learning to suggest: A machine learning framework for ranking query suggestions. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 25--34.
  29. J. C. Platt. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14. Microsoft Research.
  30. T. Qin, T. Y. Liu, W. Ding, X. Jun, and H. Li. 2010. Microsoft Learning to Rank Datasets. Retrieved September 7, 2015 from http://research.microsoft.com/en-us/projects/mslr/.
  31. S. E. Robertson and S. Walker. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer-Verlag, New York, NY, 232--241.
  32. A. Severyn and A. Moschitti. 2012. Structural relationships for large-scale learning of answer re-ranking. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 741--750.
  33. R. J. Vanderbei. 1999. LOQO: An interior point code for quadratic programming. Optimization Methods and Software 11, 1--4, 451--484.
  34. L. Wang, P. N. Bennett, and K. Collins-Thompson. 2012. Robust ranking models via risk-sensitive optimization. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 761--770.
  35. J. Xu, C. Chen, G. Xu, and H. Li. 2010. Improving quality of training data for learning to rank using click-through data. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, New York, NY.
  36. Y. Yue, T. Finley, F. Radlinski, and T. Joachims. 2007. A support vector method for optimizing average precision. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 271--278.
  37. C. Zhai and J. Lafferty. 2004. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems 22, 2, 214.
  38. H. Zhang. 2004. The optimality of naive Bayes. In Proceedings of the 17th Florida Artificial Intelligence Research Society Conference. 562--567.


• Published in

  ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 1 (October 2015), 293 pages.
  ISSN: 2157-6904
  EISSN: 2157-6912
  DOI: 10.1145/2830012
  Editor: Yu Zheng

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 October 2015
      • Accepted: 1 December 2013
      • Revised: 1 October 2013
      • Received: 1 June 2013
Published in TIST Volume 7, Issue 1

      Qualifiers

      • research-article
      • Research
      • Refereed
