ABSTRACT
In information retrieval, the relevance of documents with respect to queries is usually judged by humans, and the judgments are used in the evaluation and/or learning of ranking functions. Previous work has shown that a certain level of noise in relevance judgments has little effect on evaluation, especially for comparison purposes. Recently, learning to rank has become one of the major means of creating ranking models, in which the models are automatically learned from data derived from a large number of relevance judgments. To the best of our knowledge, there has been no previous work on the quality of training data for learning to rank, and this paper studies the issue. Specifically, we address three problems. First, we show that the quality of training data labeled by humans has a critical impact on the performance of learning-to-rank algorithms. Second, we propose detecting relevance judgment errors using click-through data accumulated at a search engine. Two discriminative models, referred to as the sequential dependency model and the full dependency model, are proposed for the detection. Both models consider the conditional dependency among relevance labels and are thus more powerful than the conditionally independent models previously proposed for other tasks. Finally, we verify that by using training data in which the errors have been detected and corrected by our method, we can improve the performance of learning-to-rank algorithms.
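To illustrate the idea behind the sequential dependency model described above, the following is a minimal sketch of joint error detection over a ranked list: each document's relevance label receives a unary score from its click-through features and a pairwise score tying it to the label of the adjacent document, and documents whose human label disagrees with the jointly inferred labeling are flagged as candidate judgment errors. All feature choices, weights, and function names here are hypothetical illustrations, not the paper's actual model.

```python
from itertools import product

LABELS = (0, 1)  # 0 = irrelevant, 1 = relevant (illustrative binary case)

def unary_score(label, click_rate, w_click=4.0, bias=-2.0):
    """Score of assigning `label` given the document's click-through rate.

    A high CTR pushes the score toward label 1; weights are made up.
    """
    signal = w_click * click_rate + bias
    return signal if label == 1 else -signal

def pairwise_score(prev_label, label, w_pair=0.5):
    """Sequential dependency: adjacent documents in a ranked list tend
    to have correlated labels, so matching labels are rewarded."""
    return w_pair if prev_label == label else -w_pair

def best_labeling(click_rates):
    """Exhaustive MAP inference over joint label sequences.

    Brute force is fine for short result lists; a real implementation
    would use Viterbi-style dynamic programming for the chain model.
    """
    best, best_score = None, float("-inf")
    for seq in product(LABELS, repeat=len(click_rates)):
        score = sum(unary_score(l, c) for l, c in zip(seq, click_rates))
        score += sum(pairwise_score(a, b) for a, b in zip(seq, seq[1:]))
        if score > best_score:
            best, best_score = seq, score
    return list(best)

# Documents whose human label disagrees with the model's joint prediction
# are flagged as candidate relevance-judgment errors.
human_labels = [1, 0, 1, 0]
click_rates = [0.9, 0.8, 0.1, 0.05]
predicted = best_labeling(click_rates)
suspects = [i for i, (h, p) in enumerate(zip(human_labels, predicted)) if h != p]
```

The full dependency model would replace the chain of pairwise terms with dependencies among all label pairs, at a correspondingly higher inference cost.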