ABSTRACT
Effective ranking functions are an essential part of commercial search engines. We focus on developing a regression framework for learning ranking functions that improve the relevance of search engines serving diverse streams of user queries. We explore supervised learning methodology from machine learning, and we distinguish two types of relevance judgments used as training data: 1) absolute relevance judgments arising from explicit labeling of search results; and 2) relative relevance judgments extracted from user clickthroughs on search results or converted from the absolute relevance judgments. We propose a novel optimization framework emphasizing the use of relative relevance judgments. The main contribution is the development of a regression-based algorithm that can be applied to objective functions involving preference data, i.e., data indicating that one document is more relevant than another with respect to a query. Experiments are carried out using data sets obtained from a commercial search engine. Our results show significant improvements of our proposed methods over some existing methods.
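As an illustration of the second type of judgment, the sketch below shows one plausible way to convert absolute relevance labels for a single query into relative preference pairs. The abstract does not give the conversion rule, so the rule used here (a document with a strictly higher grade is preferred, and ties produce no pair) is an assumption, and the function name `preference_pairs`, the document ids, and the grades are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def preference_pairs(labels: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (preferred, other) document pairs for one query.

    `labels` maps a document id to its absolute relevance grade.
    Assumption: a strictly higher grade means "more relevant".
    """
    pairs = []
    for (doc_a, grade_a), (doc_b, grade_b) in combinations(labels.items(), 2):
        if grade_a > grade_b:
            pairs.append((doc_a, doc_b))
        elif grade_b > grade_a:
            pairs.append((doc_b, doc_a))
        # Equal grades carry no preference, so no pair is emitted.
    return pairs


# Hypothetical editorial grades for four documents returned for one query.
judgments = {"d1": 3, "d2": 1, "d3": 3, "d4": 0}
pairs = preference_pairs(judgments)
```

Each resulting pair states only that the first document should rank above the second for this query, which is the form of preference data the abstract refers to.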