ABSTRACT
In this paper we address the problem of learning to rank for document retrieval. In this task, a ranking model is automatically created from training data and then used to rank documents. The quality of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally, a learning algorithm would train a ranking model that directly optimizes these performance measures on the training data. Existing methods, however, can only train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To address this problem, we propose a novel learning algorithm within the framework of boosting, which minimizes a loss function directly defined on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally combines the weak rankers linearly to make ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.
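The boosting loop the abstract describes — fit a weak ranker on reweighted training queries, weight it by its measure-based performance, then shift weight toward the queries the combined ranker still handles poorly — can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: weak rankers are taken to be single features, the performance measure is average precision over binary labels, and the names `adarank` and `average_precision` are ours.

```python
import math

def average_precision(scores, labels):
    """AP for one query: rank docs by score, average precision over relevant docs."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / max(hits, 1)

def adarank(queries, n_rounds=10):
    """queries: list of (feature_matrix, labels) per query; a weak ranker is a
    single feature index. Returns (alpha, feature) pairs forming the final ranker."""
    n_q = len(queries)
    n_feat = len(queries[0][0][0])
    P = [1.0 / n_q] * n_q                        # uniform weights over queries
    model = []                                    # linear combination so far
    for _ in range(n_rounds):
        # weak learner: pick the feature maximizing the weighted measure
        best_f, best_score = 0, -1.0
        for f in range(n_feat):
            s = sum(P[i] * average_precision([x[f] for x in X], y)
                    for i, (X, y) in enumerate(queries))
            if s > best_score:
                best_f, best_score = f, s
        # combination weight, AdaBoost-style, from the weighted measure
        num = sum(P[i] * (1 + average_precision([x[best_f] for x in X], y))
                  for i, (X, y) in enumerate(queries))
        den = sum(P[i] * (1 - average_precision([x[best_f] for x in X], y))
                  for i, (X, y) in enumerate(queries))
        alpha = 0.5 * math.log(num / max(den, 1e-12))  # guard: den may be 0
        model.append((alpha, best_f))
        # re-weight queries: emphasize those the combined ranker ranks poorly
        E = []
        for X, y in queries:
            scores = [sum(a * x[f] for a, f in model) for x in X]
            E.append(average_precision(scores, y))
        Z = sum(math.exp(-e) for e in E)
        P = [math.exp(-e) / Z for e in E]
    return model
```

The α update mirrors AdaBoost's weight formula, with the per-query measure playing the role of one minus the weighted error, which is what ties each boosting round directly to the performance measure rather than to a pairwise surrogate loss.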
REFERENCES
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, May 1999.
- C. Burges, R. Ragno, and Q. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems 19, pages 193--200. MIT Press, Cambridge, MA, 2007.
- C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML 22, pages 89--96, 2005.
- Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon. Adapting ranking SVM to document retrieval. In SIGIR 29, pages 186--193, 2006.
- D. Cossock and T. Zhang. Subset ranking using regression. In COLT, pages 605--619, 2006.
- N. Craswell, D. Hawking, R. Wilkinson, and M. Wu. Overview of the TREC 2003 web track. In TREC, pages 78--92, 2003.
- N. Duffy and D. Helmbold. Boosting methods for regression. Mach. Learn., 47(2-3):153--200, 2002.
- Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933--969, 2003.
- Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119--139, 1997.
- J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2):337--374, 2000.
- G. Fung, R. Rosales, and B. Krishnapuram. Learning rankings via convex hull separation. In Advances in Neural Information Processing Systems 18, pages 395--402. MIT Press, Cambridge, MA, 2006.
- T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, August 2001.
- R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. MIT Press, Cambridge, MA, 2000.
- W. Hersh, C. Buckley, T. J. Leone, and D. Hickam. OHSUMED: an interactive retrieval evaluation and new large test collection for research. In SIGIR, pages 192--201, 1994.
- K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In SIGIR 23, pages 41--48, 2000.
- T. Joachims. Optimizing search engines using clickthrough data. In SIGKDD 8, pages 133--142, 2002.
- T. Joachims. A support vector method for multivariate performance measures. In ICML 22, pages 377--384, 2005.
- J. Lafferty and C. Zhai. Document language models, query models, and risk minimization for information retrieval. In SIGIR 24, pages 111--119, 2001.
- D. A. Metzler, W. B. Croft, and A. McCallum. Direct maximization of rank-based metrics for information retrieval. Technical report, CIIR, 2005.
- R. Nallapati. Discriminative models for information retrieval. In SIGIR 27, pages 64--71, 2004.
- L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
- J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In SIGIR 21, pages 275--281, 1998.
- T. Qin, T.-Y. Liu, X.-D. Zhang, Z. Chen, and W.-Y. Ma. A study of relevance propagation for web search. In SIGIR 28, pages 408--415, 2005.
- S. E. Robertson and D. A. Hull. The TREC-9 filtering track final report. In TREC, pages 25--40, 2000.
- R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In ICML 14, pages 322--330, 1997.
- R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Mach. Learn., 37(3):297--336, 1999.
- R. Song, J. Wen, S. Shi, G. Xin, T.-Y. Liu, T. Qin, X. Zheng, J. Zhang, G. Xue, and W.-Y. Ma. Microsoft Research Asia at web track and terabyte track of TREC 2004. In TREC, 2004.
- A. Trotman. Learning to rank. Inf. Retr., 8(3):359--381, 2005.
- J. Xu, Y. Cao, H. Li, and Y. Huang. Cost-sensitive learning of SVM for ranking. In ECML, pages 833--840, 2006.
- G.-R. Xue, Q. Yang, H.-J. Zeng, Y. Yu, and Z. Chen. Exploiting the hierarchical structure for link analysis. In SIGIR 28, pages 186--193, 2005.
- H. Yu. SVM selective sampling for ranking with application to data retrieval. In SIGKDD 11, pages 354--363, 2005.