ABSTRACT
Many real-world data mining tasks require the achievement of two distinct goals when applied to unseen data: first, to induce an accurate preference ranking, and second to give good regression performance. In this paper, we give an efficient and effective Combined Regression and Ranking method (CRR) that optimizes regression and ranking objectives simultaneously. We demonstrate the effectiveness of CRR for both families of metrics on a range of large-scale tasks, including click prediction for online advertisements. Results show that CRR often achieves performance equivalent to the best of both ranking-only and regression-only approaches. In the case of rare events or skewed distributions, we also find that this combination can actually improve regression performance due to the addition of informative ranking constraints.
Supplemental Material
- G. Aggarwal, A. Goel, and R. Motwani. Truthful auctions for pricing search keywords. In EC '06: Proceedings of the 7th ACM conference on Electronic commerce, 2006. Google ScholarDigital Library
- C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., 2006. Google ScholarDigital Library
- L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20, pages 161--168. NIPS Foundation (http://books.nips.cc), 2008.Google Scholar
- A. P. Bradley. The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recognition, 30, 1997. Google ScholarDigital Library
- C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML '05: Proceedings of the 22nd international conference on Machine learning, 2005. Google ScholarDigital Library
- Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML '07: Proceedings of the 24th international conference on Machine learning, 2007. Google ScholarDigital Library
- D. Chakrabarti, D. Agarwal, and V. Josifovski. Contextual advertising by combining relevance with click feedback. In WWW '08: Proceeding of the 17th international conference on World Wide Web, 2008. Google ScholarDigital Library
- M. Ciaramita, V. Murdock, and V. Plachouras. Online learning from click data for sponsored search. In WWW '08: Proceeding of the 17th international conference on World Wide Web, 2008. Google ScholarDigital Library
- J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In ICML '08: Proceedings of the 25th international conference on Mach ine learning, 2008. Google ScholarDigital Library
- J. L. Elsas, V. R. Carvalho, and J. G. Carbonell. Fast learning of document ranking functions with the committee perceptron. In WSDM '08: Proceedings of the international conference on Web search and web data mining, 2008. Google ScholarDigital Library
- Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res., 4, 2003. Google ScholarDigital Library
- T. Joachims. Optimizing search engines using clickthrough data. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002. Google ScholarDigital Library
- T. Joachims. A support vector method for multivariate performance measures. In ICML '05: Proceedings of the 22nd international conference on Machine learning, 2005. Google ScholarDigital Library
- T. Joachims. Training linear svms in linear time. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006. Google ScholarDigital Library
- K. Koh, S.-J. Kim, and S. Boyd. An interior-point method for large-scale l1-regularized logistic regression. J. Mach. Learn. Res., 8, 2007. Google ScholarDigital Library
- J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. J. Mach. Learn. Res., 10, 2009. Google ScholarDigital Library
- D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res., 5:361--397, 2004. Google ScholarDigital Library
- T.-Y. Liu, T. Qin, J. Xu, W. Xiong, and H. Li. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In LR4IR 2007: Workshop on Learning to Rank for Information Retrieval, in conjunction with SIGIR 2007, 2007.Google Scholar
- M.-F. M.F. Balcan and A. Blum. On a theory of learning with similarity functions. In ICML '06: Proceedings of the 23rd international conference on Machine learning, 2006. Google ScholarDigital Library
- T. M. Mitchell. Generative and discriminative classifiers: Naive bayes and logistic regression. In Machine Learning. http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf, 2005.Google Scholar
- D. Sculley. Large scale learning to rank. In NIPS 2009 Workshop on Advances in Ranking, 2009.Google Scholar
- D. Sculley, R. G. Malkin, S. Basu, and R. J. Bayardo. Predicting bounce rates in sponsored search advertisements. In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009. Google ScholarDigital Library
- S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In ICML '07: Proceedings of the 24th international conference on Machine learning, 2007. Google ScholarDigital Library
- J. Xu and H. Li. Adarank: a boosting algorithm for information retrieval. In SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007. Google ScholarDigital Library
- Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007. Google ScholarDigital Library
Index Terms
- Combined regression and ranking
Recommendations
Re-ranking search results using query logs
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge managementThis work addresses two common problems in search, frequently occurring with underspecified user queries: the top-ranked results for such queries may not contain documents relevant to the user's search intent, and fresh and relevant pages may not get ...
A regression framework for learning ranking functions using relative relevance judgments
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrievalEffective ranking functions are an essential part of commercial search engines. We focus on developing a regression framework for learning ranking functions for improving relevance of search engines serving diverse streams of user queries. We explore ...
Regression by Re-Ranking
Highlights- A novel rank-based approach is proposed for regression tasks;
- The method allows ...
AbstractSeveral approaches based on regression have been developed in the past few years with the goal of improving prediction results, including the use of ranking strategies. Re-ranking has been exploited and successfully employed in several ...
Comments