A cascade ranking model for efficient ranked retrieval

ABSTRACT
There is a fundamental tradeoff between effectiveness and efficiency when designing retrieval models for large-scale document collections. Effectiveness tends to derive from sophisticated ranking functions, such as those constructed using learning to rank, while efficiency gains tend to arise from improvements in query evaluation and caching strategies. Given their inherently disjoint nature, it is difficult to jointly optimize effectiveness and efficiency in end-to-end systems. To address this problem, we formulate and develop a novel cascade ranking model which, unlike previous approaches, can simultaneously improve both top-k ranked effectiveness and retrieval efficiency. The model constructs a cascade of increasingly complex ranking functions that progressively prunes and refines the set of candidate documents, minimizing retrieval latency while maximizing result set quality. We present a novel boosting algorithm for learning such cascades that directly optimizes the tradeoff between effectiveness and efficiency. Experimental results show that our cascades are faster and return higher-quality results than comparable ranking models.
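To make the cascade idea concrete, here is a minimal, hypothetical sketch of the general architecture the abstract describes: a pipeline of stages where each stage scores the surviving candidates and prunes to a smaller top-k set before a more expensive stage runs. The stage functions, cutoffs, and toy features below are illustrative placeholders, not the paper's learned cascade or its boosting algorithm.

```python
# Illustrative cascade ranker: cheap stages filter candidates so that
# expensive stages only score a small surviving set.

def run_cascade(candidates, stages):
    """candidates: list of (doc_id, features); stages: list of (score_fn, keep_k)."""
    for score_fn, keep_k in stages:
        # Score every surviving candidate with this stage's ranking function.
        scored = sorted(candidates, key=lambda d: score_fn(d[1]), reverse=True)
        # Prune to the top keep_k before the next, costlier stage runs.
        candidates = scored[:keep_k]
    return [doc_id for doc_id, _ in candidates]

# Toy collection with two made-up features per document.
docs = [("d%d" % i, {"tf": i % 5, "bm25": (i * 7) % 11}) for i in range(100)]

# A cheap single-feature first stage, then a costlier weighted combination.
stages = [
    (lambda f: f["tf"], 20),
    (lambda f: 0.3 * f["tf"] + 0.7 * f["bm25"], 5),
]

top5 = run_cascade(docs, stages)
print(len(top5))  # prints 5
```

Under this kind of pipeline, the per-query cost of each stage is bounded by the number of candidates its predecessor lets through, which is what makes jointly tuning the stage rankers and cutoffs (as the paper's boosting algorithm does for its learned cascade) an effectiveness/efficiency tradeoff rather than two separate problems.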