A cascade ranking model for efficient ranked retrieval

ABSTRACT
There is a fundamental tradeoff between effectiveness and efficiency when designing retrieval models for large-scale document collections. Effectiveness tends to derive from sophisticated ranking functions, such as those constructed using learning to rank, while efficiency gains tend to arise from improvements in query evaluation and caching strategies. Given their inherently disjoint nature, it is difficult to jointly optimize effectiveness and efficiency in end-to-end systems. To address this problem, we formulate and develop a novel cascade ranking model which, unlike previous approaches, can simultaneously improve both top-k ranked effectiveness and retrieval efficiency. The model constructs a cascade of increasingly complex ranking functions that progressively prunes and refines the set of candidate documents, minimizing retrieval latency while maximizing result set quality. We present a novel boosting algorithm for learning such cascades that directly optimizes the tradeoff between effectiveness and efficiency. Experimental results show that our cascades are faster and return higher-quality results than comparable ranking models.
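To make the cascade idea concrete, here is a minimal, hypothetical sketch of the general architecture the abstract describes: a pipeline of stages where each stage scores the surviving candidates and prunes to a smaller top-k set before a more expensive stage runs. The stage functions, cutoffs, and toy features below are illustrative placeholders, not the paper's learned cascade or its boosting algorithm.

```python
# Illustrative cascade ranker: cheap stages filter candidates so that
# expensive stages only score a small surviving set.

def run_cascade(candidates, stages):
    """candidates: list of (doc_id, features); stages: list of (score_fn, keep_k)."""
    for score_fn, keep_k in stages:
        # Score every surviving candidate with this stage's ranking function.
        scored = sorted(candidates, key=lambda d: score_fn(d[1]), reverse=True)
        # Prune to the top keep_k before the next, costlier stage runs.
        candidates = scored[:keep_k]
    return [doc_id for doc_id, _ in candidates]

# Toy collection with two made-up features per document.
docs = [("d%d" % i, {"tf": i % 5, "bm25": (i * 7) % 11}) for i in range(100)]

# A cheap single-feature first stage, then a costlier weighted combination.
stages = [
    (lambda f: f["tf"], 20),
    (lambda f: 0.3 * f["tf"] + 0.7 * f["bm25"], 5),
]

top5 = run_cascade(docs, stages)
print(len(top5))  # prints 5
```

Under this kind of pipeline, the per-query cost of each stage is bounded by the number of candidates its predecessor lets through, which is what makes jointly tuning the stage rankers and cutoffs (as the paper's boosting algorithm does for its learned cascade) an effectiveness/efficiency tradeoff rather than two separate problems.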