ABSTRACT
Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. Here we extend the study of selective search using a fine-grained simulation investigating: selective search efficiency in a parallel query processing environment; the difference in efficiency when term-based and sample-based resource selection algorithms are used; and the effect of two policies for assigning index shards to machines. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search. In particular, we show that selective search is capable of both higher throughput and lower latency in a parallel environment than is exhaustive search.
- R. Aly, D. Hiemstra, and T. Demeester. Taily: Shard Selection Using the Tail of Score Distributions. In Proc. SIGIR, pages 673--682, 2013. Google ScholarDigital Library
- F. Cacheda, V. Carneiro, V. Plachouras, and I. Ounis. Performance analysis of distributed information retrieval architectures using an improved network simulation model. Inf. Proc. Man., 43: 204--224, 2007. Google ScholarDigital Library
- Y. Kim, J. Callan, J. S. Culpepper, and A. Moffat. Does selective search benefit from WAND optimization? In Proc. ECIR, 2016. To appear.Google ScholarCross Ref
- A. Kulkarni. Efficient and Effective Large-Scale Search. PhD thesis, Carnegie Mellon University, 2013.Google Scholar
- A. Kulkarni and J. Callan. Document allocation policies for selective searching of distributed indexes. In Proc. CIKM, pages 449--458, 2010. Google ScholarDigital Library
- A. Kulkarni and J. Callan. Selective search: Efficient and effective search of large textual collections. ACM Trans. Inf. Sys., 33 (4): 17:1--17:33, 2015. Google ScholarDigital Library
- A. Kulkarni, A. Tigelaar, D. Hiemstra, and J. Callan. Shard ranking and cutoff estimation for topically partitioned collections. In Proc. CIKM, pages 555--564, 2012. Google ScholarDigital Library
- A. Moffat, W. Webber, and J. Zobel. Load balancing for term-distributed parallel retrieval. In Proc. SIGIR, pages 348--355, 2006. Google ScholarDigital Library
- A. L. Powell, J. C. French, J. Callan, M. Connell, and C. L. Viles. The impact of database selection on distributed searching. In Proc. SIGIR, pages 232--239, 2000. Google ScholarDigital Library
Index Terms
- Load-Balancing in Distributed Selective Search
Recommendations
Query-Biased Partitioning for Selective Search
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge ManagementSelective search is a cluster-based distributed retrieval architecture that reduces computational costs by partitioning a corpus into topical shards, and selectively searching them. Prior research formed topical shards by clustering the corpus based on ...
Dynamic Shard Cutoff Prediction for Selective Search
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information RetrievalSelective search architectures use resource selection algorithms such as Rank-S or Taily to rank index shards and determine how many to search for a given query. Most prior research evaluated solutions by their ability to improve efficiency without ...
Selective Search: Efficient and Effective Search of Large Textual Collections
The traditional search solution for large collections divides the collection into subsets (shards), and processes the query against all shards in parallel (exhaustive search). The search cost and the computational requirements of this approach are often ...
Comments