ABSTRACT
We present an efficient query evaluation method based on a two level approach: at the first level, our method iterates in parallel over query term postings and identifies candidate documents using an approximate evaluation taking into account only partial information on term occurrences and no query independent factors; at the second level, promising candidates are fully evaluated and their exact scores are computed. The efficiency of the evaluation process can be improved significantly using dynamic pruning techniques with very little cost in effectiveness. The amount of pruning can be controlled by the user as a function of time allocated for query evaluation. Experimentally, using the TREC Web Track data, we have determined that our algorithm significantly reduces the total number of full evaluations by more than 90%, almost without any loss in precision or recall. At the heart of our approach there is an efficient implementation of a new Boolean construct called WAND or Weak AND that might be of independent interest.
- Altavista. http://www.altavista.com.Google Scholar
- L. Bahl, P. de Souza, P. Gopalakrishnan, D. Nahamoo, and M. Picheny. A fast match for continuous speech recognition using allophonic models. In Proceedings of IEEE ICASSP, pages 17--20, March 1992.Google ScholarCross Ref
- K. Beyer, A. Jhingran, B. Lyle, S. Rajagopalan, and E. Shekita. Pivot join: A runtime operator for text search. Submitted for publication, 2003.Google Scholar
- B. H. Bloom. Space time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, July 1970. Google ScholarDigital Library
- S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW7/C omputer Networks and ISDN Systems, 30:107--117, April 1998. Google ScholarDigital Library
- C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In Proceedings of the Eighth International ACM-SIGIR Conference, pages 97--110, Montreal, Canada, June 1985. Google ScholarDigital Library
- M. Burrows. Object-oriented interface for an index. US Patent 5,809,502, 1998.Google Scholar
- M. Burrows. Method for parsing, indexing and searching world-wide-web pages. US Patent 5,864,863, 1999.Google Scholar
- D. Carmel, E. Amitay, M. Herscovici, Y. S. Maarek, Y. Petruschka, and A. Soffer. Juru at TREC 10 - Experiments with Index Pruning. In Proceeding of Tenth Text REtrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google Scholar
- C. Clarke, G. Cormack, and F. Burkowski. Shortest substring ranking. In Proceedings of the Fourth Text Retrieval Conference (TREC-4). National Institute of Standards and Technology (NIST), November 1995.Google Scholar
- Google. http://www.google.com.Google Scholar
- D. Harman and G. Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society of Information Science, 41(8):581--589, 1990.Google ScholarCross Ref
- D. Hawking and N. Craswell. Overview of the TREC-2001 Web Track. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Tenth Text Retrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google Scholar
- M. Kaszkiel and J. Zobel. Term-ordered query evaluation versus document-ordered query evaluation for large document databases. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343--344, Melbourne, Austrailia, August 1998. Google ScholarDigital Library
- Y. Maarek and F. Smadja. Full text indexing based on lexical relations: An application: Software libraries. In Proceedings of the Twelfth International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 198--206, Cambridge, MA, June 1989. Google ScholarDigital Library
- A. Moffat and J. Zobel. Fast ranking in limited space. In Proceedings of the 10th IEEE International Conference on Data Engineering, pages 428--4376, Houston, TX, February 1994. Google ScholarDigital Library
- J. K. Mullin. Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16(5):558, May 1990. Google ScholarDigital Library
- M. Persin. Document filtering for fast ranking. In Proceedings of the Seventeenth International ACM-SIGIR Conference on Research and Development in Information Retrieval, pages 339--348, Dublin, Ireland, July 1994. Google ScholarDigital Library
- G. Salton, E. A. Fox, and H. Wu. Extended boolean information retrieval. Communications of the ACM, 26(11):1022--1036, 1983. Google ScholarDigital Library
- H. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6):831--850, 1995. Google ScholarDigital Library
- A. N. Vo and A. Moffat. Compressed inverted files with reduced decoding overheads. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 290--297, Melbourne, Australia, August 1998. Google ScholarDigital Library
- E. M. Voorhees and D. K. Harman. Overview of the Tenth Text REtrieval Conference (TREC-10). In Proceedings of the Tenth Text Retrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google ScholarCross Ref
Index Terms
- Efficient query evaluation using a two-level retrieval process
Recommendations
Query evaluation using overlapping views: completeness and efficiency
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of dataWe study the problem of finding efficient equivalent view-based rewritings of relational queries, focusing on query optimization using materialized views under the assumption that base relations cannot contain duplicate tuples. A lot of work in the ...
Efficient Top-k Query Answering through its Top-N Rewritings Using Views
PIKM '15: Proceedings of the 8th Workshop on Ph.D. Workshop in Information and Knowledge ManagementRecently, various algorithms were proposed to speed up top-k query answering by using multiple materialized query results. Nevertheless, for most of the proposed algorithms, a potentially costly view selection operation is required. In fact, the ...
A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data MiningWe present an empirical comparison between document-at-a-time (DaaT) and score-at-a-time (SaaT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct ...
Comments