skip to main content
10.1145/956863.956944acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
Article

Efficient query evaluation using a two-level retrieval process

Published:03 November 2003Publication History

ABSTRACT

We present an efficient query evaluation method based on a two level approach: at the first level, our method iterates in parallel over query term postings and identifies candidate documents using an approximate evaluation taking into account only partial information on term occurrences and no query independent factors; at the second level, promising candidates are fully evaluated and their exact scores are computed. The efficiency of the evaluation process can be improved significantly using dynamic pruning techniques with very little cost in effectiveness. The amount of pruning can be controlled by the user as a function of time allocated for query evaluation. Experimentally, using the TREC Web Track data, we have determined that our algorithm significantly reduces the total number of full evaluations by more than 90%, almost without any loss in precision or recall. At the heart of our approach there is an efficient implementation of a new Boolean construct called WAND or Weak AND that might be of independent interest.

References

  1. Altavista. http://www.altavista.com.Google ScholarGoogle Scholar
  2. L. Bahl, P. de Souza, P. Gopalakrishnan, D. Nahamoo, and M. Picheny. A fast match for continuous speech recognition using allophonic models. In Proceedings of IEEE ICASSP, pages 17--20, March 1992.Google ScholarGoogle ScholarCross RefCross Ref
  3. K. Beyer, A. Jhingran, B. Lyle, S. Rajagopalan, and E. Shekita. Pivot join: A runtime operator for text search. Submitted for publication, 2003.Google ScholarGoogle Scholar
  4. B. H. Bloom. Space time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, July 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW7/C omputer Networks and ISDN Systems, 30:107--117, April 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In Proceedings of the Eighth International ACM-SIGIR Conference, pages 97--110, Montreal, Canada, June 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Burrows. Object-oriented interface for an index. US Patent 5,809,502, 1998.Google ScholarGoogle Scholar
  8. M. Burrows. Method for parsing, indexing and searching world-wide-web pages. US Patent 5,864,863, 1999.Google ScholarGoogle Scholar
  9. D. Carmel, E. Amitay, M. Herscovici, Y. S. Maarek, Y. Petruschka, and A. Soffer. Juru at TREC 10 - Experiments with Index Pruning. In Proceeding of Tenth Text REtrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google ScholarGoogle Scholar
  10. C. Clarke, G. Cormack, and F. Burkowski. Shortest substring ranking. In Proceedings of the Fourth Text Retrieval Conference (TREC-4). National Institute of Standards and Technology (NIST), November 1995.Google ScholarGoogle Scholar
  11. Google. http://www.google.com.Google ScholarGoogle Scholar
  12. D. Harman and G. Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society of Information Science, 41(8):581--589, 1990.Google ScholarGoogle ScholarCross RefCross Ref
  13. D. Hawking and N. Craswell. Overview of the TREC-2001 Web Track. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Tenth Text Retrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google ScholarGoogle Scholar
  14. M. Kaszkiel and J. Zobel. Term-ordered query evaluation versus document-ordered query evaluation for large document databases. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343--344, Melbourne, Austrailia, August 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Y. Maarek and F. Smadja. Full text indexing based on lexical relations: An application: Software libraries. In Proceedings of the Twelfth International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 198--206, Cambridge, MA, June 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Moffat and J. Zobel. Fast ranking in limited space. In Proceedings of the 10th IEEE International Conference on Data Engineering, pages 428--4376, Houston, TX, February 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. K. Mullin. Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16(5):558, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Persin. Document filtering for fast ranking. In Proceedings of the Seventeenth International ACM-SIGIR Conference on Research and Development in Information Retrieval, pages 339--348, Dublin, Ireland, July 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. G. Salton, E. A. Fox, and H. Wu. Extended boolean information retrieval. Communications of the ACM, 26(11):1022--1036, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6):831--850, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. N. Vo and A. Moffat. Compressed inverted files with reduced decoding overheads. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 290--297, Melbourne, Australia, August 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. M. Voorhees and D. K. Harman. Overview of the Tenth Text REtrieval Conference (TREC-10). In Proceedings of the Tenth Text Retrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Efficient query evaluation using a two-level retrieval process

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
      November 2003
      592 pages
      ISBN:1581137230
      DOI:10.1145/956863

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 November 2003

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate1,861of8,427submissions,22%

      Upcoming Conference

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader