skip to main content
10.1145/1183614.1183645acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
Article

Pruning strategies for mixed-mode querying

Published:06 November 2006Publication History

ABSTRACT

Web information retrieval systems face a range of unique challenges, not the least of which is the sheer scale of the data that must be handled. Also specific to web retrieval is that queries may be a mix of Boolean and ranked features, and documents may have static score components that must also be factored into the ranking process. In this paper we consider a range of query semantics used in web retrieval systems, and show that impact-sorted indexes provide support for dynamic pruning mechanisms and in doing so allow fast document-at-a-time resolution of typical mixed-mode queries, even on relatively large volumes of data. Our techniques also extend to more complex query semantics, including the use of phrase, proximity, and structural constraints.

References

  1. V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, editors, Proc. 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 35--42, New Orleans, Louisiana, September 2001. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. V. N. Anh and A. Moffat. Simplified similarity scoring using term ranks. In G. Marchionini, A. Moffat, J. Tait, R. Baeza-Yates, and N. Ziviani, editors, Proc. 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 226--233, Salvador, Brazil, August 2005. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In S. Dumais, E. N. Efthimiadis, D. Hawking, and K. Järvelin, editors, Proc. 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 372--379, Seattle, WA, August 2006a. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. V. N. Anh and A. Moffat. Structured index organizations for high-throughput text querying. In Proc. Symp. String Processing and Information Retrieval, pages 304--315, Glasgow, Scotland, October 2006b. LNCS 4209, Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Y. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. 2003 CIKM Int. Conf. Information and Knowledge Management, pages 426--434, New Orleans, Louisiana, November 2005. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. E. W. Brown. Fast evaluation of structured queries for information retrieval. In E. A. Fox, P. Ingwersen, and R. Fidel, editors, Proc. 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 30--38. ACM Press, New York, July 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In Proc. 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 97--110, Montreal, Canada, June 1985. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. L. A. Clarke and F. Scholer. The TREC 2005 Terabyte Track. In The Fourteenth Text REtrieval Conference (TREC 2005) Notebook, Gaithersburg, MD, November 2005. National Institute of Standards and Technology. http://trec.nist.gov/act_part/t14_notebook/t14.notebook.html.Google ScholarGoogle Scholar
  9. E. S. de Moura, C. F. dos Santos, D. R. Fernandes, A. S. Silva, P. Calado, and M. A. Nascimento. Improving web search efficiency via a locality based static pruning method. In Proc. 14th International World Wide Web Conference, pages 235--244, Chiba, Japan, May 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. K. Harman and G. Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society for Information Science, 41(8):581--589, August 1990.Google ScholarGoogle ScholarCross RefCross Ref
  11. D. Hawking. Efficiency/effectiveness trade-offs in query processing. ACM SIGIR Forum, 32(2):16--22, September 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Kaszkiel, J. Zobel, and R. Sacks-Davis. Efficient passage ranking for document databases. ACM Transactions on Information Systems, 17(4):406--439, October 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Lester, A. Moffat, W. Webber, and J. Zobel. Space-limited ranked query evaluation using adaptive pruning. In A. H. H. Ngu, M. Kitsuregawa, E. J. Neuhold, J.-Y. Chung, and Q. Z. Sheng, editors, Proc. 6th International Conference on Web Information Systems Engineering, pages 470--477, New York, November 2005. LNCS 3806, Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. Journal of the American Society for Information Science, 47(10):749--764, October 1996. Google ScholarGoogle ScholarCross RefCross Ref
  15. G. Salton, E. A. Fox, and H. Wu. Extended Boolean information retrieval. Communications of the ACM, 26(11):1022--1036, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Soffer, D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, and Y. S. Maarek. Static index pruning for information retrieval systems. In W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, editors, Proc. 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43--50, New Orleans, Louisiana, September 2001. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In G. Marchionini, A. Moffat, J. Tait, R. Baeza-Yates, and N. Ziviani, editors, Proc. 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 219--225, Salvador, Brazil, August 2005. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Theobald, R. Schenkel, and G. Weikum. Efficient and self-tuning incremental query expansion for top-k query processing. In G. Marchionini, A. Moffat, J. Tait, R. Baeza-Yates, and N. Ziviani, editors, Proc. 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 242--249, Salvador, Brazil, August 2005. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. Turtle and J. Flood. Query evaluation: strategies and optimizations. Information Processing & Management, 31(1):831--850, November 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. M. Voorhees and D. K. Harman. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Zobel and A. Moffat. Inverted files for text search engines. Computing Surveys, 38(2), July 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Pruning strategies for mixed-mode querying

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
                November 2006
                916 pages
                ISBN:1595934332
                DOI:10.1145/1183614

                Copyright © 2006 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 6 November 2006

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                Overall Acceptance Rate1,861of8,427submissions,22%

                Upcoming Conference

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader