ABSTRACT
Web information retrieval systems face a range of unique challenges, not least the sheer scale of the data that must be handled. Web retrieval is also distinctive in that queries may mix Boolean and ranked features, and documents may carry static score components that must also be factored into the ranking process. In this paper we consider a range of query semantics used in web retrieval systems, and show that impact-sorted indexes support dynamic pruning mechanisms and, in doing so, allow fast document-at-a-time resolution of typical mixed-mode queries, even on relatively large volumes of data. Our techniques also extend to more complex query semantics, including phrase, proximity, and structural constraints.
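The idea of dynamic pruning over impact-sorted postings for a mixed-mode query (a Boolean constraint, ranked terms, and a static per-document score) can be illustrated with a small sketch. It is not the paper's algorithm. It assumes a hypothetical in-memory index in which each term maps to blocks of document numbers sharing one positive integer impact, highest impact first. For brevity it processes blocks in impact order (score-at-a-time) rather than document-at-a-time. It uses one simple safe rule: once the k-th best current score is at least the most a not-yet-seen document could still reach, it stops creating new accumulators. The names `mixed_mode_top_k` and `Index`, and the data layout, are illustrative assumptions.

```python
from __future__ import annotations

import heapq
from collections import defaultdict

# Hypothetical impact-sorted index layout (an assumption, not the paper's format):
# term -> list of (impact, doc_ids) blocks, in decreasing order of positive impact.
Index = dict[str, list[tuple[int, list[int]]]]


def mixed_mode_top_k(index: Index, ranked_terms: list[str],
                     required_terms: list[str], static: dict[int, int],
                     k: int) -> list[tuple[int, int]]:
    """Return up to k (score, doc) pairs, best first.

    A candidate must contain every required term (Boolean part) and at least
    one ranked term; its score is the sum of its ranked-term impacts plus its
    static score (0 if absent). Static scores are assumed non-negative.
    """
    if k <= 0:
        return []
    ranked_terms = list(dict.fromkeys(ranked_terms))  # drop duplicate terms

    # Boolean component: intersect the document sets of the required terms.
    allowed: set[int] | None = None
    for t in required_terms:
        docs = {d for _, block in index.get(t, []) for d in block}
        allowed = docs if allowed is None else allowed & docs

    # Interleave all ranked-term blocks, highest impact first (score-at-a-time).
    blocks = sorted(((imp, t, docs) for t in ranked_terms
                     for imp, docs in index.get(t, [])), key=lambda b: -b[0])

    next_block = {t: 0 for t in ranked_terms}  # per-term count of processed blocks
    max_static = max([0, *static.values()])
    acc: dict[int, int] = defaultdict(int)      # doc -> accumulated impact
    admitting = True                            # may new documents still enter?

    def remaining_bound() -> int:
        # Largest ranked score a document absent from acc could still collect:
        # at most one more block per term, and each term's next block is its best.
        total = 0
        for u, i in next_block.items():
            postings = index.get(u, [])
            total += postings[i][0] if i < len(postings) else 0
        return total

    for imp, t, docs in blocks:
        for d in docs:
            if allowed is not None and d not in allowed:
                continue  # fails the Boolean constraint
            if admitting or d in acc:
                acc[d] += imp
        next_block[t] += 1
        if admitting and len(acc) >= k:
            kth = heapq.nlargest(k, (s + static.get(d, 0) for d, s in acc.items()))[-1]
            if kth >= remaining_bound() + max_static:
                # Dynamic pruning: no unseen document can exceed the k-th score.
                admitting = False

    return heapq.nlargest(k, ((s + static.get(d, 0), d) for d, s in acc.items()))
```

For example, with `index = {"a": [(10, [1]), (1, [2, 3, 4])], "b": [(9, [1]), (1, [5])]}`, the call `mixed_mode_top_k(index, ["a", "b"], [], {}, 1)` stops admitting documents after the first block and returns `[(19, 1)]` without creating accumulators for documents 2 to 5. Because scores only grow, the pruned result matches exhaustive evaluation except possibly in how ties are broken.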
Index Terms
- Pruning strategies for mixed-mode querying