skip to main content
article
Free Access

Interaction of query evaluation and buffer management for information retrieval

Published:01 June 1998Publication History
Skip Abstract Section

Abstract

The proliferation of the World Wide Web has brought information retrieval (IR) techniques to the forefront of search technology. To the average computer user, “searching” now means using IR-based systems for finding information on the WWW or in other document collections. IR query evaluation methods and workloads differ significantly from those found in database systems. In this paper, we focus on three such differences. First, due to the inherent fuzziness of the natural language used in IR queries and documents, an additional degree of flexibility is permitted in evaluating queries. Second, IR query evaluation algorithms tend to have access patterns that cause problems for traditional buffer replacement policies. Third, IR search is often an iterative process, in which a query is repeatedly refined and resubmitted by the user. Based on these differences, we develop two complementary techniques to improve the efficiency of IR queries: 1) Buffer-aware query evaluation, which alters the query evaluation process based on the current contents of buffers; and 2) Ranking-aware buffer replacement, which incorporates knowledge of the query processing strategy into replacement decisions. In a detailed performance study we show that using either of these techniques yields significant performance benefits and that in many cases, combining them produces even further improvements.

References

  1. ABGM90 R. Alonso, D. Barbar~i, and H. Garcia-Molina. Data caching issues in an information retrieval system. A CM TODS, 15(3), Sept. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bro95 E.W. Brown. Fast evaluation of structured queries for information retrieval. Proc. A CM SIGIR Conf., Seattle, WA, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bro97 K.P. Brown. DBGuide industrial presentation. ACM SIG- MOD Conf., Tucson, AZ, 1997.Google ScholarGoogle Scholar
  4. CD85 H.-T. Chou and D.J. DeWitt. An evaluation of buffer management strategies for relational database systems. Proc. of the VLDB Conf., Stockholm, Sweden, 1985.Google ScholarGoogle Scholar
  5. CK97 M.J. Carey and D. Kossmann. On saying "enough already !" in SQL. Proc. ACM SIGMOD Conf., Tucson, AZ, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. CR93 C.M. Chen and N. Roussopoulos. Adaptive database buffer allocation using query feedback. Proc. of the VLDB Conf., Dublin, Ireland, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. DFJ+96 S. Dar, M. Franklin, B.T. J6nsson, D. Srivastava, and M. Tan. Semantic data caching and replacement. Proc. of the VLDB Conf., Bombay, India, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. EH84 W. Effelsberg and T. Haerder. Principles of database buffer management. ACM TODS, 9(4), Dec. 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Fal85 C. Faloutsos. Access methods for text. ACM Computing Surveys, 17(1 ), 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fid91 R. Fidel. Searchers' selection of search keys: III. Searching styles. Journal of the American Society of lnformation Science, 42(7), 1991.Google ScholarGoogle Scholar
  11. FJK96 M.J. Franklin, B.T. J6nsson, and D. Kossmann. Performance tradeoffs for client-server query processing. Proc. A CM SIGMOD Conf., Montreal, Canada, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Fox92 C. Fox. Lexical analysis and stoplists. In W.B. Frakes and R. Baeza-Yates, editors, information Retrieval Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Fra92 W.B. Frakes. Stemming algorithms. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Har96 D. Harman. Overview of the fourth Text REtrieval Conference (TREC-4). The fourth Text REtrieval Conference (TREC- 4), NIST, Gaithersburg, MD, 1996.Google ScholarGoogle Scholar
  15. HHW97 J.M. Hellerstein, P.J. Haas, and H.J. Wang. Online aggregation. Proc. A CM SIGMOD Conf., Tucson, AZ, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Ink Inktomi. The lnktomi Technology Behind Hotbot. See http ://www.inktomi.com/Tech/CoupClustWhitePap. html.Google ScholarGoogle Scholar
  17. JS94 T. Johnson and D. Shasha. 2Q: A low overhead high performance buffer management replacement algorithm. Proc. of the VLDB Conf., Santiago, Chile, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. KK94 A. Kemper and D. Kossmann. Dual-buffeting strategies in object bases. Proc. of the VLDB Conf., Santiago, Chile, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. KQCB94 J. Koenemann, R. Quatrain, C. Cool, and N.J. Belkin~ New tools and old habits: The interactive searching behavior of expert online searchers using INQUERY. The Third Text REtrieval Conference (TREC-3), NIST, Gaithersburg, MD, 1994.Google ScholarGoogle Scholar
  20. MZ94 A. Moffat and J. Zobel. Fast ranking in limited space. Proc. IEEE Conf. on Data Engineering, Houston, TX 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. NFS91 R. Ng, C. Faloutsos, and T. Sellis. Flexible buffer allocation based on marginal gains. Proc. ACM SIGMOD Conf, Denver, CO, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. OOW93 E.J. O'Neil, RE. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. Proc. A CM SIGMOD Conf., Washington, DC, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Per94 M. Persin. Document filtering for fast ranking. Proc. A CM SIGIR Conf., Dublin, Ireland, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. PZSD96 M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. Journal of the American Society of Information Science, 47(10), Oct. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. SA87 R Simpson and R. Alonso. Data caching in information retrieval systems. Proc. ACM SIGIR Conf., New Orleans, LA, 1987.Google ScholarGoogle Scholar
  26. SB88 G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval, lnfi)rmation Processing & Management, 24(5), 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. SB90 G. Salton and C. Buckley. Improving retrieval performance by relevance feedback. Journal of the American Society of Information Science, 41 (4), 1990.Google ScholarGoogle ScholarCross RefCross Ref
  28. SS96 S. Sarawagi and M. Stonebraker. Reordering query execution in tertiary memory databases. Proc. A CM SIGMOD Conf., Montreal, Canada, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Sto81 M. Stonebraker. Operating system support for database management. CA CM, 24(7), July 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. TF95 H. Turtle and J. Flood. Query evaluation: Strategies and optimization, information Processing & Management, Nov. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. TGM93a A. Tomasic and H. Garcia-Molina. Caching and database scaling in distributed shared-nothing information retrieval systems. Proc. A CM SIGMOD Conf., Washington, DC, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. TGM93b A. Tomasic and H. Garcia-Molina. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. Proc. PDIS Conf., San Diego, CA, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Tra95 Transaction Processing Performance Council (TPC), 777 N. First Street, Suite 600, San Jose, CA 95112, USA. TPC Benchmark D (Decision Support), May 1995.Google ScholarGoogle Scholar
  34. Tur94 H. Turtle. Natural language vs. boolean query evaluation: A comparison of retrieval performance. Proc. A CM SIGIR Conf., Dublin, Ireland, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. VH97 E.M. Voorhees and D. Harman. Overview of the fifth Text REtrieval Conference (TREC-5). The fifth Text REtrieval Conference (TREC-5), NIST, Gaithersburg, MD, 1997.Google ScholarGoogle Scholar
  36. WL93 W.Y.P. Wong and D.L. Lee. Implementation of partial document ranking using inverted files. Information Processing & Management, 29(5), Oct. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. ZMSD92 J. Zobel, A. Moffat, and R. Sacks-Davis. An efficient indexing technique for full-text database systems. Proc. of the VLDB Conf., Vancouver, Canada, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Interaction of query evaluation and buffer management for information retrieval

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGMOD Record
            ACM SIGMOD Record  Volume 27, Issue 2
            June 1998
            595 pages
            ISSN:0163-5808
            DOI:10.1145/276305
            Issue’s Table of Contents
            • cover image ACM Conferences
              SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data
              June 1998
              599 pages
              ISBN:0897919955
              DOI:10.1145/276304

            Copyright © 1998 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 June 1998

            Check for updates

            Qualifiers

            • article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader