Abstract
The proliferation of the World Wide Web has brought information retrieval (IR) techniques to the forefront of search technology. To the average computer user, “searching” now means using IR-based systems for finding information on the WWW or in other document collections. IR query evaluation methods and workloads differ significantly from those found in database systems. In this paper, we focus on three such differences. First, due to the inherent fuzziness of the natural language used in IR queries and documents, an additional degree of flexibility is permitted in evaluating queries. Second, IR query evaluation algorithms tend to have access patterns that cause problems for traditional buffer replacement policies. Third, IR search is often an iterative process, in which a query is repeatedly refined and resubmitted by the user. Based on these differences, we develop two complementary techniques to improve the efficiency of IR queries: 1) Buffer-aware query evaluation, which alters the query evaluation process based on the current contents of buffers; and 2) Ranking-aware buffer replacement, which incorporates knowledge of the query processing strategy into replacement decisions. In a detailed performance study we show that using either of these techniques yields significant performance benefits and that in many cases, combining them produces even further improvements.
- ABGM90 R. Alonso, D. Barbar~i, and H. Garcia-Molina. Data caching issues in an information retrieval system. A CM TODS, 15(3), Sept. 1990. Google ScholarDigital Library
- Bro95 E.W. Brown. Fast evaluation of structured queries for information retrieval. Proc. A CM SIGIR Conf., Seattle, WA, 1995. Google ScholarDigital Library
- Bro97 K.P. Brown. DBGuide industrial presentation. ACM SIG- MOD Conf., Tucson, AZ, 1997.Google Scholar
- CD85 H.-T. Chou and D.J. DeWitt. An evaluation of buffer management strategies for relational database systems. Proc. of the VLDB Conf., Stockholm, Sweden, 1985.Google Scholar
- CK97 M.J. Carey and D. Kossmann. On saying "enough already !" in SQL. Proc. ACM SIGMOD Conf., Tucson, AZ, 1997. Google ScholarDigital Library
- CR93 C.M. Chen and N. Roussopoulos. Adaptive database buffer allocation using query feedback. Proc. of the VLDB Conf., Dublin, Ireland, 1993. Google ScholarDigital Library
- DFJ+96 S. Dar, M. Franklin, B.T. J6nsson, D. Srivastava, and M. Tan. Semantic data caching and replacement. Proc. of the VLDB Conf., Bombay, India, 1996. Google ScholarDigital Library
- EH84 W. Effelsberg and T. Haerder. Principles of database buffer management. ACM TODS, 9(4), Dec. 1984. Google ScholarDigital Library
- Fal85 C. Faloutsos. Access methods for text. ACM Computing Surveys, 17(1 ), 1985. Google ScholarDigital Library
- Fid91 R. Fidel. Searchers' selection of search keys: III. Searching styles. Journal of the American Society of lnformation Science, 42(7), 1991.Google Scholar
- FJK96 M.J. Franklin, B.T. J6nsson, and D. Kossmann. Performance tradeoffs for client-server query processing. Proc. A CM SIGMOD Conf., Montreal, Canada, 1996. Google ScholarDigital Library
- Fox92 C. Fox. Lexical analysis and stoplists. In W.B. Frakes and R. Baeza-Yates, editors, information Retrieval Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1992. Google ScholarDigital Library
- Fra92 W.B. Frakes. Stemming algorithms. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1992. Google ScholarDigital Library
- Har96 D. Harman. Overview of the fourth Text REtrieval Conference (TREC-4). The fourth Text REtrieval Conference (TREC- 4), NIST, Gaithersburg, MD, 1996.Google Scholar
- HHW97 J.M. Hellerstein, P.J. Haas, and H.J. Wang. Online aggregation. Proc. A CM SIGMOD Conf., Tucson, AZ, 1997. Google ScholarDigital Library
- Ink Inktomi. The lnktomi Technology Behind Hotbot. See http ://www.inktomi.com/Tech/CoupClustWhitePap. html.Google Scholar
- JS94 T. Johnson and D. Shasha. 2Q: A low overhead high performance buffer management replacement algorithm. Proc. of the VLDB Conf., Santiago, Chile, 1994. Google ScholarDigital Library
- KK94 A. Kemper and D. Kossmann. Dual-buffeting strategies in object bases. Proc. of the VLDB Conf., Santiago, Chile, 1994. Google ScholarDigital Library
- KQCB94 J. Koenemann, R. Quatrain, C. Cool, and N.J. Belkin~ New tools and old habits: The interactive searching behavior of expert online searchers using INQUERY. The Third Text REtrieval Conference (TREC-3), NIST, Gaithersburg, MD, 1994.Google Scholar
- MZ94 A. Moffat and J. Zobel. Fast ranking in limited space. Proc. IEEE Conf. on Data Engineering, Houston, TX 1994. Google ScholarDigital Library
- NFS91 R. Ng, C. Faloutsos, and T. Sellis. Flexible buffer allocation based on marginal gains. Proc. ACM SIGMOD Conf, Denver, CO, 1991. Google ScholarDigital Library
- OOW93 E.J. O'Neil, RE. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. Proc. A CM SIGMOD Conf., Washington, DC, 1993. Google ScholarDigital Library
- Per94 M. Persin. Document filtering for fast ranking. Proc. A CM SIGIR Conf., Dublin, Ireland, 1994. Google ScholarDigital Library
- PZSD96 M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. Journal of the American Society of Information Science, 47(10), Oct. 1996. Google ScholarDigital Library
- SA87 R Simpson and R. Alonso. Data caching in information retrieval systems. Proc. ACM SIGIR Conf., New Orleans, LA, 1987.Google Scholar
- SB88 G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval, lnfi)rmation Processing & Management, 24(5), 1988. Google ScholarDigital Library
- SB90 G. Salton and C. Buckley. Improving retrieval performance by relevance feedback. Journal of the American Society of Information Science, 41 (4), 1990.Google ScholarCross Ref
- SS96 S. Sarawagi and M. Stonebraker. Reordering query execution in tertiary memory databases. Proc. A CM SIGMOD Conf., Montreal, Canada, 1996. Google ScholarDigital Library
- Sto81 M. Stonebraker. Operating system support for database management. CA CM, 24(7), July 1981. Google ScholarDigital Library
- TF95 H. Turtle and J. Flood. Query evaluation: Strategies and optimization, information Processing & Management, Nov. 1995. Google ScholarDigital Library
- TGM93a A. Tomasic and H. Garcia-Molina. Caching and database scaling in distributed shared-nothing information retrieval systems. Proc. A CM SIGMOD Conf., Washington, DC, 1993. Google ScholarDigital Library
- TGM93b A. Tomasic and H. Garcia-Molina. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. Proc. PDIS Conf., San Diego, CA, 1993. Google ScholarDigital Library
- Tra95 Transaction Processing Performance Council (TPC), 777 N. First Street, Suite 600, San Jose, CA 95112, USA. TPC Benchmark D (Decision Support), May 1995.Google Scholar
- Tur94 H. Turtle. Natural language vs. boolean query evaluation: A comparison of retrieval performance. Proc. A CM SIGIR Conf., Dublin, Ireland, 1994. Google ScholarDigital Library
- VH97 E.M. Voorhees and D. Harman. Overview of the fifth Text REtrieval Conference (TREC-5). The fifth Text REtrieval Conference (TREC-5), NIST, Gaithersburg, MD, 1997.Google Scholar
- WL93 W.Y.P. Wong and D.L. Lee. Implementation of partial document ranking using inverted files. Information Processing & Management, 29(5), Oct. 1993. Google ScholarDigital Library
- ZMSD92 J. Zobel, A. Moffat, and R. Sacks-Davis. An efficient indexing technique for full-text database systems. Proc. of the VLDB Conf., Vancouver, Canada, 1992. Google ScholarDigital Library
Index Terms
- Interaction of query evaluation and buffer management for information retrieval
Recommendations
Interaction of query evaluation and buffer management for information retrieval
SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of dataThe proliferation of the World Wide Web has brought information retrieval (IR) techniques to the forefront of search technology. To the average computer user, “searching” now means using IR-based systems for finding information on the WWW or in other ...
Exploration of query context for information retrieval
WWW '07: Proceedings of the 16th international conference on World Wide WebA number of existing information retrieval systems propose the notion of query context to combine the knowledge of query and user into retrieval to reveal the most exact description of user's information needs. In this paper we interpret query context ...
Comments