ABSTRACT
We examine issues in the design of fully dynamic information retrieval systems supporting both document insertions and deletions. The two main components of such a system, index maintenance and query processing, affect each other, as high query performance is usually paid for by additional work during update operations. Two aspects of the system -- incremental updates and garbage collection for delayed document deletions -- are discussed, with a focus on the respective indexing vs. query performance trade-offs. Depending on the relative number of queries and update operations, different strategies lead to optimal overall performance.
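The trade-off behind delayed deletions can be illustrated with a minimal sketch. This is not the system described in the paper: the class `DynamicIndex`, the `gc_threshold` parameter, and the whitespace tokenization are all assumptions made for illustration. Deletions are only recorded in a set, which keeps updates cheap but forces every query to filter stale postings. Once the share of deleted documents exceeds the threshold, a garbage collection pass rewrites the postings lists.

```python
from collections import defaultdict


class DynamicIndex:
    """Toy in-memory inverted index with delayed deletions (illustrative only)."""

    def __init__(self, gc_threshold=0.25):
        # gc_threshold is a hypothetical knob: the fraction of indexed
        # documents that may be marked deleted before garbage collection runs.
        self.gc_threshold = gc_threshold
        self.postings = defaultdict(list)  # term -> doc ids in insertion order
        self.docs = set()                  # doc ids present in postings lists
        self.deleted = set()               # doc ids marked deleted, not yet purged

    def insert(self, doc_id, text):
        # Incremental update: append the new document to each term's list.
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        self.docs.add(doc_id)

    def delete(self, doc_id):
        if doc_id not in self.docs or doc_id in self.deleted:
            return
        # Cheap update: only record the deletion.
        self.deleted.add(doc_id)
        if len(self.deleted) > self.gc_threshold * len(self.docs):
            self.collect_garbage()

    def collect_garbage(self):
        # Expensive update: rewrite every postings list without deleted docs.
        for term in list(self.postings):
            live = [d for d in self.postings[term] if d not in self.deleted]
            if live:
                self.postings[term] = live
            else:
                del self.postings[term]
        self.docs -= self.deleted
        self.deleted.clear()

    def query(self, term):
        # Query cost grows with the number of stale postings that are filtered.
        hits = self.postings.get(term.lower(), [])
        return [d for d in hits if d not in self.deleted]


index = DynamicIndex(gc_threshold=0.5)
index.insert(1, "dynamic text retrieval")
index.insert(2, "dynamic index maintenance")
index.insert(3, "query processing")
index.delete(1)                  # below the threshold: document 1 is only marked
print(index.query("dynamic"))    # stale posting for document 1 is filtered out
index.delete(3)                  # threshold exceeded: postings lists are rewritten
print(index.postings["dynamic"])
```

A low threshold shifts work to update operations and keeps queries fast, while a high threshold does the opposite. Which setting is preferable depends on the relative number of queries and updates, as the abstract notes.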
Index Terms
- Indexing time vs. query time: trade-offs in dynamic information retrieval systems