ABSTRACT
With the proliferation of the world's “information highways” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.
- 1.Doug Cutting and Jan Pedersen. Optimizations for dynamic inverted index maintenance. In Proceedings of SIGIR '90, pages 405-411, 1990. Google ScholarDigital Library
- 2.Samuel DeFazio. Full-text document retrieval benchmark. In Jim Gray, editor, The Benchmark Handbook }or Database and Transaction Processsng Systems, cha.pter 8. Morgan Kaufmann, second edition, 1993.Google Scholar
- 3.Christos Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In A. Pirotte, C. Delobel, and G. Gottlob, editors, Procce&nga 3rd International Conference on Extending Database Technology- EDBT '92, Vienna, 1992. Springer- Verlag. Google ScholarDigital Library
- 4.Christos Faloutsos and H. V. Jagadish. On B-tree indices for skewed distributions. In Proceedings of 18th International Conference on Very Large Databases, pages 363-374, Vancouver, British Columbia, Canada, 1992. Google ScholarDigital Library
- 5.William B. Frakes and Ricardo Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, 1992. Google ScholarDigital Library
- 6.Donna Harman and Gerald Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society }or Information Science, 41(8):581-589, 1990.Google Scholar
- 7.Donald E. Knuth. The Art of Computer Programmzng. Addison-Wesley, Reading, Massachusetts. 1973.Google Scholar
- 8.Katia Obraczka, Peter B. Danzig, and Shih-Hao Li. IN- TERNET resource discovery services. IEEE Computer, 26(9), September 1993. Google ScholarDigital Library
- 9.Kurt Shoens, Allen Luniewski, Peter Schwarz, Jim Stamos, and 3ohn Thomas. The Rufus system: Information organization for semi-structured data. In P,vcee&ngs of the. 19th VLDB Conference, Dublin, Ireland, 1993. Google ScholarDigital Library
- 10.Kurt Shoens, Anthony Tomasic, and Hector Garcia- Molina. Synthetic workload performance analysis of incremental updates. In Procee&ngs of the 17th International A CM/SIGIR Conference on Research and Development in ln}ormatzon Retrieval, Dublin, Ireland, 1994. (to appear). Google ScholarDigital Library
- 11.Anthony Tomasic, Hector Garcia-Molina, and Kurt Shoens. Incremental updates of inverted lists for text document retrieval. Technical Note STAN-CS-TN-93- 1, Stanford University, 1993. Available via FTP from db.stanford.edu as/pub/tomasic/stan.cs.tn.93.1.ps. Google ScholarDigital Library
- 12.George Kingsley Zipf. Human Behavior and the Prznciple of Least Effort. Addison-Wesley Press, 1949.Google Scholar
- 13.Justin Zobel, Alista.ir Moffat, and Ron Sacks-Davis. An efficient indexing technique for full-text database systems. In Procee&ngs o} 18th International Conference on Very Large Databases, Vancouver, 1992. Google ScholarDigital Library
Index Terms
- Incremental updates of inverted lists for text document retrieval
Recommendations
Incremental updates of inverted lists for text document retrieval
With the proliferation of the world's “information highways” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. ...
Comments