skip to main content
10.1145/191839.191896acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free Access

Incremental updates of inverted lists for text document retrieval

Authors Info & Claims
Published:24 May 1994Publication History

ABSTRACT

With the proliferation of the world's “information highways” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.

References

  1. 1.Doug Cutting and Jan Pedersen. Optimizations for dynamic inverted index maintenance. In Proceedings of SIGIR '90, pages 405-411, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.Samuel DeFazio. Full-text document retrieval benchmark. In Jim Gray, editor, The Benchmark Handbook }or Database and Transaction Processsng Systems, cha.pter 8. Morgan Kaufmann, second edition, 1993.Google ScholarGoogle Scholar
  3. 3.Christos Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In A. Pirotte, C. Delobel, and G. Gottlob, editors, Procce&nga 3rd International Conference on Extending Database Technology- EDBT '92, Vienna, 1992. Springer- Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.Christos Faloutsos and H. V. Jagadish. On B-tree indices for skewed distributions. In Proceedings of 18th International Conference on Very Large Databases, pages 363-374, Vancouver, British Columbia, Canada, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.William B. Frakes and Ricardo Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.Donna Harman and Gerald Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society }or Information Science, 41(8):581-589, 1990.Google ScholarGoogle Scholar
  7. 7.Donald E. Knuth. The Art of Computer Programmzng. Addison-Wesley, Reading, Massachusetts. 1973.Google ScholarGoogle Scholar
  8. 8.Katia Obraczka, Peter B. Danzig, and Shih-Hao Li. IN- TERNET resource discovery services. IEEE Computer, 26(9), September 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.Kurt Shoens, Allen Luniewski, Peter Schwarz, Jim Stamos, and 3ohn Thomas. The Rufus system: Information organization for semi-structured data. In P,vcee&ngs of the. 19th VLDB Conference, Dublin, Ireland, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.Kurt Shoens, Anthony Tomasic, and Hector Garcia- Molina. Synthetic workload performance analysis of incremental updates. In Procee&ngs of the 17th International A CM/SIGIR Conference on Research and Development in ln}ormatzon Retrieval, Dublin, Ireland, 1994. (to appear). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.Anthony Tomasic, Hector Garcia-Molina, and Kurt Shoens. Incremental updates of inverted lists for text document retrieval. Technical Note STAN-CS-TN-93- 1, Stanford University, 1993. Available via FTP from db.stanford.edu as/pub/tomasic/stan.cs.tn.93.1.ps. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.George Kingsley Zipf. Human Behavior and the Prznciple of Least Effort. Addison-Wesley Press, 1949.Google ScholarGoogle Scholar
  13. 13.Justin Zobel, Alista.ir Moffat, and Ron Sacks-Davis. An efficient indexing technique for full-text database systems. In Procee&ngs o} 18th International Conference on Very Large Databases, Vancouver, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Incremental updates of inverted lists for text document retrieval

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SIGMOD '94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data
              May 1994
              525 pages
              ISBN:0897916395
              DOI:10.1145/191839

              Copyright © 1994 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 24 May 1994

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              Overall Acceptance Rate785of4,003submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader