Incremental updates of inverted lists for text document retrieval

Authors:
Anthony Tomasic
Stanford University, Department of Computer Science, Stanford, CA

Héctor García-Molina
Stanford University, Department of Computer Science, Stanford, CA

Kurt Shoens
IBM Almaden Research Center, 650 Harry Road, San Jose, CA

SIGMOD '94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data, May 1994, Pages 289–300, https://doi.org/10.1145/191839.191896

Published: 24 May 1994

ABSTRACT

With the proliferation of the world's “information highways” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.
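
The abstract does not spell out how the index decides which lists are long or how each kind is stored, so the following is only a minimal sketch of the general idea of keeping short and long inverted lists in separate structures and moving a list across as it grows. The class name `DualStructureIndex`, the `threshold` parameter, the fixed length cutoff, and the in-memory dictionaries are all assumptions made for illustration; they are not the paper's algorithm or storage layout.

```python
from collections import defaultdict

# Hypothetical cutoff: a list with more postings than this counts as "long".
LONG_LIST_THRESHOLD = 4


class DualStructureIndex:
    """Toy index that keeps short and long inverted lists in separate stores."""

    def __init__(self, threshold=LONG_LIST_THRESHOLD):
        self.threshold = threshold
        self.short_lists = defaultdict(list)  # word -> small postings list
        self.long_lists = {}                  # word -> large postings list

    def add_document(self, doc_id, words):
        """Incrementally add one document without rebuilding the index."""
        for word in set(words):
            if word in self.long_lists:
                # Already classified as long: append in place.
                self.long_lists[word].append(doc_id)
                continue
            postings = self.short_lists[word]
            postings.append(doc_id)
            if len(postings) > self.threshold:
                # The list has outgrown the short store: migrate it.
                self.long_lists[word] = self.short_lists.pop(word)

    def lookup(self, word):
        """Return a copy of the postings for a word, wherever it is stored."""
        if word in self.long_lists:
            return list(self.long_lists[word])
        return list(self.short_lists.get(word, []))

    def is_long(self, word):
        return word in self.long_lists


index = DualStructureIndex(threshold=2)
index.add_document(1, ["the", "cat"])
index.add_document(2, ["the", "dog"])
index.add_document(3, ["the", "bird"])
print(index.lookup("the"), index.is_long("the"))
print(index.lookup("cat"), index.is_long("cat"))
```

In this sketch a frequent word such as "the" migrates to the long-list store once it appears in more than `threshold` documents, while rare words stay in the short-list store. A real system would choose the cutoff and the on-disk layout of each store according to the update-time versus query-time trade-offs the abstract mentions.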

References

1. Doug Cutting and Jan Pedersen. Optimizations for dynamic inverted index maintenance. In Proceedings of SIGIR '90, pages 405-411, 1990.
2. Samuel DeFazio. Full-text document retrieval benchmark. In Jim Gray, editor, The Benchmark Handbook for Database and Transaction Processing Systems, chapter 8. Morgan Kaufmann, second edition, 1993.
3. Christos Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In A. Pirotte, C. Delobel, and G. Gottlob, editors, Proceedings 3rd International Conference on Extending Database Technology - EDBT '92, Vienna, 1992. Springer-Verlag.
4. Christos Faloutsos and H. V. Jagadish. On B-tree indices for skewed distributions. In Proceedings of 18th International Conference on Very Large Databases, pages 363-374, Vancouver, British Columbia, Canada, 1992.
5. William B. Frakes and Ricardo Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, 1992.
6. Donna Harman and Gerald Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society for Information Science, 41(8):581-589, 1990.
7. Donald E. Knuth. The Art of Computer Programming. Addison-Wesley, Reading, Massachusetts, 1973.
8. Katia Obraczka, Peter B. Danzig, and Shih-Hao Li. INTERNET resource discovery services. IEEE Computer, 26(9), September 1993.
9. Kurt Shoens, Allen Luniewski, Peter Schwarz, Jim Stamos, and John Thomas. The Rufus system: Information organization for semi-structured data. In Proceedings of the 19th VLDB Conference, Dublin, Ireland, 1993.
10. Kurt Shoens, Anthony Tomasic, and Hector Garcia-Molina. Synthetic workload performance analysis of incremental updates. In Proceedings of the 17th International ACM/SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 1994. (to appear).
11. Anthony Tomasic, Hector Garcia-Molina, and Kurt Shoens. Incremental updates of inverted lists for text document retrieval. Technical Note STAN-CS-TN-93-1, Stanford University, 1993. Available via FTP from db.stanford.edu as /pub/tomasic/stan.cs.tn.93.1.ps.
12. George Kingsley Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, 1949.
13. Justin Zobel, Alistair Moffat, and Ron Sacks-Davis. An efficient indexing technique for full-text database systems. In Proceedings of 18th International Conference on Very Large Databases, Vancouver, 1992.

Index Terms

Incremental updates of inverted lists for text document retrieval
1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published in
SIGMOD '94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data
May 1994
525 pages
ISBN:0897916395
DOI:10.1145/191839
Editors: Richard Thomas Snodgrass (Univ. of Arizona), Marianne Winslett (Univ. of Illinois)
ACM SIGMOD Record Volume 23, Issue 2
June 1994
522 pages
ISSN:0163-5808
DOI:10.1145/191843
Editors: Richard Thomas Snodgrass (Univ. of Arizona), Marianne Winslett (Univ. of Illinois)
Copyright © 1994 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%