A parallel indexed algorithm for information retrieval

Authors:
C. Stanfill

Thinking Machines Corporation, 245 First Street, Cambridge MA

Thinking Machines Corporation, 245 First Street, Cambridge MA
View Profile

,
R. Thau

Thinking Machines Corporation, 245 First Street, Cambridge MA

Thinking Machines Corporation, 245 First Street, Cambridge MA
View Profile

,
D. Waltz

Thinking Machines Corporation, 245 First Street, Cambridge MA

Thinking Machines Corporation, 245 First Street, Cambridge MA
View Profile

SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrievalMay 1989Pages 88–97https://doi.org/10.1145/75334.75345

Published:01 May 1989Publication History

SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 88–97

ABSTRACT

In this paper we present a parallel document ranking algorithm suitable for use on databases of 1-1000 GB, resident on primary or secondary storage. The algorithm is based on inverted indexes, and has two advantages over a previously published parallel algorithm for retrieval based on signature files. First, it permits the employment of ranking strategies which cannot be easily implemented using signature files, specifically methods which depend on document-term weighting. Second, it permits the interactive searching of databases resident on secondary storage. The algorithm is evaluated via a mixture of analytic and simulation techniques, with a particular focus on how cost-effectiveness and efficiency change as the size of the database, number of processors, and cost of memory are altered. In particular, we find that if the ratio of the number of processors and/or disks to the size of the database is held constant, then the cost-effectiveness of the resulting system remains constant. Furthermore, for a given size of database, there is a number of processors which optimizes cost-effectiveness. Estimated response times are also presented. Using these methods, it appears that cost-effective interactive access to databases in the 100-1000 GB range can be achieved using current technology.

References

1.Van Rijsbergen, C.J., Information Retrieval, Second Edition, Butterworths, London, 1979. Google ScholarDigital Library
2.Salton, G., The SMART Retrieval System -- Experiment in Automatic Document Processing, Prentice-Hall, Englewood Cliffs, New Jersey, 1971. Google ScholarDigital Library
3.Christodoulakis, S. and Faloutsos, C., "Design considerations for a message file server," IEEE Transactions on Software Engineering, Volume SE-10, 1984, pp. 201-210.Google ScholarDigital Library
4.Faloutsos, C. and Christodoulakis, S., "Signature files: An access method for documents and its analytic performance evaluation," A CM Transactions on Office Information Systems, Volume 2 Number 4, October 1984, pp. 267 - 288. Google ScholarDigital Library
5.Tsichritzis, D. and Christodoulakis, S., "Message files," A CM Transactions on Office Information Systems, Volume 11 Number 1, January 1983, pp. 88-98. Google ScholarDigital Library
6.Stanfill, C. and Kahle, B., "Parallel Free- Text Search on the Connection Machine System," Communications of the A CM, Volume 29 Number 12, December 1986, pp. 1229- 1238. Google ScholarDigital Library
7.Hillis, D., The Connection Machine, MIT Press, Cambridge MA, 1985, Google ScholarDigital Library
8."Connection Machine~ Model CM-2 Technical Summary, Technical Report HA87-4, Thinking Machines Corporation, Cambridge MA, 1987.Google Scholar
9.Stone, H., "Parallel Querying of Large Databases: A Case Study," IEEE Computer, October 1987, pp. 11-21. Google ScholarDigital Library
10.Salton, G. and Buckley, C., "Parallel Text Search Methods," Communications of the ACM, Volume 31 Number 2, February 1988, pp. 202-215. Google ScholarDigital Library
11.Croft, Bruce, "Implementing Ranking Strategies Using Text Signatures," A CM Transactions on Office Information Systems, Volume 6 Number 1, January 1988, pp. 42-62. Google ScholarDigital Library
12.Stanfill, C., "Parallel Computing for Information Retrieval: Recent Developments," Technical Report DR88-1, Thinking Machines Corporation, Cambridge MA, January 1988.Google Scholar
13.Zipf, H.P., Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge MA, 1949.Google Scholar

Index Terms

A parallel indexed algorithm for information retrieval

Recommendations

A parallel indexed algorithm for information retrieval
Special issue: Proceedings of the 12th annual international ACMSIGIR conference on Research and development in information retrieval, N.J. Belkin and C.J. van Rijsbergen (Eds.), June 25-28, 1989, Cambridge, MA.

In this paper we present a parallel document ranking algorithm suitable for use on databases of 1-1000 GB, resident on primary or secondary storage. The algorithm is based on inverted indexes, and has two advantages over a previously published parallel ...
Read More
Efficient Directory Mutations in a Full-Path-Indexed File System
Special Issue on FAST 2018 and Regular Papers

Full-path indexing can improve I/O efficiency for workloads that operate on data organized using traditional, hierarchical directories, because data is placed on persistent storage in scan order. Prior results indicate, however, that renames in a local ...
Read More
A formal system for information retrieval from files

A generalized file structure is provided by which the concepts of keyword, index, record, file, directory, file structure, directory decoding, and record retrieval are defined and from which some of the frequently used file structures such as inverted ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
May 1989
257 pages
ISBN:0897913213
DOI:10.1145/75334
Editors:
N. J. Belkin,
C. J. van Rijsbergen
ACM SIGIR Forum Volume 23, Issue SI
Special issue: Proceedings of the 12th annual international ACMSIGIR conference on Research and development in information retrieval, N.J. Belkin and C.J. van Rijsbergen (Eds.), June 25-28, 1989, Cambridge, MA.
June 1989
243 pages
ISSN:0163-5840
DOI:10.1145/75335
Editors:
Vijay Raghavan
University of Southwestern Louisiana, Lafayette, LA
,
William B. Frakes
AT&T Bell Laboratories, Holmdel, NJ
,
N. J. Belkin,
C. J. van Rijsbergen
Issue’s Table of Contents
Copyright © 1989 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 1989
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate792of3,983submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 18
  Total Citations
  View Citations
- 666
  Total Downloads
- Downloads (Last 12 months)27
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A parallel indexed algorithm for information retrieval

SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

A parallel indexed algorithm for information retrieval

Efficient Directory Mutations in a Full-Path-Indexed File System

A formal system for information retrieval from files