skip to main content
10.1145/75334.75345acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article
Free Access

A parallel indexed algorithm for information retrieval

Authors Info & Claims
Published:01 May 1989Publication History

ABSTRACT

In this paper we present a parallel document ranking algorithm suitable for use on databases of 1-1000 GB, resident on primary or secondary storage. The algorithm is based on inverted indexes, and has two advantages over a previously published parallel algorithm for retrieval based on signature files. First, it permits the employment of ranking strategies which cannot be easily implemented using signature files, specifically methods which depend on document-term weighting. Second, it permits the interactive searching of databases resident on secondary storage. The algorithm is evaluated via a mixture of analytic and simulation techniques, with a particular focus on how cost-effectiveness and efficiency change as the size of the database, number of processors, and cost of memory are altered. In particular, we find that if the ratio of the number of processors and/or disks to the size of the database is held constant, then the cost-effectiveness of the resulting system remains constant. Furthermore, for a given size of database, there is a number of processors which optimizes cost-effectiveness. Estimated response times are also presented. Using these methods, it appears that cost-effective interactive access to databases in the 100-1000 GB range can be achieved using current technology.

References

  1. 1.Van Rijsbergen, C.J., Information Retrieval, Second Edition, Butterworths, London, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.Salton, G., The SMART Retrieval System -- Experiment in Automatic Document Processing, Prentice-Hall, Englewood Cliffs, New Jersey, 1971. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.Christodoulakis, S. and Faloutsos, C., "Design considerations for a message file server," IEEE Transactions on Software Engineering, Volume SE-10, 1984, pp. 201-210.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.Faloutsos, C. and Christodoulakis, S., "Signature files: An access method for documents and its analytic performance evaluation," A CM Transactions on Office Information Systems, Volume 2 Number 4, October 1984, pp. 267 - 288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.Tsichritzis, D. and Christodoulakis, S., "Message files," A CM Transactions on Office Information Systems, Volume 11 Number 1, January 1983, pp. 88-98. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.Stanfill, C. and Kahle, B., "Parallel Free- Text Search on the Connection Machine System," Communications of the A CM, Volume 29 Number 12, December 1986, pp. 1229- 1238. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.Hillis, D., The Connection Machine, MIT Press, Cambridge MA, 1985, Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8."Connection Machine~ Model CM-2 Technical Summary, Technical Report HA87-4, Thinking Machines Corporation, Cambridge MA, 1987.Google ScholarGoogle Scholar
  9. 9.Stone, H., "Parallel Querying of Large Databases: A Case Study," IEEE Computer, October 1987, pp. 11-21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.Salton, G. and Buckley, C., "Parallel Text Search Methods," Communications of the ACM, Volume 31 Number 2, February 1988, pp. 202-215. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.Croft, Bruce, "Implementing Ranking Strategies Using Text Signatures," A CM Transactions on Office Information Systems, Volume 6 Number 1, January 1988, pp. 42-62. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.Stanfill, C., "Parallel Computing for Information Retrieval: Recent Developments," Technical Report DR88-1, Thinking Machines Corporation, Cambridge MA, January 1988.Google ScholarGoogle Scholar
  13. 13.Zipf, H.P., Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge MA, 1949.Google ScholarGoogle Scholar

Index Terms

  1. A parallel indexed algorithm for information retrieval

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
            May 1989
            257 pages
            ISBN:0897913213
            DOI:10.1145/75334
            • cover image ACM SIGIR Forum
              ACM SIGIR Forum  Volume 23, Issue SI
              Special issue: Proceedings of the 12th annual international ACMSIGIR conference on Research and development in information retrieval, N.J. Belkin and C.J. van Rijsbergen (Eds.), June 25-28, 1989, Cambridge, MA.
              June 1989
              243 pages
              ISSN:0163-5840
              DOI:10.1145/75335
              Issue’s Table of Contents

            Copyright © 1989 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 May 1989

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            Overall Acceptance Rate792of3,983submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader