
Another look at automatic text-retrieval systems

Published: 01 July 1986

Abstract

Evidence from available studies comparing manual and automatic text-retrieval systems does not support the conclusion that intellectual content analysis produces better results than comparable automatic systems.

References

  1. Blair, D.C., and Maron, M.E. An evaluation of retrieval effectiveness for a full-text document-retrieval system. Commun. ACM 28, 3 (Mar. 1985), 289-299. A recent evaluation of the IBM/STAIRS text-search system, which concludes that STAIRS does not always produce adequate search output.
  2. Cleverdon, C.W. A computer evaluation of searching by controlled language and natural language in an experimental NASA data base. Rep. ESA 1/432, European Space Agency, Frascati, Italy, July 1977. A description of a large-scale test of the NASA search system using various manual and automatic text-analysis methods.
  3. Cleverdon, C.W. Optimizing convenient on-line access to bibliographic databases. Inf. Serv. Use 4 (1984), 37-47. A summary of the strengths and weaknesses of existing bibliographic retrieval systems and proposals for improving the existing methodologies.
  4. Cleverdon, C.W., and Keen, E.M. Aslib-Cranfield Research Project. Vol. 2: Test Results. Cranfield Institute of Technology, Cranfield, England, 1966. The report on the most thorough evaluation of automatic versus manual text-analysis methods ever carried out, using a collection of 1,400 aeronautics documents.
  5. Croft, W.B., and Harper, D.J. Using probabilistic models of document retrieval without relevance information. J. Doc. 35, 4 (Dec. 1979), 285-295. Describes a method for using probabilistic considerations of term relevance for an initial collection search before any relevance information is available.
  6. IBM World Trade Corporation. Storage and Information Retrieval System (STAIRS): General Information Manual. 2nd ed. IBM Germany, Stuttgart, Germany, Apr. 1972. Contains an early description of the IBM/STAIRS system.
  7. Lancaster, F.W. Evaluation of the MEDLARS Demand Search Service. National Library of Medicine, Bethesda, Md., Jan. 1968. An impressive description of the in-house test of the MEDLARS search system carried out at the National Library of Medicine.
  8. Lancaster, F.W. Information Retrieval Systems: Characteristics, Testing, and Evaluation. 2nd ed. Wiley, New York, 1979. A well-known textbook in information retrieval with an emphasis on system testing and evaluation.
  9. Lovins, J.B. Development of a stemming algorithm. Mech. Transl. Comput. Linguist. 11, 1-2 (Mar. and June 1968), 22-31. A detailed description of an automatic word-stemming algorithm (a toy suffix-stripping sketch follows this reference list).
  10. Robertson, S.E., and Sparck Jones, K. Relevance weighting of search terms. J. ASIS 27, 3 (May-June 1976), 129-146. Describes one of the main probabilistic information-retrieval models.
  11. Salton, G. Automatic text analysis. Science 168, 3929 (Apr. 1970), 335-343. A survey of automatic text retrieval as of 1970.
  12. Salton, G. Recent studies in automatic text analysis and document retrieval. J. ACM 20, 2 (Apr. 1973), 258-278. An evaluation of various automatic text-analysis and indexing methods.
  13. Salton, G. A blueprint for automatic indexing. ACM SIGIR Forum 16, 2 (Fall 1981), 22-38. A relatively nontechnical summary of an approach to automatic indexing and text analysis.
  14. Salton, G. A blueprint for automatic Boolean query processing. ACM SIGIR Forum 17, 2 (Fall 1982), 6-25. A summary of a retrieval system based on soft Boolean logic and automatically assigned term weights.
  15. Salton, G., and Lesk, M.E. Computer evaluation of indexing and text processing. J. ACM 15, 1 (Jan. 1968), 8-36. An early set of test results for some automatic indexing methods.
  16. Salton, G., and McGill, M.J. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983. A recent textbook dealing with automatic text processing and text search and retrieval.
  17. Salton, G., Fox, E.A., and Wu, H. Extended Boolean information retrieval. Commun. ACM 26, 11 (Nov. 1983), 1022-1036. A description of a retrieval model using soft (fuzzy) Boolean logic with weighted document terms and weighted Boolean queries (a p-norm scoring sketch follows this list).
  18. Salton, G., Yang, C.S., and Yu, C.T. A theory of term importance in automatic text analysis. J. ASIS 26, 1 (Jan.-Feb. 1975), 33-44. Contains a description of term-discrimination theory and some retrieval results based on discrimination value weighting.
  19. Sparck Jones, K. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28, 1 (Mar. 1972), 11-21. Relates the usefulness of index terms to certain statistical term occurrence parameters (see the term-weighting sketch following this list).
  20. Swanson, D.R. Searching natural language text by computer. Science 132, 3434 (Oct. 1960), 1099-1104. A pioneering small-scale test comparing an automatic text-search system with a conventional retrieval system based on manual indexing; probably the earliest result showing the superiority of automatic text searching.
  21. van Rijsbergen, C.J. Information Retrieval. 2nd ed. Butterworths, London, England, 1979. A well-known research-oriented information-retrieval text containing many original research results, including work in probabilistic information retrieval.
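
The stemming work of Lovins [9] reduces morphological variants such as "retrieval", "retrieving", and "retrieved" to a common stem so they match the same index entry. The sketch below is only a toy illustration of suffix stripping in Python, using a small invented suffix table; it is not the Lovins algorithm itself, which relies on a table of several hundred endings plus recoding rules.

```python
# Toy suffix-stripping sketch (illustrative only; NOT the Lovins algorithm).
# The suffix table is an invented, longest-first sample.
SUFFIXES = ["ational", "ation", "ions", "ing", "ion", "ers", "er", "es", "ed", "al", "s"]

def stem(word: str, min_stem: int = 3) -> str:
    """Strip the first (longest) matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["retrieval", "retrieving", "retrieved", "indexing", "indexes"]])
# -> ['retriev', 'retriev', 'retriev', 'index', 'index']
```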
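
Several of the references above ([10], [18], [19]) weight index terms by their statistical occurrence characteristics; the best-known such weight is the inverse document frequency (idf) of [19], which favors terms that occur in few documents. The sketch below computes a standard tf-idf weight over a three-document toy collection; the documents and the log-scaled formula are illustrative assumptions rather than the exact formulations evaluated in those papers.

```python
import math
from collections import Counter

# Toy collection (illustrative only).
docs = [
    "automatic text retrieval systems",
    "manual indexing versus automatic indexing",
    "evaluation of text retrieval effectiveness",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

# Document frequency: in how many documents does each term occur?
df = Counter(term for doc in tokenized for term in set(doc))

def tf_idf(term: str, doc: list[str]) -> float:
    """Within-document term frequency times log-scaled inverse document frequency."""
    tf = doc.count(term)
    idf = math.log(N / df[term]) if term in df else 0.0
    return tf * idf

# 'indexing' occurs in only one document, so it receives a higher weight there
# than the more widely distributed 'automatic' and 'retrieval'.
for term in ["automatic", "retrieval", "indexing"]:
    print(term, [round(tf_idf(term, d), 3) for d in tokenized])
```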
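
The extended Boolean model of [14] and [17] softens strict Boolean matching: a document that satisfies a query clause only partially still receives partial credit, controlled by a parameter p. The sketch below implements the p-norm OR and AND scores in their unweighted-query special case; the document term weights and the choice p = 2 are illustrative assumptions.

```python
# p-norm extended Boolean scoring, unweighted-query special case.
# Document term weights are assumed to be normalized to [0, 1].

def or_score(weights: list[float], p: float = 2.0) -> float:
    """Soft OR: rewards any high term weight; p -> infinity approaches the strict maximum."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)

def and_score(weights: list[float], p: float = 2.0) -> float:
    """Soft AND: penalizes low term weights; p -> infinity approaches the strict minimum."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1 / p)

# A document weighted 0.9 on one query term and 0.2 on the other:
weights = [0.9, 0.2]
print(round(or_score(weights), 3))   # 0.652 -- partial credit even though one term is weak
print(round(and_score(weights), 3))  # 0.430 -- nonzero, unlike a strict Boolean AND
```

With p = 1 both operators collapse to a simple average of the term weights, while very large p recovers conventional Boolean behavior; intermediate values give the "soft" ranking behavior described in [17].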



Reviews

Reviewer: Robert G. Crawford

In a document retrieval system, a file of natural-language documents is searched and certain stored items are retrieved in response to queries submitted by users. A research question concerns the effectiveness of fully automated document retrieval as compared to document retrieval based on manual indexing. In a recent paper, Blair and Maron [1] reported the results of a large-scale document retrieval experiment and stated that their study “shows that full-text document retrieval does not operate at satisfactory levels.” Salton's paper provides a thoughtful and necessary response to this unwarranted claim. Salton interprets the results of the Blair and Maron experiments as representing a high order of retrieval effectiveness. He summarizes other major experiments comparing automatic retrieval with manual, controlled vocabulary systems. The theories underlying automatic indexing are also presented, and a basic blueprint for implementing effective automatic retrieval systems is proposed. The paper provides an excellent overview and a good synopsis of the current state of the art in document retrieval.
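
The "retrieval effectiveness" at issue in the Blair and Maron debate is conventionally summarized by two numbers: recall, the fraction of all relevant documents that the search retrieved, and precision, the fraction of retrieved documents that are relevant. The minimal example below uses hypothetical document identifiers purely to show how the two figures are computed.

```python
# Hypothetical example: the relevant set versus what a search actually returned.
relevant = {"d1", "d2", "d3", "d4", "d5"}   # 5 relevant documents exist in the collection
retrieved = {"d1", "d2", "d7", "d8"}        # the search returned 4 documents

hits = relevant & retrieved                 # relevant documents that were actually found
recall = len(hits) / len(relevant)          # 2 / 5 = 0.40
precision = len(hits) / len(retrieved)      # 2 / 4 = 0.50

print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```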


Published in

Communications of the ACM, Volume 29, Issue 7 (July 1986), 103 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/6138

Copyright © 1986 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


