
An evaluation of retrieval effectiveness for a full-text document-retrieval system

Published: 1 March 1985

Abstract

An evaluation of a large, operational full-text document-retrieval system (containing roughly 350,000 pages of text) shows the system to be retrieving less than 20 percent of the documents relevant to a particular search. The findings are discussed in terms of the theory and practice of full-text document retrieval.
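For context, the "percent of the documents relevant" figure is the standard Recall measure; the companion Precision measure appears in the review below. These textbook definitions are supplied here for clarity and are not quoted from the paper:

\[
\mathrm{Recall} = \frac{|\,\text{relevant} \cap \text{retrieved}\,|}{|\,\text{relevant}\,|},
\qquad
\mathrm{Precision} = \frac{|\,\text{relevant} \cap \text{retrieved}\,|}{|\,\text{retrieved}\,|}.
\]

On this scale, the paper's headline result is an average Recall below 0.20.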

References

  1. Blair, D.C. Searching biases in large interactive document retrieval systems. J. Am. Soc. Inf. Sci. 31 (July 1980), 271-277.
  2. Resnikoff, H.L. The national need for research in information science. STI Issues and Options Workshop, House Subcommittee on Science, Research and Technology, Washington, D.C., Nov. 3, 1976.
  3. Salton, G. Automatic text analysis. Science 168, 3929 (Apr. 1970), 335-343.
  4. Saracevic, T. Relevance: A review of and a framework for thinking on the notion in information science. J. Am. Soc. Inf. Sci. 26 (1975), 321-343.
  5. Sparck Jones, K. Automatic Keyword Classification for Information Retrieval. Butterworths, London, 1971.
  6. Swanson, D.R. Searching natural language text by computer. Science 132, 3434 (Oct. 1960), 1099-1104.
  7. Swanson, D.R. Information retrieval as a trial-and-error process. Libr. Q. 47, 2 (1977), 128-148.
  8. Swets, J.A. Information retrieval systems. Science 141 (1963), 245-250.
  9. Zunde, P. and Dexter, M.E. Indexing consistency and quality. Am. Doc. 20, 3 (July 1969), 259-264.

Reviews

Robert G. Crawford

The notion of automatic full-text retrieval is clearly attractive: retrieval is based on automatically searching documents for those embodying certain subject content. Such a system may involve automatic preprocessing of documents to form indexes or other structures that facilitate retrieval, but it includes no human intervention such as manual indexing of documents. Many commercial retrieval systems provide access to databases that have not been manually indexed, so full-text retrieval is not simply a research issue.

This paper describes a large-scale search-and-retrieval experiment aimed at evaluating the effectiveness of full-text retrieval. IBM's STAIRS, a fast, large-capacity, full-text retrieval system, was used for the study. The database consisted of just under 40,000 documents, representing roughly 350,000 pages of hard-copy text, related to the defense of a large corporate lawsuit. Two attorneys participated in the experiment, along with two paralegals who were familiar with the case and experienced with STAIRS. A total of 51 retrieval requests were processed, and Precision and Recall were chosen as the measures of retrieval effectiveness. Overall, these aspects of the experimental design were thoughtfully done. The scope of the study is impressive: it involved two researchers and six support staff members, took six months, and cost almost half a million dollars. The most significant reported result is that the average value of Recall was 20 percent, a level clearly unacceptable to the lawyers, who had specified a need for at least 75 percent Recall. The authors discuss the reasons for these results and suggest a theoretical basis for them.

While it is encouraging to see such a large-scale experiment, the paper is disappointing in several areas. The authors apparently recognize the difference between what are referred to as "data retrieval" and "information retrieval." However, they ignore the fact that they are using a data retrieval system (one that happens to handle text better than most) to do information retrieval. Ascribing Recall levels to STAIRS ("This meant that, on average, STAIRS could be used to retrieve only 20 percent of the relevant documents. . ."), as they do, is wrong. The authors seemingly respond to this comment when they write, "An objection that might be made to our evaluation of STAIRS is that the low Recall observed was not due to STAIRS but rather to query-formulation error." They describe the difficulty facing a user who is trying to express a request to STAIRS such that all (and only) the relevant documents will be retrieved. This difficulty certainly exists, but it does not follow that the poor performance should therefore be ascribed to the retrieval step. One can equally argue that STAIRS performs at a level of 100 percent in response to any request and that, therefore, however difficult the task may be, it is query formulation that is being evaluated. The problem goes deeper than that, however, which leads to a further point. Clearly, the authors recognize only two steps in the retrieval process: Boolean query formulation and Boolean retrieval using STAIRS. And indeed, those are the steps in their experimental procedure; a minimal sketch of this two-step process appears below.
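To make the two-step picture concrete, here is a minimal, hypothetical Python sketch of conjunctive Boolean retrieval over an inverted index, with Recall and Precision computed against a set of relevance judgments. It is an illustration under stated assumptions, not a reproduction of STAIRS's query language or internals; the function names and toy documents are invented.

    # Hypothetical sketch: conjunctive Boolean retrieval over an inverted
    # index, plus Recall/Precision scoring. Not STAIRS; an illustration only.
    from collections import defaultdict

    def build_index(docs):
        """Map each term to the set of document ids containing it."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add(doc_id)
        return index

    def boolean_and(index, terms):
        """Return ids of documents containing every query term."""
        sets = [index.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    def recall_precision(retrieved, relevant):
        """Recall: fraction of relevant docs retrieved.
           Precision: fraction of retrieved docs that are relevant."""
        hits = len(retrieved & relevant)
        recall = hits / len(relevant) if relevant else 0.0
        precision = hits / len(retrieved) if retrieved else 0.0
        return recall, precision

    docs = {
        1: "memo regarding the accident at the plant",
        2: "minutes of the meeting on the unfortunate incident",
        3: "correspondence about plant safety",
    }
    relevant = {1, 2}  # documents judged relevant to the request
    retrieved = boolean_and(build_index(docs), ["accident", "plant"])
    print(recall_precision(retrieved, relevant))  # (0.5, 1.0)

Note that document 2, which describes the same event as an "incident" rather than an "accident," is missed entirely: precisely the vocabulary problem the review turns to next.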
But running such a simple system does not warrant the sweeping conclusions that they draw: "The retrieval problems we describe would be problems with any large-scale, full-text retrieval system." In a 1970 paper by Salton [3] that they cite, easily ten different tools to aid automatic full-text retrieval are referred to or described. The authors incorporate none of these, nor any of those proposed in the intervening years. That Recall is low, given their approach, is not news.

Interesting examples are given of problems encountered in the way words were used in their database. Because the data included personal correspondence, memoranda, and verbatim minutes of meetings, the problems are particularly severe and intriguing. The authors do not speculate as to the extent of this particular factor in performance. It is clear, however, that such documents inhibit Recall in ways that would not be true of, for example, scientific journal articles.

Another area of concern involves the authors' discussion of why Recall must decrease as file size increases. For example, they propose to show how Recall is calculated for a two-term search, based on the probabilities of each of the terms occurring in a relevant document as well as the probabilities of a searcher using the terms in a query (a reconstruction of this style of calculation is sketched below). But their analysis is embarrassingly simplistic and incomplete: it holds only under the assumption that all other queries result in Recall of zero, which is clearly not the case. Their claim that their study "shows that full-text retrieval does not operate at satisfactory levels and that there are sound theoretical reasons to expect this to be so" is simply not validated in this paper.

Finally, it may be noted that they have traded one experimental problem for another. Much research in information retrieval has indeed suffered from the small numbers of documents used in experiments. Yet with small size came the advantage of being able to do multitudinous comparative studies. So, for example, previous work that these authors cite involved comparisons between manual indexing and automatic full-text retrieval. Their study, while done on a large number of documents, provides no such comparison.
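A hedged reconstruction of the style of calculation the review objects to (the notation is introduced here for illustration and is not taken from the paper): let $q_i$ be the probability that a searcher includes term $t_i$ in the query, and $p_i$ the probability that $t_i$ occurs in a relevant document. For the single conjunctive query $t_1 \wedge t_2$, independence gives

\[
\mathrm{Recall}(t_1 \wedge t_2) \;\approx\; q_1 q_2 \, p_1 p_2 ,
\]

which is small for plausible values of the four probabilities, and smaller still as more terms are conjoined. The review's objection is that this bounds the Recall of that one query only; it limits total Recall only if every alternative formulation is assumed to retrieve nothing.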

Published in

Communications of the ACM, Volume 28, Issue 3 (March 1985), 94 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3166
Copyright © 1985 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
