
An evaluation of retrieval effectiveness for a full-text document-retrieval system

Published: 1 March 1985

Abstract

An evaluation of a large, operational full-text document-retrieval system (containing roughly 350,000 pages of text) shows the system to be retrieving less than 20 percent of the documents relevant to a particular search. The findings are discussed in terms of the theory and practice of full-text document retrieval.
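For context, the "percent of the documents relevant" figure is the standard Recall measure; the companion Precision measure appears in the review below. These textbook definitions are supplied here for clarity and are not quoted from the paper:

\[
\mathrm{Recall} = \frac{|\,\text{relevant} \cap \text{retrieved}\,|}{|\,\text{relevant}\,|},
\qquad
\mathrm{Precision} = \frac{|\,\text{relevant} \cap \text{retrieved}\,|}{|\,\text{retrieved}\,|}.
\]

On this scale, the paper's headline result is an average Recall below 0.20.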

References

  1. Blair, D.C. Searching biases in large interactive document retrieval systems. J. Am. Soc. Inf. Sci. 31 (July 1980), 271-277.
  2. Resnikoff, H.L. The national need for research in information science. STI Issues and Options Workshop, House Subcommittee on Science, Research and Technology, Washington, D.C., Nov. 3, 1976.
  3. Salton, G. Automatic text analysis. Science 168, 3929 (Apr. 1970), 335-343.
  4. Saracevic, T. Relevance: A review of and a framework for thinking on the notion in information science. J. Am. Soc. Inf. Sci. 26 (1975), 321-343.
  5. Sparck Jones, K. Automatic Keyword Classification for Information Retrieval. Butterworths, London, 1971.
  6. Swanson, D.R. Searching natural language text by computer. Science 132, 3434 (Oct. 1960), 1099-1104.
  7. Swanson, D.R. Information retrieval as a trial-and-error process. Libr. Q. 47, 2 (1977), 128-148.
  8. Swets, J.A. Information retrieval systems. Science 141 (1963), 245-250.
  9. Zunde, P. and Dexter, M.E. Indexing consistency and quality. Am. Doc. 20, 3 (July 1969), 259-264.

Reviews

Robert G. Crawford

The notion of automatic full-text retrieval is clearly attractive: retrieval is based on automatically searching documents for those embodying certain subject content. Such a system may involve automatic preprocessing of documents to form indexes or other structures that facilitate retrieval, but it includes no human intervention such as manual indexing of documents. Many commercial retrieval systems provide access to databases that have not been manually indexed, so full-text retrieval is not simply a research issue.

This paper describes a large-scale search-and-retrieval experiment aimed at evaluating the effectiveness of full-text retrieval. IBM's STAIRS, a fast, large-capacity, full-text retrieval system, was used for the study. The database consisted of just under 40,000 documents, representing roughly 350,000 pages of hard-copy text, related to the defense of a large corporate lawsuit. Two attorneys participated in the experiment, along with two paralegals who were familiar with the case and experienced with STAIRS. A total of 51 retrieval requests were processed, and Precision and Recall were chosen as the measures of retrieval effectiveness. Overall, these aspects of the experimental design were thoughtfully done. The scope of the study is impressive: it involved two researchers and six support staff members, took six months, and cost almost half a million dollars. The most significant reported result is that the average value of Recall was 20 percent, a level clearly unacceptable to the lawyers, who had specified a need for at least 75 percent Recall. The authors discuss the reasons for these results and suggest a theoretical basis for them.

While it is encouraging to see such a large-scale experiment, the paper is disappointing in several areas. The authors apparently recognize the difference between what are referred to as "data retrieval" and "information retrieval." However, they ignore the fact that they are using a data retrieval system (one that happens to handle text better than most) to do information retrieval. Ascribing Recall levels to STAIRS ("This meant that, on average, STAIRS could be used to retrieve only 20 percent of the relevant documents. . ."), as they do, is wrong. The authors seemingly respond to this comment when they write, "An objection that might be made to our evaluation of STAIRS is that the low Recall observed was not due to STAIRS but rather to query-formulation error." They describe the difficulty facing a user who is trying to express a request to STAIRS such that all (and only) the relevant documents will be retrieved. This difficulty certainly exists, but it does not follow that the poor performance should therefore be ascribed to the retrieval step. One can equally argue that STAIRS performs at a level of 100 percent in response to any request and that, therefore, however difficult the task may be, it is query formulation that is being evaluated. The problem goes deeper than that, however, which leads to a further point. Clearly, the authors recognize only two steps in the retrieval process: Boolean query formulation and Boolean retrieval using STAIRS. And indeed, those are the steps in their experimental procedure; a minimal sketch of this two-step process appears below.
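To make the two-step picture concrete, here is a minimal, hypothetical Python sketch of conjunctive Boolean retrieval over an inverted index, with Recall and Precision computed against a set of relevance judgments. It is an illustration under stated assumptions, not a reproduction of STAIRS's query language or internals; the function names and toy documents are invented.

    # Hypothetical sketch: conjunctive Boolean retrieval over an inverted
    # index, plus Recall/Precision scoring. Not STAIRS; an illustration only.
    from collections import defaultdict

    def build_index(docs):
        """Map each term to the set of document ids containing it."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add(doc_id)
        return index

    def boolean_and(index, terms):
        """Return ids of documents containing every query term."""
        sets = [index.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    def recall_precision(retrieved, relevant):
        """Recall: fraction of relevant docs retrieved.
           Precision: fraction of retrieved docs that are relevant."""
        hits = len(retrieved & relevant)
        recall = hits / len(relevant) if relevant else 0.0
        precision = hits / len(retrieved) if retrieved else 0.0
        return recall, precision

    docs = {
        1: "memo regarding the accident at the plant",
        2: "minutes of the meeting on the unfortunate incident",
        3: "correspondence about plant safety",
    }
    relevant = {1, 2}  # documents judged relevant to the request
    retrieved = boolean_and(build_index(docs), ["accident", "plant"])
    print(recall_precision(retrieved, relevant))  # (0.5, 1.0)

Note that document 2, which describes the same event as an "incident" rather than an "accident," is missed entirely: precisely the vocabulary problem the review turns to next.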
But running such a simple system does not warrant the sweeping conclusions that they draw: "The retrieval problems we describe would be problems with any large-scale, full-text retrieval system." In a 1970 paper by Salton [3] that they cite, easily ten different tools to aid automatic full-text retrieval are referred to or described. The authors incorporate none of these, nor any of those proposed in the intervening years. That Recall is low, given their approach, is not news.

Interesting examples are given of problems encountered in the way words were used in their database. Because the data included personal correspondence, memoranda, and verbatim minutes of meetings, the problems are particularly severe and intriguing. The authors do not speculate as to the extent of this particular factor in performance. It is clear, however, that such documents inhibit Recall in ways that would not be true of, for example, scientific journal articles.

Another area of concern involves the authors' discussion of why Recall must decrease as file size increases. For example, they propose to show how Recall is calculated for a two-term search, based on the probabilities of each of the terms occurring in a relevant document as well as the probabilities of a searcher using the terms in a query (a reconstruction of this style of calculation is sketched below). But their analysis is embarrassingly simplistic and incomplete: it holds only under the assumption that all other queries result in Recall of zero, which is clearly not the case. Their claim that their study "shows that full-text retrieval does not operate at satisfactory levels and that there are sound theoretical reasons to expect this to be so" is simply not validated in this paper.

Finally, it may be noted that they have traded one experimental problem for another. Much research in information retrieval has indeed suffered from the small numbers of documents used in experiments. Yet with small size came the advantage of being able to do multitudinous comparative studies. So, for example, previous work that these authors cite involved comparisons between manual indexing and automatic full-text retrieval. Their study, while done on a large number of documents, provides no such comparison.
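A hedged reconstruction of the style of calculation the review objects to (the notation is introduced here for illustration and is not taken from the paper): let $q_i$ be the probability that a searcher includes term $t_i$ in the query, and $p_i$ the probability that $t_i$ occurs in a relevant document. For the single conjunctive query $t_1 \wedge t_2$, independence gives

\[
\mathrm{Recall}(t_1 \wedge t_2) \;\approx\; q_1 q_2 \, p_1 p_2 ,
\]

which is small for plausible values of the four probabilities, and smaller still as more terms are conjoined. The review's objection is that this bounds the Recall of that one query only; it limits total Recall only if every alternative formulation is assumed to retrieve nothing.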

Published in

Communications of the ACM, Volume 28, Issue 3 (March 1985), 94 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3166
Copyright © 1985 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
