The Weaknesses of Full-Text Searching

https://doi.org/10.1016/j.acalib.2008.06.007

Abstract

This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve resources that express a concept only through synonyms of the search terms, and it also retrieves unwanted homonyms. Numerous other problems also make full-text searching an ineffective information retrieval tool. Academic libraries purchase and subscribe to numerous proprietary databases, many of which rely on full-text searching for access and discovery. An understanding of the weaknesses of full-text searching is needed to evaluate the search and discovery capabilities of academic library databases.

Introduction

Full-text searching is the type of search a computer performs when it matches terms in a search query with terms in individual documents in a database and ranks the results algorithmically. This type of searching is ubiquitous on the Internet and includes the natural language searching we typically find in commercial search engines, in Web site search boxes, and in many proprietary databases. The term full-text searching has several synonyms and variations, including keyword searching, algorithmic searching, stochastic searching, and probabilistic searching.
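To make the mechanism concrete, here is a minimal Python sketch of word-matching retrieval with algorithmic ranking. It assumes a simple TF-IDF weighting chosen for illustration; the function names `tokenize` and `full_text_search` are hypothetical, and commercial search engines use far more elaborate ranking signals.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def full_text_search(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by a simple TF-IDF score over the query terms.

    Only documents containing at least one query word (an exact string match
    after lowercasing) are returned; every other document is invisible.
    """
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores: dict[str, float] = {}
    for term in set(tokenize(query)):
        # Document frequency: how many documents contain the term at all.
        df = sum(1 for counts in tokenized.values() if term in counts)
        if df == 0:
            continue
        # Rarer terms weigh more; the +1 keeps very common terms above zero.
        idf = math.log(n_docs / df) + 1.0
        for doc_id, counts in tokenized.items():
            if term in counts:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, querying `"serials cataloging"` against three short titles ranks a title containing both words above one containing only "serials", while a title containing neither word is never returned at all.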

There is one other main type of online searching: metadata-enabled searching, also called deterministic searching. In this type of search, searchers pre-select and search individual facets of an information resource, such as author, title, and subject. The system matches terms in the search with terms in structured metadata and generates results, often as a browse display sorted alphanumerically. Author, title, and subject searches in online library catalogs are examples of this type of search.
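By contrast, a metadata-enabled search can be sketched as a left-anchored browse on one pre-selected field, with results sorted alphanumerically by heading. The `Record` schema and the `browse` function below are hypothetical illustrations, not the behavior of any particular online catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A catalog record with structured metadata fields (hypothetical schema)."""
    author: str
    title: str
    subjects: tuple[str, ...]


def field_values(record: Record, field: str) -> tuple[str, ...]:
    """Return the values stored in one metadata field of a record."""
    if field == "author":
        return (record.author,)
    if field == "title":
        return (record.title,)
    if field == "subject":
        return record.subjects
    raise ValueError(f"unknown field: {field}")


def browse(records: list[Record], field: str, query: str) -> list[tuple[str, Record]]:
    """Left-anchored browse of one field, sorted alphanumerically by heading.

    A record is listed under every heading in the chosen field that begins
    with the query; the words in the rest of the record play no part.
    """
    q = query.casefold()
    hits = [
        (value, record)
        for record in records
        for value in field_values(record, field)
        if value.casefold().startswith(q)
    ]
    # Sort by heading, then by title, so the display order is deterministic.
    return sorted(hits, key=lambda hit: (hit[0].casefold(), hit[1].title.casefold()))
```

Because two records assigned the subject heading "Myocardial infarction" are filed together under that heading, a subject browse retrieves both of them, whatever words their titles happen to use.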

Understanding the weaknesses of full-text searching is important for academic libraries for several reasons. First, academic libraries purchase or subscribe to numerous proprietary databases, including many full-text databases. When deciding whether to pay for a particular database, libraries need to evaluate the search engine or system that accompanies it. When these databases provide only full-text searching and not metadata-enabled searching, resource discovery within them may be difficult, putting libraries in the position of paying for content that is hard to find. Second, library-created databases, such as institutional repositories, also call for an understanding of the weaknesses of full-text searching. Providing only full-text access to a library's digital objects may not offer resource discovery of sufficient quality for the collection's users, so academic libraries need to evaluate the available search engines and systems and select the best one for each collection. Finally, much current debate centers on whether online library catalogs are still needed or whether academic library materials can be accessed adequately through commercial search engines. A thorough knowledge of the weaknesses of full-text searching informs this debate and helps academic librarians evaluate, recommend, and design search engines for library databases.

The purpose of this article is to list and describe the chief weaknesses of full-text searching. We limit its scope to true full-text searching, which automatically matches words entered in the search box with words in the resources a database contains to generate results. This study excludes newer semantic search engines such as Hakia, which stores metadata for each Web page it indexes and uses that metadata, along with word matching, to generate search results. Many popular search engines do, in fact, incorporate some metadata into their searches. For example, Google's advanced search lets users limit search results to a specific language. This limit relies on language metadata that the search engine assigns automatically to each Web page it indexes, and the accuracy of that automatically generated metadata may not always be high.
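A language limit of this kind can be sketched as word matching followed by a filter on an automatically assigned language tag. The stopword-counting heuristic in the hypothetical `guess_language` function is an assumption made for illustration only; it is not how Google assigns language metadata, but it shows how such automatically generated tags can go wrong.

```python
import re
from typing import Optional

# Tiny, hypothetical stopword lists; a real language identifier uses far more evidence.
STOPWORDS = {
    "en": {"the", "and", "of", "in", "is", "for"},
    "fr": {"le", "la", "et", "les", "des", "est"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


def words(text: str) -> set[str]:
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def guess_language(text: str) -> Optional[str]:
    """Assign a language tag by counting stopword hits (a crude heuristic)."""
    tokens = re.findall(r"\w+", text.lower())
    counts = {lang: sum(t in stops for t in tokens) for lang, stops in STOPWORDS.items()}
    best = max(counts, key=lambda lang: counts[lang])
    # No stopwords at all means no tag, even if the page is plainly English.
    return best if counts[best] > 0 else None


def search_with_language_limit(query: str, pages: dict[str, str], lang: str) -> list[str]:
    """Word matching first, then a filter on the automatically assigned tag."""
    terms = words(query)
    return sorted(
        page_id
        for page_id, text in pages.items()
        if terms & words(text) and guess_language(text) == lang
    )
```

With this heuristic, a short English page titled only "Catalog" receives no tag and is silently dropped from English-limited results, even though it matches the query word exactly.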

Still, the great majority of searches performed on the Internet are of the type this paper studies: full-text searching that matches words in a search box with words in online documents or online text. This study is not a comparison of full-text searching and metadata-enabled searching; each type has its own strengths and weaknesses. This article seeks chiefly to describe the weaknesses of full-text searching.

This paper is a theoretical critique of full-text searching that focuses on the type of searching done in academic libraries. It describes and categorizes the ways in which full-text searching can fail, failures that most searchers have likely encountered themselves. Quantitative research measuring the extent of these problems is outside the scope of this paper, but it would be valuable and would further inform the debate.


Previous Studies

Most information retrieval and discovery has shifted from metadata-enabled searching (academic library card catalogs) to today's full-text, or algorithmic, searching (Web search engines). This transition occurred without sufficient analysis of the weaknesses of full-text searching. Perhaps if searchers understood how many resources they were missing because full-text searching relies on word matching to generate results, they would be less

The Synonym Problem

Perhaps the biggest and most pervasive weakness of full-text searching is the synonym problem. This problem occurs because there is often more than one way to name or express a given concept, such as a person, place, or thing. There are several different aspects of the synonym problem.
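The following minimal sketch shows how exact word matching misses a document that names the same concept with a synonym. The hand-built synonym ring `SYNONYMS` and the function names are hypothetical; the ring stands in for the kind of vocabulary control that metadata provides and is not part of full-text searching as defined in this article.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased set of the alphabetic words in a document or query."""
    return set(re.findall(r"[a-z]+", text.lower()))


def exact_match(query: str, docs: dict[str, str]) -> set[str]:
    """Return ids of documents sharing at least one word with the query."""
    q = words(query)
    return {doc_id for doc_id, text in docs.items() if q & words(text)}


# Hypothetical hand-built synonym ring; a real one needs constant upkeep.
SYNONYMS = [{"car", "automobile", "auto"}, {"lawyer", "attorney"}]


def expanded_match(query: str, docs: dict[str, str]) -> set[str]:
    """Like exact_match, but first adds every synonym of each query word."""
    q = words(query)
    for ring in SYNONYMS:
        if q & ring:
            q |= ring
    return {doc_id for doc_id, text in docs.items() if q & words(text)}
```

A search for "car" retrieves a document about buying a used car but not one about automobile insurance; only the manually maintained synonym list lets the expanded search find both, and a search for "lawyer" finds nothing at all without it.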

True Synonyms

Synonyms are two or more words in the same language that have the same meaning. In full-text searching, synonyms hinder effective information retrieval when a searcher enters a term in the search box and the system only returns

Further Research

Research that measures the deficiencies of full-text searching would provide valuable information. For example, a study of the synonym problem could measure the proportion of resources missed when a library patron searches a word and the search fails to retrieve resources that refer to the concept only by its synonyms. In addition, research that compares the weaknesses of full-text searching in the humanities versus STM (scientific, technical, and medical) fields would prove valuable, especially if it could
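Such a study would in effect measure recall, the share of relevant resources a search retrieves; the proportion missed is one minus recall. The sketch below assumes a hypothetical set of resources already judged relevant by hand, which is the expensive part of any real study. The homonym problem would show up instead as lower precision, the share of retrieved resources that are actually relevant.

```python
def proportion_missed(relevant: set[str], retrieved: set[str]) -> float:
    """Share of relevant resources the search failed to retrieve (1 - recall)."""
    if not relevant:
        raise ValueError("need at least one relevant resource")
    return len(relevant - retrieved) / len(relevant)


def precision(relevant: set[str], retrieved: set[str]) -> float:
    """Share of retrieved resources that are actually relevant."""
    if not retrieved:
        raise ValueError("need at least one retrieved resource")
    return len(relevant & retrieved) / len(retrieved)
```

If four resources are relevant and a search retrieves two of them plus one unwanted homonym match, half of the relevant resources are missed and precision is two-thirds.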

Conclusion

Linguistic problems, the limitations of full-text search engines, and missing data combine to make full-text searching unreliable, incomplete, and insidiously imprecise, especially for serious information seeking, such as scholarly research. Many Web-based applications still use basic full-text searching as their chief information retrieval mechanism. Over the past fifteen years, most information retrieval has transitioned from searching based on rich metadata to full-text searching. The result

