skip to main content
article
Free Access

Fast text searching: allowing errors

Published:01 October 1992Publication History
First page image

References

  1. 1 Abrahamson, D. Generalized string matching. SIAM J. Comput. 16, 6 (1987), 1039-1051. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2 Baeza-Yates, R.A. and Gonnet, G.H. A new approach to text seartching. Commun. ACM 35, 10 (1992). Preliminary version appearted in Proceedings of the 12th Annual ACM-SIGIR Conference on Information Retreival, Cambridge Mass. (June 1989), pp. 168-175. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3 Borer, R.,S, and Moore, J.s., A fast string searching algorithm. Commun. ACM 20, 10 (Oct, 1977), 762- 772 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4 Chang, W.I. and Lawler, E.L. Approximate string matching in sublinear expected time., FOCS 90, pp. 116-124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5 Galil, Z. and Giancarlo, R. Data structures and algorithms for approximate string matching. J. Com plex. 4 (1988)., 33- 72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6 Galil, Z and Park, K. An improved algorithm for approximate, string matching. SIAM J. Comput. 19 (Dec. 1990), 989-999, Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7 Gonnet, G.H. and Baeza.-Yates. R.A. Handbook of Algoritms and Data Structures. Second ed. Addison- Wesley, Reading, Mass., 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8 Hall, P.A.V. and Dowling, G.R. Approximate string matching. Comput. Surv, (Dec. 1980), 381-402, Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9 Hopcroft, J.E. and U'llman, J.D. Introduction to Automata Theory, Languages. and Computational addison- Wesley, Reading, Mass, (1979), Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10 Knuth, D.E., Morris. J.H. and Pratt V.R,.Fast pattern matching in strings. SIAM J. Comput. 6 (June 1977), 323-350.Google ScholarGoogle ScholarCross RefCross Ref
  11. 11 Landau, G.M. and Vishkin. U, Fast siring matching with k differenccs, J. Comput. Syst. Sci. 37 (1988), 63- 78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12 Landau, G.M. and Vishkin, U. Fast parallel and serial approximate string matching. J. Algor. 10 (1989). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13 Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals, Sov. Physs., DokL (Feb. 1966), 707-710LGoogle ScholarGoogle Scholar
  14. 14 Manber, U. and Wu, S. Approximate string matching with arbitrary costs for text abd hypertext. IAPR Workshop on Structural and Syntatic Pattern. Recognition, (Bern, Switzerland. Aug. 1992).Google ScholarGoogle Scholar
  15. 15 Manber, U. and Wu, S. Approximate pattern matching. BYTE. To be published Nov, I992,Google ScholarGoogle Scholar
  16. 16 Myers, E.W, An O(ND) difference algorithm and its variations. Algorithmica 1 (1986), 251-266,.Google ScholarGoogle ScholarCross RefCross Ref
  17. 17 Myers, E.W. and Miller, W. Approximate matching of regular expressions. Bull Math. Bio, 51, (1989), 5-37.Google ScholarGoogle ScholarCross RefCross Ref
  18. 18 Pinter, R. Efficient string matching with don't-care patterns. In combinatorial Algoritms on Words, A. Apostolico and Z. Galil, Eds., Springer-Verlag, Berlin, 1985.Google ScholarGoogle Scholar
  19. 19 Tarhio, J. and Ukkonen, E. Approximate Boyer-Moore string matching. Tech. Rep.#A-.1990-3,. Dept. of Computer Science, University of Helsinki (Mar. 1990).,Google ScholarGoogle Scholar
  20. 20 Ukkonen, E. Finding approximate patterns in sirings. J, Algor. 6 (1985), 132-137.Google ScholarGoogle ScholarCross RefCross Ref
  21. 21 Ukkonen, E, Algorithms for approximate string matching. Inf. Control 64 (1985), 100-118. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22 Wagner, R.A. and Seiferas, J.I., Correcting counter-automation-recognizable languages, SIAM J. Comput. (1978), 3357-375.Google ScholarGoogle Scholar
  23. 23 Wu, S. Approximatr patern matching and its applications. Ph.D. dissertation, Dept. of Comput. Sci., University of Arizona, June 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24 Wu, S. and Manber. U. Agrep-A fast approximate pattern-matching tool. Usenix Winter 1992 Technical Conference (San Francisco Jan. 1992), pp. 153-162,Google ScholarGoogle Scholar
  25. 25 Wu, S., Manber, U. and Myers,. E,.W',. A Sub-Quadratic Algorithm for Approximate Regular Expression M:atching, submitted for publication (May 1992).Google ScholarGoogle Scholar

Index Terms

  1. Fast text searching: allowing errors

        Recommendations

        Reviews

        Wilfred J. Hansen

        The simple string matching problem is to find an exact match for a pattern string in some text corpus. Fast, special-purpose algorithms solve this problem in time O n/m , where n is the length of the text and m is the length of the pattern. These algorithms are of little use for inexact matches, however. This paper considers more general string matching problems such as approximate matches, wild cards, and regular expressions. It shows that many interesting cases can be handled by variations of an earlier O n simple string matching algorithm. This algorithm maintains an m -bit vector R such that R [ i ] =1 if and only if the text preceding the current scan position matches the first i characters in the pattern. To allow k errors, the new algorithm requires k additional bit vectors; the running time is O kn , which is much better than other approximate matching algorithms. When k is somewhat smaller than m , a partition searching extension reduces the running time to O n . The algorithms have been incorporated in a package, agrep, that is publicly available. Experimental results reported in the paper indicate that this algorithm is considerably faster than other approaches.

        Access critical reviews of Computing literature here

        Become a reviewer for Computing Reviews.

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Communications of the ACM
          Communications of the ACM  Volume 35, Issue 10
          Oct. 1992
          93 pages
          ISSN:0001-0782
          EISSN:1557-7317
          DOI:10.1145/135239
          • Editor:
          • Peter Denning
          Issue’s Table of Contents

          Copyright © 1992 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 October 1992

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader