skip to main content
article

The string edit distance matching problem with moves

Published:02 February 2007Publication History
Skip Abstract Section

Abstract

The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t.

We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).

References

  1. Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 6, 1039--1051. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Alstrup, S., Brodal, G. S., and Rauhe, T. 2000. Pattern matching in dynamic texts. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 819--828. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Amir, A., Lewenstein, M., and Porat, E. 2000. Faster algorithms for string matching with k-mismatches. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 794--803. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Anderson, R. J., and Miller, G. L. 1991. Deterministic parallel list ranking. Algorithmica 6, 859--868.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Cole, R., and Hariharan, R. 1998. Approximate string matching: A simpler, faster algorithm. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 463--472. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cole, R., and Vishkin, U. 1986. Deterministic coin tossing and accelerating cascades: Micro and macro techniques for designing parallel algorithms. In Proceedings of the ACM Symposium on Theory of Computing. 206--219. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cormode, G., Paterson, M., Sahinalp, S. C., and Vishkin, U. 2000. Communication complexity of document exchange. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 197--206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Crochemore, M., and Rytter, W. 1994. Text Algorithms. Oxford University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Goel, A., Indyk, P., and Varadarajan, K. 2001. Reductions among high dimensional proximity problems. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 769--778. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Goldberg, A., Plotkin, S., and Shannon, G. 1987. Parallel symmetry-breaking in sparse graphs. In Proceedings of the ACM Symposium on Theory of Computing. 315--324. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Tech. Rep. SRC 1998-011, DEC Systems Research Centre.Google ScholarGoogle Scholar
  13. Indyk, P. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In IEEE Conference on the Foundations of Computer Science. 189--197. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Indyk, P., and Motwani, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the ACM Symposium on Theory of Computing. 604--613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Karloff, H. 1993. Fast algorithms for approximately counting mismatches. Inf. Process. Lett. 48, 2, 53--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Karp, R. M., Miller, R. E., and Rosenberg, A. L. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the ACM Symposium on Theory of Computing. 125--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Karp, R. M., and Rabin, M. O. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31, 2, 249--260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Landau, G., and Vishkin, U. 1989. Fast parallel and serial approximate string matching. J. Algorithms 10, 2, 157--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Landau, G. M., and Vishkin, U. 1986. Efficient string matching with k mismatches. Theor. Comput. Sci. 43, 239--249. Google ScholarGoogle ScholarCross RefCross Ref
  20. Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physic. Doklady. 10, 8, 707--710.Google ScholarGoogle Scholar
  21. Masek, W. J., and Paterson, M. S. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 18--31.Google ScholarGoogle ScholarCross RefCross Ref
  22. Mehlhorn, K., Sundar, R., and Uhrig, C. 1997. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17, 2, 183--198.Google ScholarGoogle ScholarCross RefCross Ref
  23. Muthukrishnan, S., Sahinalp, S. C. 2000. Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the ACM Symposium on Theory of Computing. 416--424. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Myers, E. W. 1986. An O(N D) difference algorithm and its variations. Algorithmica 1, 251--256.Google ScholarGoogle ScholarCross RefCross Ref
  25. Sahinalp, S. C., and Vishkin, U. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the ACM Symposium on Theory of Computing. 300--309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Sahinalp, S. C., and Vishkin, U. 1995. Data compression using locally consistent parsing. Tech. Rep., University of Maryland Department of Computer Science.Google ScholarGoogle Scholar
  27. Sahinalp, S. C., and Vishkin, U. 1996. Efficient approximate and dynamic matching of patterns using a labeling paradigm. In IEEE Conference on the Foundations of Computer Science. 320--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Sankoff, D., and Kruskal, J. B. 1983. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.Google ScholarGoogle Scholar
  29. Shapira, D., and Storer, J. A. 2002. Edit distance with move operations. In Proceedings of the 13th Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2373. 85--98. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The string edit distance matching problem with moves

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Algorithms
      ACM Transactions on Algorithms  Volume 3, Issue 1
      February 2007
      178 pages
      ISSN:1549-6325
      EISSN:1549-6333
      DOI:10.1145/1186810
      Issue’s Table of Contents

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 February 2007
      Published in talg Volume 3, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader