Abstract
The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t.
We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).
- Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 6, 1039--1051. Google ScholarDigital Library
- Alstrup, S., Brodal, G. S., and Rauhe, T. 2000. Pattern matching in dynamic texts. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 819--828. Google ScholarDigital Library
- Amir, A., Lewenstein, M., and Porat, E. 2000. Faster algorithms for string matching with k-mismatches. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 794--803. Google ScholarDigital Library
- Anderson, R. J., and Miller, G. L. 1991. Deterministic parallel list ranking. Algorithmica 6, 859--868.Google ScholarDigital Library
- Cole, R., and Hariharan, R. 1998. Approximate string matching: A simpler, faster algorithm. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 463--472. Google ScholarDigital Library
- Cole, R., and Vishkin, U. 1986. Deterministic coin tossing and accelerating cascades: Micro and macro techniques for designing parallel algorithms. In Proceedings of the ACM Symposium on Theory of Computing. 206--219. Google ScholarDigital Library
- Cormode, G., Paterson, M., Sahinalp, S. C., and Vishkin, U. 2000. Communication complexity of document exchange. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 197--206. Google ScholarDigital Library
- Crochemore, M., and Rytter, W. 1994. Text Algorithms. Oxford University Press. Google ScholarDigital Library
- Goel, A., Indyk, P., and Varadarajan, K. 2001. Reductions among high dimensional proximity problems. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 769--778. Google ScholarDigital Library
- Goldberg, A., Plotkin, S., and Shannon, G. 1987. Parallel symmetry-breaking in sparse graphs. In Proceedings of the ACM Symposium on Theory of Computing. 315--324. Google ScholarDigital Library
- Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York. Google ScholarDigital Library
- Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Tech. Rep. SRC 1998-011, DEC Systems Research Centre.Google Scholar
- Indyk, P. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In IEEE Conference on the Foundations of Computer Science. 189--197. Google ScholarDigital Library
- Indyk, P., and Motwani, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the ACM Symposium on Theory of Computing. 604--613. Google ScholarDigital Library
- Karloff, H. 1993. Fast algorithms for approximately counting mismatches. Inf. Process. Lett. 48, 2, 53--60. Google ScholarDigital Library
- Karp, R. M., Miller, R. E., and Rosenberg, A. L. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the ACM Symposium on Theory of Computing. 125--136. Google ScholarDigital Library
- Karp, R. M., and Rabin, M. O. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31, 2, 249--260. Google ScholarDigital Library
- Landau, G., and Vishkin, U. 1989. Fast parallel and serial approximate string matching. J. Algorithms 10, 2, 157--169. Google ScholarDigital Library
- Landau, G. M., and Vishkin, U. 1986. Efficient string matching with k mismatches. Theor. Comput. Sci. 43, 239--249. Google ScholarCross Ref
- Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physic. Doklady. 10, 8, 707--710.Google Scholar
- Masek, W. J., and Paterson, M. S. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 18--31.Google ScholarCross Ref
- Mehlhorn, K., Sundar, R., and Uhrig, C. 1997. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17, 2, 183--198.Google ScholarCross Ref
- Muthukrishnan, S., Sahinalp, S. C. 2000. Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the ACM Symposium on Theory of Computing. 416--424. Google ScholarDigital Library
- Myers, E. W. 1986. An O(N D) difference algorithm and its variations. Algorithmica 1, 251--256.Google ScholarCross Ref
- Sahinalp, S. C., and Vishkin, U. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the ACM Symposium on Theory of Computing. 300--309. Google ScholarDigital Library
- Sahinalp, S. C., and Vishkin, U. 1995. Data compression using locally consistent parsing. Tech. Rep., University of Maryland Department of Computer Science.Google Scholar
- Sahinalp, S. C., and Vishkin, U. 1996. Efficient approximate and dynamic matching of patterns using a labeling paradigm. In IEEE Conference on the Foundations of Computer Science. 320--328. Google ScholarDigital Library
- Sankoff, D., and Kruskal, J. B. 1983. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.Google Scholar
- Shapira, D., and Storer, J. A. 2002. Edit distance with move operations. In Proceedings of the 13th Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2373. 85--98. Google ScholarDigital Library
Index Terms
- The string edit distance matching problem with moves
Recommendations
The string edit distance matching problem with moves
SODA '02: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithmsThe edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit ...
Approximating Tree Edit Distance through String Edit Distance
We present an algorithm to approximate edit distance between two ordered and rooted trees of bounded degree. In this algorithm, each input tree is transformed into a string by computing the Euler string, where labels of some edges in the input trees are ...
Swap and mismatch edit distance
There is no known algorithm that solves the general case of theapproximate string matching problem with the extended edit distance, where the edit operations are: insertion, deletion, mismatch and swap, in timeo(nm), wheren is the length of the text ...
Comments