skip to main content
10.1145/1007352.1007374acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
Article

Dictionary matching and indexing with errors and don't cares

Published:13 June 2004Publication History

ABSTRACT

This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution.The performance bounds all have a similar character. For example, for the indexing problem with n=|t| and m=|p|, the query time for k substitutions is O(m + (c1 log n)k⁄k! + # matches), with a data structure of size O(n (c2 log n)k⁄k!) and a preprocessing time of O(n (c2 log n)k⁄k!), where c1,c2 > 1 are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.

References

  1. K. Abrahamson. Generalized string matching. SIAM J. Computing, 16(6):1039--1051, 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. V. Aho and M. J. Corasick. Efficient string matching. Comm. ACM, 18(6):333--340, 1975.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. T. Akutsu. Approximate string matching with don't care characters. Information Processing Letters, 55:235--239, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Akutsu. Approximate string matching with variable length don't care characters. IEICE Trans. Information and Systems, E79-D:1353--1354, 1996.]]Google ScholarGoogle Scholar
  5. A. Amir and M. Farach. Adaptive dictionary matching. Proc. of the Symposium on Foundations of Computer Science, 1991, 760--766.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Amir, M. Farach, R. Giancarlo, Z. Galil and K. Park. Dynamic dictionary matching. J. of Computer and System Sciences, 49(2):208--222, 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Amir, M. Farach, R. M. Idury, J. A. La Poutre, and A. A. Schaffer. Improved dynamic dictionary matching. Information and Computation, 119(2):258--282, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Amir, D. Keselman, G. M. Landau, N. Lewenstein, M. Lewenstein, and M. Rodeh. Text Indexing and dictionary matching with one error. J. of Algorithms, 37(2): 309--325, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Amir, G. Landau, M. Lewenstein and D. Sokol. Dynamic pattern, static text matching. Submitted for journal publication; preliminary version, Proc. of Workshop on Algorithms and Data Structures, 2003, 340--352.]]Google ScholarGoogle ScholarCross RefCross Ref
  10. A. Amir, M. Lewenstein and E. Porat. Faster algorithms for string matching with k mismatches. Proc. of the Symposium on Discrete Algorithms, 2000, 794--803.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Amir, M. Lewenstein and E. Porat. Approximate subset matching with don't cares. Proc. of the Symposium on Discrete Algorithms, 2001, 279--288.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. S. Brodal and L. Gasieniec. Approximate dictionary queries. Proc. of the Symposium on Combinatorial Pattern Matching, 1996, 65--74. (LNCS 1075).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. S. Brodal and S. Venkatesh. Improved bounds for dictionary lookup with one error. Information Processing Letters, 75(1-2):57--59 (2000).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. L. Buchsbaum, M. T. Goodrich, J. Westbrook. Range searching over tree cross products. Proc. of the European Symposium on Algorithms, 2000, 120--131.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. Cole and R. Hariharan. Approximate string matching: A faster simpler algorithm. Proc. of the Symposium on Discrete Algorithms, 1998, 463--472.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. Cole and R. Hariharan. Verifying candidate matches in sparse and wildcard matching. Proc. of the Symposium on Theory of Computing, 2002, 592--601.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Farach. Optimal suffix tree construction with large alphabets. Proc. of the Symposium on Foundations of Computer Science, 1997, 137--143.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Ferragina, S. Muthukrishnan, and M. deBerg. Multi-method dispatching: a geometric approach with applications to string matching. Proc. of the Symposium on the Theory of Computing, 1999, 483--491.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. J. Fischer and M. S. Paterson. String matching and other products. Complexity of Computation, R. M. Karp (editor), SIAM-AMS Proceedings, 7:113--125, 1974.]]Google ScholarGoogle Scholar
  20. Z. Galil and R. Giancarlo. Improved string matching with k mismatches. SIGACT News, 17(4), 52--54, 1986.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. D. Greene, M. Parnas and F. Yao. Multi-index hashing for information retrieval. Proc. of the Symposium on the Foundations of Computer Science, 1994, 722--731.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. Grossi, J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. Proc. of the Symposium on Theory of Computing, 2000, 397--406.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Hagerup, P. B. Miltersen, R. Pagh. Deterministic dictionaries. J. of Algorithms, 41(1):69--85, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Harel, R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338--55, 1984.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. M. Idury and A. A Sch"affer. Dynamic dictionary matching with failure functions. Proc. of the 3rd Annual Symposium on Combinatorial Pattern Matching, 1992, 273--284.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. Proc. of the Symposium on Theory of Computing, 1998, 604--613.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. E. Kushilevitz, R. Ostrovsky, Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM Journal on Computing, 30(2):457--474, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43, 239-249, 1986.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. G. M. Landau and U. Vishkin. Fast parallel and serial approximate string matching. J. of Algorithms, 10(2):157--169, 1989.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. U. Manber and S. Wu. An algorithm for approximate membership checking with applications to password security. Information Processing Letters, 50:92--197, 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. E. M. McCreight. A space-economical suffix tree construction algorithm. J. of the ACM, 23:262--272, 1976.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. K. Mehlhorn. Data Structures and Efficient Algorithms, Vol. 1: Sorting and Searching, pp. 177--178. Springer Verlag, EATCS Monographs, 1984.]]Google ScholarGoogle Scholar
  33. M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, Mass., 1969.]]Google ScholarGoogle Scholar
  34. S. C. Sahinalp and U. Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm. Proc. of the Symposium on Foundations of Computer Science, 1996, 320--328.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. B. Schieber and U. Vishkin. On finding lowest common ancestors: simplifications and parallelization. SIAM Journal on Computing, 17(6):1253--62, 1988.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu, C. T. Lu. Spatial databases - accomplishments and research needs. TKDE, 11(1):45--55 (1999).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249--260, 1995.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. P. Weiner. Linear pattern matching algorithm. Proc. of the Symposium on Switching and Automata Theory, 1973, 1--11.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. D. E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters, 17(2):81--84, 1983.]]Google ScholarGoogle ScholarCross RefCross Ref
  40. A. C. -C. Yao and F. F. Yao. Dictionary lookup with one error. J. of Algorithms, 25(1):194--202, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Dictionary matching and indexing with errors and don't cares

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
            June 2004
            660 pages
            ISBN:1581138520
            DOI:10.1145/1007352

            Copyright © 2004 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 13 June 2004

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            Overall Acceptance Rate1,469of4,586submissions,32%

            Upcoming Conference

            STOC '24
            56th Annual ACM Symposium on Theory of Computing (STOC 2024)
            June 24 - 28, 2024
            Vancouver , BC , Canada

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader