ABSTRACT
This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution.The performance bounds all have a similar character. For example, for the indexing problem with n=|t| and m=|p|, the query time for k substitutions is O(m + (c1 log n)k⁄k! + # matches), with a data structure of size O(n (c2 log n)k⁄k!) and a preprocessing time of O(n (c2 log n)k⁄k!), where c1,c2 > 1 are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
- K. Abrahamson. Generalized string matching. SIAM J. Computing, 16(6):1039--1051, 1987.]] Google ScholarDigital Library
- A. V. Aho and M. J. Corasick. Efficient string matching. Comm. ACM, 18(6):333--340, 1975.]] Google ScholarDigital Library
- T. Akutsu. Approximate string matching with don't care characters. Information Processing Letters, 55:235--239, 1995.]] Google ScholarDigital Library
- T. Akutsu. Approximate string matching with variable length don't care characters. IEICE Trans. Information and Systems, E79-D:1353--1354, 1996.]]Google Scholar
- A. Amir and M. Farach. Adaptive dictionary matching. Proc. of the Symposium on Foundations of Computer Science, 1991, 760--766.]] Google ScholarDigital Library
- A. Amir, M. Farach, R. Giancarlo, Z. Galil and K. Park. Dynamic dictionary matching. J. of Computer and System Sciences, 49(2):208--222, 1994.]] Google ScholarDigital Library
- A. Amir, M. Farach, R. M. Idury, J. A. La Poutre, and A. A. Schaffer. Improved dynamic dictionary matching. Information and Computation, 119(2):258--282, 1995.]] Google ScholarDigital Library
- A. Amir, D. Keselman, G. M. Landau, N. Lewenstein, M. Lewenstein, and M. Rodeh. Text Indexing and dictionary matching with one error. J. of Algorithms, 37(2): 309--325, 2000.]] Google ScholarDigital Library
- A. Amir, G. Landau, M. Lewenstein and D. Sokol. Dynamic pattern, static text matching. Submitted for journal publication; preliminary version, Proc. of Workshop on Algorithms and Data Structures, 2003, 340--352.]]Google ScholarCross Ref
- A. Amir, M. Lewenstein and E. Porat. Faster algorithms for string matching with k mismatches. Proc. of the Symposium on Discrete Algorithms, 2000, 794--803.]] Google ScholarDigital Library
- A. Amir, M. Lewenstein and E. Porat. Approximate subset matching with don't cares. Proc. of the Symposium on Discrete Algorithms, 2001, 279--288.]] Google ScholarDigital Library
- G. S. Brodal and L. Gasieniec. Approximate dictionary queries. Proc. of the Symposium on Combinatorial Pattern Matching, 1996, 65--74. (LNCS 1075).]] Google ScholarDigital Library
- G. S. Brodal and S. Venkatesh. Improved bounds for dictionary lookup with one error. Information Processing Letters, 75(1-2):57--59 (2000).]] Google ScholarDigital Library
- A. L. Buchsbaum, M. T. Goodrich, J. Westbrook. Range searching over tree cross products. Proc. of the European Symposium on Algorithms, 2000, 120--131.]] Google ScholarDigital Library
- R. Cole and R. Hariharan. Approximate string matching: A faster simpler algorithm. Proc. of the Symposium on Discrete Algorithms, 1998, 463--472.]] Google ScholarDigital Library
- R. Cole and R. Hariharan. Verifying candidate matches in sparse and wildcard matching. Proc. of the Symposium on Theory of Computing, 2002, 592--601.]] Google ScholarDigital Library
- M. Farach. Optimal suffix tree construction with large alphabets. Proc. of the Symposium on Foundations of Computer Science, 1997, 137--143.]] Google ScholarDigital Library
- P. Ferragina, S. Muthukrishnan, and M. deBerg. Multi-method dispatching: a geometric approach with applications to string matching. Proc. of the Symposium on the Theory of Computing, 1999, 483--491.]] Google ScholarDigital Library
- M. J. Fischer and M. S. Paterson. String matching and other products. Complexity of Computation, R. M. Karp (editor), SIAM-AMS Proceedings, 7:113--125, 1974.]]Google Scholar
- Z. Galil and R. Giancarlo. Improved string matching with k mismatches. SIGACT News, 17(4), 52--54, 1986.]] Google ScholarDigital Library
- D. Greene, M. Parnas and F. Yao. Multi-index hashing for information retrieval. Proc. of the Symposium on the Foundations of Computer Science, 1994, 722--731.]]Google ScholarDigital Library
- R. Grossi, J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. Proc. of the Symposium on Theory of Computing, 2000, 397--406.]] Google ScholarDigital Library
- T. Hagerup, P. B. Miltersen, R. Pagh. Deterministic dictionaries. J. of Algorithms, 41(1):69--85, 2001.]] Google ScholarDigital Library
- D. Harel, R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338--55, 1984.]] Google ScholarDigital Library
- R. M. Idury and A. A Sch"affer. Dynamic dictionary matching with failure functions. Proc. of the 3rd Annual Symposium on Combinatorial Pattern Matching, 1992, 273--284.]] Google ScholarDigital Library
- P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. Proc. of the Symposium on Theory of Computing, 1998, 604--613.]] Google ScholarDigital Library
- E. Kushilevitz, R. Ostrovsky, Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM Journal on Computing, 30(2):457--474, 2000.]] Google ScholarDigital Library
- G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43, 239-249, 1986.]] Google ScholarDigital Library
- G. M. Landau and U. Vishkin. Fast parallel and serial approximate string matching. J. of Algorithms, 10(2):157--169, 1989.]] Google ScholarDigital Library
- U. Manber and S. Wu. An algorithm for approximate membership checking with applications to password security. Information Processing Letters, 50:92--197, 1994.]] Google ScholarDigital Library
- E. M. McCreight. A space-economical suffix tree construction algorithm. J. of the ACM, 23:262--272, 1976.]] Google ScholarDigital Library
- K. Mehlhorn. Data Structures and Efficient Algorithms, Vol. 1: Sorting and Searching, pp. 177--178. Springer Verlag, EATCS Monographs, 1984.]]Google Scholar
- M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, Mass., 1969.]]Google Scholar
- S. C. Sahinalp and U. Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm. Proc. of the Symposium on Foundations of Computer Science, 1996, 320--328.]] Google ScholarDigital Library
- B. Schieber and U. Vishkin. On finding lowest common ancestors: simplifications and parallelization. SIAM Journal on Computing, 17(6):1253--62, 1988.]] Google ScholarDigital Library
- S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu, C. T. Lu. Spatial databases - accomplishments and research needs. TKDE, 11(1):45--55 (1999).]] Google ScholarDigital Library
- E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249--260, 1995.]]Google ScholarDigital Library
- P. Weiner. Linear pattern matching algorithm. Proc. of the Symposium on Switching and Automata Theory, 1973, 1--11.]]Google ScholarDigital Library
- D. E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters, 17(2):81--84, 1983.]]Google ScholarCross Ref
- A. C. -C. Yao and F. F. Yao. Dictionary lookup with one error. J. of Algorithms, 25(1):194--202, 1997.]] Google ScholarDigital Library
Index Terms
- Dictionary matching and indexing with errors and don't cares
Recommendations
Less space
Text indexing is a fundamental problem in computer science, where the task is to index a given text (string) T 1 . . n , such that whenever a pattern P 1 . . p comes as a query, we can efficiently report all those locations where P occurs as a substring ...
Compressed Dictionary Matching with One Error
DCC '11: Proceedings of the 2011 Data Compression ConferenceGiven a set $\D$ of $d$ patterns of total length $n$, the dictionary matching problem is to index $\D$ such that for any query text $T$, we can locate the occurrences of any pattern within $T$ efficiently. This problem can be solved in optimal $O(|T|+...
Compressing Dictionary Matching Index via Sparsification Technique
Given a set $\mathcal{D}$ of patterns of total length n, the dictionary matching problem is to index $\mathcal{D}$ such that for any query text T, we can locate the occurrences of any pattern within T efficiently. This problem can be solved in optimal O(...
Comments