Article

Dictionary matching and indexing with errors and don't cares

Authors:
Richard Cole

New York University, NY, NY

New York University, NY, NY
View Profile

,
Lee-Ad Gottlieb

New York University, NY, NY

New York University, NY, NY
View Profile

,
Moshe Lewenstein

Bar-Ilan University, Ramat Gan, Israel

Bar-Ilan University, Ramat Gan, Israel
View Profile

STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computingJune 2004Pages 91–100https://doi.org/10.1145/1007352.1007374

Published:13 June 2004Publication History

STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing

Pages 91–100

ABSTRACT

This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution.The performance bounds all have a similar character. For example, for the indexing problem with n=|t| and m=|p|, the query time for k substitutions is O(m + (c₁ log n)^k⁄k! + # matches), with a data structure of size O(n (c₂ log n)^k⁄k!) and a preprocessing time of O(n (c₂ log n)^k⁄k!), where c₁,c₂ > 1 are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.

References

K. Abrahamson. Generalized string matching. SIAM J. Computing, 16(6):1039--1051, 1987.]] Google ScholarDigital Library
A. V. Aho and M. J. Corasick. Efficient string matching. Comm. ACM, 18(6):333--340, 1975.]] Google ScholarDigital Library
T. Akutsu. Approximate string matching with don't care characters. Information Processing Letters, 55:235--239, 1995.]] Google ScholarDigital Library
T. Akutsu. Approximate string matching with variable length don't care characters. IEICE Trans. Information and Systems, E79-D:1353--1354, 1996.]]Google Scholar
A. Amir and M. Farach. Adaptive dictionary matching. Proc. of the Symposium on Foundations of Computer Science, 1991, 760--766.]] Google ScholarDigital Library
A. Amir, M. Farach, R. Giancarlo, Z. Galil and K. Park. Dynamic dictionary matching. J. of Computer and System Sciences, 49(2):208--222, 1994.]] Google ScholarDigital Library
A. Amir, M. Farach, R. M. Idury, J. A. La Poutre, and A. A. Schaffer. Improved dynamic dictionary matching. Information and Computation, 119(2):258--282, 1995.]] Google ScholarDigital Library
A. Amir, D. Keselman, G. M. Landau, N. Lewenstein, M. Lewenstein, and M. Rodeh. Text Indexing and dictionary matching with one error. J. of Algorithms, 37(2): 309--325, 2000.]] Google ScholarDigital Library
A. Amir, G. Landau, M. Lewenstein and D. Sokol. Dynamic pattern, static text matching. Submitted for journal publication; preliminary version, Proc. of Workshop on Algorithms and Data Structures, 2003, 340--352.]]Google ScholarCross Ref
A. Amir, M. Lewenstein and E. Porat. Faster algorithms for string matching with k mismatches. Proc. of the Symposium on Discrete Algorithms, 2000, 794--803.]] Google ScholarDigital Library
A. Amir, M. Lewenstein and E. Porat. Approximate subset matching with don't cares. Proc. of the Symposium on Discrete Algorithms, 2001, 279--288.]] Google ScholarDigital Library
G. S. Brodal and L. Gasieniec. Approximate dictionary queries. Proc. of the Symposium on Combinatorial Pattern Matching, 1996, 65--74. (LNCS 1075).]] Google ScholarDigital Library
G. S. Brodal and S. Venkatesh. Improved bounds for dictionary lookup with one error. Information Processing Letters, 75(1-2):57--59 (2000).]] Google ScholarDigital Library
A. L. Buchsbaum, M. T. Goodrich, J. Westbrook. Range searching over tree cross products. Proc. of the European Symposium on Algorithms, 2000, 120--131.]] Google ScholarDigital Library
R. Cole and R. Hariharan. Approximate string matching: A faster simpler algorithm. Proc. of the Symposium on Discrete Algorithms, 1998, 463--472.]] Google ScholarDigital Library
R. Cole and R. Hariharan. Verifying candidate matches in sparse and wildcard matching. Proc. of the Symposium on Theory of Computing, 2002, 592--601.]] Google ScholarDigital Library
M. Farach. Optimal suffix tree construction with large alphabets. Proc. of the Symposium on Foundations of Computer Science, 1997, 137--143.]] Google ScholarDigital Library
P. Ferragina, S. Muthukrishnan, and M. deBerg. Multi-method dispatching: a geometric approach with applications to string matching. Proc. of the Symposium on the Theory of Computing, 1999, 483--491.]] Google ScholarDigital Library
M. J. Fischer and M. S. Paterson. String matching and other products. Complexity of Computation, R. M. Karp (editor), SIAM-AMS Proceedings, 7:113--125, 1974.]]Google Scholar
Z. Galil and R. Giancarlo. Improved string matching with k mismatches. SIGACT News, 17(4), 52--54, 1986.]] Google ScholarDigital Library
D. Greene, M. Parnas and F. Yao. Multi-index hashing for information retrieval. Proc. of the Symposium on the Foundations of Computer Science, 1994, 722--731.]]Google ScholarDigital Library
R. Grossi, J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. Proc. of the Symposium on Theory of Computing, 2000, 397--406.]] Google ScholarDigital Library
T. Hagerup, P. B. Miltersen, R. Pagh. Deterministic dictionaries. J. of Algorithms, 41(1):69--85, 2001.]] Google ScholarDigital Library
D. Harel, R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338--55, 1984.]] Google ScholarDigital Library
R. M. Idury and A. A Sch"affer. Dynamic dictionary matching with failure functions. Proc. of the 3rd Annual Symposium on Combinatorial Pattern Matching, 1992, 273--284.]] Google ScholarDigital Library
P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. Proc. of the Symposium on Theory of Computing, 1998, 604--613.]] Google ScholarDigital Library
E. Kushilevitz, R. Ostrovsky, Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM Journal on Computing, 30(2):457--474, 2000.]] Google ScholarDigital Library
G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43, 239-249, 1986.]] Google ScholarDigital Library
G. M. Landau and U. Vishkin. Fast parallel and serial approximate string matching. J. of Algorithms, 10(2):157--169, 1989.]] Google ScholarDigital Library
U. Manber and S. Wu. An algorithm for approximate membership checking with applications to password security. Information Processing Letters, 50:92--197, 1994.]] Google ScholarDigital Library
E. M. McCreight. A space-economical suffix tree construction algorithm. J. of the ACM, 23:262--272, 1976.]] Google ScholarDigital Library
K. Mehlhorn. Data Structures and Efficient Algorithms, Vol. 1: Sorting and Searching, pp. 177--178. Springer Verlag, EATCS Monographs, 1984.]]Google Scholar
M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, Mass., 1969.]]Google Scholar
S. C. Sahinalp and U. Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm. Proc. of the Symposium on Foundations of Computer Science, 1996, 320--328.]] Google ScholarDigital Library
B. Schieber and U. Vishkin. On finding lowest common ancestors: simplifications and parallelization. SIAM Journal on Computing, 17(6):1253--62, 1988.]] Google ScholarDigital Library
S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu, C. T. Lu. Spatial databases - accomplishments and research needs. TKDE, 11(1):45--55 (1999).]] Google ScholarDigital Library
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249--260, 1995.]]Google ScholarDigital Library
P. Weiner. Linear pattern matching algorithm. Proc. of the Symposium on Switching and Automata Theory, 1973, 1--11.]]Google ScholarDigital Library
D. E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters, 17(2):81--84, 1983.]]Google ScholarCross Ref
A. C. -C. Yao and F. F. Yao. Dictionary lookup with one error. J. of Algorithms, 25(1):194--202, 1997.]] Google ScholarDigital Library

Index Terms

Dictionary matching and indexing with errors and don't cares

Recommendations

Less space

Text indexing is a fundamental problem in computer science, where the task is to index a given text (string) T 1 . . n , such that whenever a pattern P 1 . . p comes as a query, we can efficiently report all those locations where P occurs as a substring ...
Read More
Compressed Dictionary Matching with One Error
DCC '11: Proceedings of the 2011 Data Compression Conference

Given a set $\D$ of $d$ patterns of total length $n$, the dictionary matching problem is to index $\D$ such that for any query text $T$, we can locate the occurrences of any pattern within $T$ efficiently. This problem can be solved in optimal $O(|T|+...
Read More
Compressing Dictionary Matching Index via Sparsification Technique

Given a set $\mathcal{D}$ of patterns of total length n, the dictionary matching problem is to index $\mathcal{D}$ such that for any query text T, we can locate the occurrences of any pattern within T efficiently. This problem can be solved in optimal O(...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
June 2004
660 pages
ISBN:1581138520
DOI:10.1145/1007352
Program Chair:
László Babai
University of Chicago, Chicago, IL
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2004
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
approximate pattern matching
dictionary matching
dictionary query
suffix trees
text indexing
wildcards
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate1,469of4,586submissions,32%
Upcoming Conference
STOC '24

Sponsor:

sigact

56th Annual ACM Symposium on Theory of Computing (STOC 2024)

June 24 - 28, 2024

Vancouver , BC , Canada
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 209
  Total Citations
  View Citations
- 1,116
  Total Downloads
- Downloads (Last 12 months)40
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Dictionary matching and indexing with errors and don't cares

STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Less space

Compressed Dictionary Matching with One Error

Compressing Dictionary Matching Index via Sparsification Technique

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Dictionary matching and indexing with errors and don't cares

STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Less space

Compressed Dictionary Matching with One Error

Compressing Dictionary Matching Index via Sparsification Technique

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media