Skip to main content
Log in

Compressing Dictionary Matching Index via Sparsification Technique

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

Given a set \(\mathcal{D}\) of patterns of total length n, the dictionary matching problem is to index \(\mathcal{D}\) such that for any query text T, we can locate the occurrences of any pattern within T efficiently. This problem can be solved in optimal O(|T|+occ) time by the classical AC automaton (Aho and Corasick in Commun. ACM 18(6):333–340, 1975), where occ denotes the number of occurrences. The space requirement is O(n) words which is still far from optimal. In this paper, we show that in many cases, sparsification technique can be applied to improve the space requirements of the indexes for the dictionary matching and its related problems. First, we give a compressed index for dictionary matching, and show that such an index can be generalized to handle dynamic updates of \(\mathcal{D}\). Also, we give a compressed index for approximate dictionary matching with one error. In each case, the query time is only slowed down by a polylogarithmic factor when compared with that achieved by the best O(n)-word counterparts.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Algorithm 1
Algorithm 2

Similar content being viewed by others

Notes

  1. In fact, Chan et al.’s index can support dynamic updates of \(\mathcal{D}\).

  2. A predecessor query reports the largest integer in the set that is at most the query integer, while a successor query reports the smallest integer in the set that is at least the query integer.

  3. Here, node insertion includes the case where a node is inserted into the middle of an existing edge, thus splitting one edge into two edges. On the other hand, when a degree-1 internal node is deleted, we reverse the process so that its parent edge and its child edge will be merged to a single edge.

References

  1. Aho, A., Corasick, M.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

    Article  MATH  MathSciNet  Google Scholar 

  2. Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS’98), pp. 534–544 (1998)

    Google Scholar 

  3. Amir, A., Farach, M.: Adaptive dictionary matching. In: Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS’91), pp. 760–766 (1991)

    Google Scholar 

  4. Amir, A., Farach, M., Galil, Z., Giancarlo, R., Park, K.: Dynamic dictionary matching. J. Comput. Syst. Sci. 49(2), 208–222 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  5. Amir, A., Farach, M., Idury, R., Poutre, A.L., Schaffer, A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995)

    Article  MATH  Google Scholar 

  6. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)

    Article  MATH  MathSciNet  Google Scholar 

  7. Arge, L., Vitter, J.S.: Optimal external memory interval management. SIAM J. Comput. 32(6), 1488–1508 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  8. Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Proceedings of Symposium on Combinatorial Pattern Matching (CPM’10), pp. 88–100 (2010)

    Chapter  Google Scholar 

  9. Bender, M.A., Cole, R., Demaine, E.D., Farach-Colton, M., Zito, J.: Two simplified algorithms for maintaining order in a list. In: Proceedings of European Symposium on Algorithms (ESA’02), pp. 152–164 (2002)

    Google Scholar 

  10. Bender, M.A., Farach-Colton, M., Pemmasani, G., Skiena, S., Sumazin, P.: Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms 57(2), 75–94 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  11. Chan, H.L., Hon, W.K., Lam, T.W., Sadakane, K.: Compressed indexes for dynamic text collections. ACM Trans. Algorithms 3, 2 (2007)

    Article  MathSciNet  Google Scholar 

  12. Chien, Y.F., Hon, W.K., Shah, R., Vitter, J.S.: Geometric Burrows-Wheeler transform: linking range searching and text indexing. In: Proceedings of IEEE Data Compression Conference (DCC’08), pp. 252–261 (2008)

    Google Scholar 

  13. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of ACM Symposium on Theory of Computing (STOC’04), pp. 91–100 (2004)

    Google Scholar 

  14. Dietz, P.F., Sleator, D.D.: Two algorithms for maintaining order in a list. In: Proceedings of ACM Symposium on Theory of Computing (STOC’87), pp. 365–372 (1987)

    Google Scholar 

  15. Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  16. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)

    Article  MathSciNet  Google Scholar 

  17. Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theor. Comput. Sci. 372(1), 115–121 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  18. Ferragina, P., Muthukrishnan, S., de Berg, M.: Multi-method dispatching: a geometric approach with applications to string matching problems. In: Proceedings of ACM Symposium on Theory of Computing (STOC’99), pp. 483–491 (1999)

    Google Scholar 

  19. Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011)

    Article  MATH  MathSciNet  Google Scholar 

  20. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  21. Hagerup, T., Miltersen, P.B., Pagh, R.: Deterministic dictionaries. J. Algorithms 41(1), 69–85 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  22. Hon, W.K., Lam, T.W., Shah, R., Tam, S.L., Vitter, J.S.: Compressed index for dictionary matching. In: Proceedings of IEEE Data Compression Conference (DCC’08), pp. 23–32 (2008)

    Google Scholar 

  23. Hon, W.K., Shah, R., Thankachan, S.V., Vitter, J.S.: On entropy-compressed text indexing in external memory. In: Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE’09), pp. 75–89 (2009)

    Chapter  Google Scholar 

  24. Hon, W.K., Ku, T.H., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. In: Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE’10), pp. 191–200 (2010)

    Chapter  Google Scholar 

  25. Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Proceedings of International Conference on Computing and Combinatorics (COCOON’96), pp. 219–230 (1996)

    Google Scholar 

  26. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  27. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)

    Article  MATH  MathSciNet  Google Scholar 

  28. McCreight, E.M.: Priority search trees. SIAM J. Comput. 14(2), 257–276 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  29. Overmars, M.H.: Efficient data structures for range searching on a grid. J. Algorithms 9(2), 254–275 (1988)

    Article  MATH  MathSciNet  Google Scholar 

  30. Sadakane, K.: Compressed suffix trees with full functionality. Theory Comput. Syst. 41(4), 589–607 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  31. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of Symposium on Switching and Automata Theory, pp. 1–11 (1973)

    Chapter  Google Scholar 

  32. Willard, D.E.: Log-logarithmic worst-case range queries are possible in space Θ(N). Inf. Process. Lett. 17(2), 81–84 (1983)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wing-Kai Hon.

Additional information

This work is supported in part by Taiwan NSC Grant 99-2221-E-007-123 (W.K. Hon), HK RGC Grant HKU 7140/06E (T.W. Lam), and US NSF Grant CCF-1017623 (R. Shah and J.S. Vitter). A preliminary version of this work appears in the following conference proceedings: the Proceedings of the IEEE Data Compression Conference (DCC), 2008 and 2011, and the Proceedings of the International Symposium on Algorithms and Computation (ISAAC), 2009.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hon, WK., Ku, TH., Lam, TW. et al. Compressing Dictionary Matching Index via Sparsification Technique. Algorithmica 72, 515–538 (2015). https://doi.org/10.1007/s00453-013-9863-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00453-013-9863-3

Keywords

Navigation