Skip to main content
Log in

Compressed Subsequence Matching and Packed Tree Coloring

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size n compressing a string of size N and a pattern string of size m over an alphabet of size \(\sigma \), our algorithm uses \(O(n+\frac{n\sigma }{w})\) space and \(O(n+\frac{n\sigma }{w}+m\log N\log w\cdot occ)\) or \(O(n+\frac{n\sigma }{w}\log w+m\log N\cdot occ)\) time. Here w is the word size and occ is the number of minimal occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for \(occ=o(\frac{n}{\log N})\) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  1. Abiteboul, S., Alstrup, S., Kaplan, H., Milo, T., Rauhe, T.: Compact labeling scheme for ancestor queries. SIAM J. Comput. 35(6), 1295–1309 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  2. Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Montanari, U., Rolim, J.P., Welzl, E. (eds.) Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 1853, pp. 73–84. Springer, Berlin, Heidelberg (2000)

    Chapter  Google Scholar 

  3. Alstrup, S., Holm, J., de Lichtenberg, K., Thorup, M.: Minimizing diameters of dynamic trees. In: Degano, P., Gorrieri, R., Marchetti-Spaccamela, A. (eds.) Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 1256, pp. 270–280. Springer, Berlin, Heidelberg (1997)

    Chapter  Google Scholar 

  4. Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proceedings of the 39th Annual Symposium on Foundations of Computer Science, pp. 534–543. IEEE Computer Society, Washington, DC (1998)

  5. Alstrup, S., Secher, J.P., Spork, M.: Optimal on-line decremental connectivity in trees. Inf. Process. Lett. 64(4), 161–164 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  6. Baeza-Yates, R.A.: Searching subsequences. Theor. Comput. Sci. 78(2), 363–376 (1991)

    Article  MathSciNet  MATH  Google Scholar 

  7. Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  8. Berkman, O., Vishkin, U.: Finding level-ancestors in trees. J. Comput. Syst. Sci. 48(2), 214–230 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  9. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: Proceedings of the Twenty-second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 373–389. SIAM, San Francisco (2011)

  10. Boasson, L., Cegielski, P., Guessarian, I., Matiyasevich, Y.: Window-accumulated subsequence matching problem is linear. Ann. Pure Appl. Log. 113(1), 59–80 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  11. Cégielski, P., Guessarian, I., Lifshits, Y., Matiyasevich, Y.: Window subsequence problems for compressed texts. In: Grigoriev, D., Harrison, J., Hirsch, E.A. (eds.) Computer Science-Theory and Applications. Lecture Notes in Computer Science, vol. 3967, pp. 127–136. Springer, Berlin, Heidelberg (2006)

    Chapter  Google Scholar 

  12. Cégielski, P., Guessarian, I., Matiyasevich, Y.: Multiple serial episodes matching. Inf. Proc. Lett. 98(6), 211–218 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  13. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  14. Crochemore, M., Melichar, B., Troníček, Z.: Directed acyclic subsequence graph—overview. J. Discret. Algorithms 1(3), 255–280 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  15. Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., Kärkkäinen, J.: Episode matching. In: Apostolico, A., Hein, J. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 1264, pp. 12–27. Springer, Berlin, Heidelberg (1997)

    Chapter  Google Scholar 

  16. Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.) Algorithms and Data Structures. Lecture Notes in Computer Science, vol. 519, pp. 32–40. Springer, Berlin, Heidelberg (1991)

    Chapter  Google Scholar 

  17. Ferragina, P., Muthukrishnan, S.: Efficient dynamic method-lookup for object oriented languages. In: Diaz, J., Serna, M. (eds.) European Symposium on Algorithms. Lecture Notes in Computer Science, vol. 1136, pp. 107–120. Springer, Berlin, Heidelberg (1996)

    Google Scholar 

  18. Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci. 47(3), 424–436 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  19. Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. P. IEEE. 88(11), 1722–1732 (2000)

  20. Mannila, H., Toivonen, H., Verkamo, A.I.: Discovery of frequent episodes in event sequences. Data Min. Knowl. Disc. 1(3), 259–289 (1997)

    Article  Google Scholar 

  21. Muthukrishnan, S., Müller, M.: Time and space efficient method-lookup for object-oriented programs. In: Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 42–51. Society for Industrial and Applied Mathematics, Atlanta (1996)

  22. Rytter, W.: Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1), 211–222 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  23. Sleator, D.D., Endre Tarjan, R.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)

    Article  MathSciNet  MATH  Google Scholar 

  24. Thorup, M.: Randomized sorting in \(O(n\log \log n)\) time and linear space using addition, shift, and bit-wise boolean operations. J. Algorithms 42(2), 205–230 (2002)

    Article  MATH  Google Scholar 

  25. Tiskin, A.: Faster subsequence recognition in compressed strings. J. Math. Sci. 158(5), 759–769 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  26. Tiskin, A.: Towards approximate matching in compressed strings: local subsequence recognition. In: Kulikov, A., Vereshchagin, N. (eds.) Computer Science-Theory and Applications, vol. 6651, pp. 401–414. Springer, Berlin, Heidelberg (2011)

    Chapter  Google Scholar 

  27. Troníček, Z.: Episode matching. In: Amir, A. (ed.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 143–146. Springer, Berlin, Heidelberg (2001)

    Chapter  Google Scholar 

  28. Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and dont-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 6661, pp. 309–322. Springer, Berlin, Heidelberg (2011)

    Chapter  Google Scholar 

  29. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  30. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Patrick Hagge Cording.

Additional information

Philip Bille, Inge Li Gørtz: Supported in part by the The Danish Council for Independent Research and Natural Sciences Grant (DFF 1323-00178).

Philip Bille, Patrick Hagge Cording, Inge Li Gørtz: Supported in part by the Danish Research Council and the Danish Research Council under the Sapere Aude Program (DFF 4005-00267).

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bille, P., Cording, P.H. & Gørtz, I.L. Compressed Subsequence Matching and Packed Tree Coloring. Algorithmica 77, 336–348 (2017). https://doi.org/10.1007/s00453-015-0068-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00453-015-0068-9

Keywords

Navigation