Abstract
We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size n compressing a string of size N and a pattern string of size m over an alphabet of size \(\sigma \), our algorithm uses \(O(n+\frac{n\sigma }{w})\) space and \(O(n+\frac{n\sigma }{w}+m\log N\log w\cdot occ)\) or \(O(n+\frac{n\sigma }{w}\log w+m\log N\cdot occ)\) time. Here w is the word size and occ is the number of minimal occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for \(occ=o(\frac{n}{\log N})\) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.
Similar content being viewed by others
References
Abiteboul, S., Alstrup, S., Kaplan, H., Milo, T., Rauhe, T.: Compact labeling scheme for ancestor queries. SIAM J. Comput. 35(6), 1295–1309 (2006)
Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Montanari, U., Rolim, J.P., Welzl, E. (eds.) Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 1853, pp. 73–84. Springer, Berlin, Heidelberg (2000)
Alstrup, S., Holm, J., de Lichtenberg, K., Thorup, M.: Minimizing diameters of dynamic trees. In: Degano, P., Gorrieri, R., Marchetti-Spaccamela, A. (eds.) Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 1256, pp. 270–280. Springer, Berlin, Heidelberg (1997)
Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proceedings of the 39th Annual Symposium on Foundations of Computer Science, pp. 534–543. IEEE Computer Society, Washington, DC (1998)
Alstrup, S., Secher, J.P., Spork, M.: Optimal on-line decremental connectivity in trees. Inf. Process. Lett. 64(4), 161–164 (1997)
Baeza-Yates, R.A.: Searching subsequences. Theor. Comput. Sci. 78(2), 363–376 (1991)
Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)
Berkman, O., Vishkin, U.: Finding level-ancestors in trees. J. Comput. Syst. Sci. 48(2), 214–230 (1994)
Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: Proceedings of the Twenty-second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 373–389. SIAM, San Francisco (2011)
Boasson, L., Cegielski, P., Guessarian, I., Matiyasevich, Y.: Window-accumulated subsequence matching problem is linear. Ann. Pure Appl. Log. 113(1), 59–80 (2001)
Cégielski, P., Guessarian, I., Lifshits, Y., Matiyasevich, Y.: Window subsequence problems for compressed texts. In: Grigoriev, D., Harrison, J., Hirsch, E.A. (eds.) Computer Science-Theory and Applications. Lecture Notes in Computer Science, vol. 3967, pp. 127–136. Springer, Berlin, Heidelberg (2006)
Cégielski, P., Guessarian, I., Matiyasevich, Y.: Multiple serial episodes matching. Inf. Proc. Lett. 98(6), 211–218 (2006)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Crochemore, M., Melichar, B., Troníček, Z.: Directed acyclic subsequence graph—overview. J. Discret. Algorithms 1(3), 255–280 (2003)
Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., Kärkkäinen, J.: Episode matching. In: Apostolico, A., Hein, J. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 1264, pp. 12–27. Springer, Berlin, Heidelberg (1997)
Dietz, P.F.: Finding level-ancestors in dynamic trees. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.) Algorithms and Data Structures. Lecture Notes in Computer Science, vol. 519, pp. 32–40. Springer, Berlin, Heidelberg (1991)
Ferragina, P., Muthukrishnan, S.: Efficient dynamic method-lookup for object oriented languages. In: Diaz, J., Serna, M. (eds.) European Symposium on Algorithms. Lecture Notes in Computer Science, vol. 1136, pp. 107–120. Springer, Berlin, Heidelberg (1996)
Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci. 47(3), 424–436 (1993)
Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. P. IEEE. 88(11), 1722–1732 (2000)
Mannila, H., Toivonen, H., Verkamo, A.I.: Discovery of frequent episodes in event sequences. Data Min. Knowl. Disc. 1(3), 259–289 (1997)
Muthukrishnan, S., Müller, M.: Time and space efficient method-lookup for object-oriented programs. In: Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 42–51. Society for Industrial and Applied Mathematics, Atlanta (1996)
Rytter, W.: Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1), 211–222 (2003)
Sleator, D.D., Endre Tarjan, R.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)
Thorup, M.: Randomized sorting in \(O(n\log \log n)\) time and linear space using addition, shift, and bit-wise boolean operations. J. Algorithms 42(2), 205–230 (2002)
Tiskin, A.: Faster subsequence recognition in compressed strings. J. Math. Sci. 158(5), 759–769 (2009)
Tiskin, A.: Towards approximate matching in compressed strings: local subsequence recognition. In: Kulikov, A., Vereshchagin, N. (eds.) Computer Science-Theory and Applications, vol. 6651, pp. 401–414. Springer, Berlin, Heidelberg (2011)
Troníček, Z.: Episode matching. In: Amir, A. (ed.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 143–146. Springer, Berlin, Heidelberg (2001)
Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and dont-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 6661, pp. 309–322. Springer, Berlin, Heidelberg (2011)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23(3), 337–343 (1977)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)
Author information
Authors and Affiliations
Corresponding author
Additional information
Philip Bille, Inge Li Gørtz: Supported in part by the The Danish Council for Independent Research and Natural Sciences Grant (DFF 1323-00178).
Philip Bille, Patrick Hagge Cording, Inge Li Gørtz: Supported in part by the Danish Research Council and the Danish Research Council under the Sapere Aude Program (DFF 4005-00267).
Rights and permissions
About this article
Cite this article
Bille, P., Cording, P.H. & Gørtz, I.L. Compressed Subsequence Matching and Packed Tree Coloring. Algorithmica 77, 336–348 (2017). https://doi.org/10.1007/s00453-015-0068-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00453-015-0068-9