Skip to main content
Top
Published in: Theory of Computing Systems 8/2018

11-01-2018

Finger Search in Grammar-Compressed Strings

Authors: Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, Inge Li Gørtz

Published in: Theory of Computing Systems | Issue 8/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant where also moving the finger such that the time depends on the distance moved is supported. Let n be the size the grammar, and let N be the size of the string. For the static variant we give a linear space representation that supports placing the finger in O(log N) time and subsequently accessing in O(log D) time, where D is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in O(log N) time and accessing and moving the finger in O(log D + log log N) time. Compared to the best linear space solution to random access, we improve a O(log N) query bound to O(log D) for the static variant and to O(log D + log log N) for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proceeding of the 39th FOCS, pp. 534–543 (1998) Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proceeding of the 39th FOCS, pp. 534–543 (1998)
2.
go back to reference Apostolico, A., Lonardi, S.: Some theory and practice of greedy off-line textual substitution. In: Proceeding of the DCC, pp. 119–128 (1998) Apostolico, A., Lonardi, S.: Some theory and practice of greedy off-line textual substitution. In: Proceeding of the DCC, pp. 119–128 (1998)
3.
go back to reference Apostolico, A., Lonardi, S.: Compression of biological sequences by greedy off-line textual substitution. In: Proceeding of the DCC, pp. 143–152 (2000) Apostolico, A., Lonardi, S.: Compression of biological sequences by greedy off-line textual substitution. In: Proceeding of the DCC, pp. 143–152 (2000)
4.
go back to reference Apostolico, A., Lonardi, S.: Off-line compression by greedy textual substitution. Proc. IEEE 88(11), 1733–1744 (2000)CrossRef Apostolico, A., Lonardi, S.: Off-line compression by greedy textual substitution. Proc. IEEE 88(11), 1733–1744 (2000)CrossRef
5.
go back to reference Belazzougui, D., Cording, P.H., Puglisi, S.J., Tabei, Y.: Access, rank, and select in grammar-compressed strings. In: Proceeding of the 23rd ESA (2015) Belazzougui, D., Cording, P.H., Puglisi, S.J., Tabei, Y.: Access, rank, and select in grammar-compressed strings. In: Proceeding of the 23rd ESA (2015)
6.
go back to reference Belazzougui, D., Gagie, T., Gawrychowski, P., Karkkainen, J., Ordonez, A., Puglisi, S., Tabei, Y.: Queries on lz-bounded encodings. In: Proceeding of the DCC, pp. 83–92 (2015) Belazzougui, D., Gagie, T., Gawrychowski, P., Karkkainen, J., Ordonez, A., Puglisi, S., Tabei, Y.: Queries on lz-bounded encodings. In: Proceeding of the DCC, pp. 83–92 (2015)
7.
9.
go back to reference Bille, P., Gørtz, I.L., Cording, P.H., Sach, B., Vildhøj, H.W., Vind, S.: Fingerprints in compressed strings. J. Comput. Syst. Sci. 86, 171–180 (2013). Announced at WADSMathSciNetCrossRefMATH Bille, P., Gørtz, I.L., Cording, P.H., Sach, B., Vildhøj, H.W., Vind, S.: Fingerprints in compressed strings. J. Comput. Syst. Sci. 86, 171–180 (2013). Announced at WADSMathSciNetCrossRefMATH
10.
go back to reference Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2014). Announced at SODA 2011MathSciNetCrossRefMATH Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2014). Announced at SODA 2011MathSciNetCrossRefMATH
11.
go back to reference Blelloch, G.E., Maggs, B.M., Woo, S.L.M.: Space-efficient finger search on degree-balanced search trees. In: Proceeding of the 14th SODA, pp. 374–383 (2003) Blelloch, G.E., Maggs, B.M., Woo, S.L.M.: Space-efficient finger search on degree-balanced search trees. In: Proceeding of the 14th SODA, pp. 374–383 (2003)
12.
go back to reference Brodal, G.S.: Finger search trees. In: Handbook of Data Structures and Applications. Chapman and Hall/CRC (2004) Brodal, G.S.: Finger search trees. In: Handbook of Data Structures and Applications. Chapman and Hall/CRC (2004)
13.
go back to reference Brodal, G.S., Lagogiannis, G., Makris, C., Tsakalidis, A.K., Tsichlas, K.: Optimal finger search trees in the pointer machine. J. Comput. Syst Sci. 67(2), 381–418 (2003)MathSciNetCrossRefMATH Brodal, G.S., Lagogiannis, G., Makris, C., Tsakalidis, A.K., Tsichlas, K.: Optimal finger search trees in the pointer machine. J. Comput. Syst Sci. 67(2), 381–418 (2003)MathSciNetCrossRefMATH
14.
go back to reference Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005). Announced at STOC 2002 and SODA 2002MathSciNetCrossRefMATH Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005). Announced at STOC 2002 and SODA 2002MathSciNetCrossRefMATH
15.
go back to reference Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fund. Inform. 111(3), 313–337 (2011)MathSciNetMATH Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fund. Inform. 111(3), 313–337 (2011)MathSciNetMATH
16.
go back to reference Cording, P.H., Gawrychowski, P., Weimann, O.: Bookmarks in grammar-compressed strings. In: Proceeding of the 23rd SPIRE, pp. x–y (2016) Cording, P.H., Gawrychowski, P., Weimann, O.: Bookmarks in grammar-compressed strings. In: Proceeding of the 23rd SPIRE, pp. x–y (2016)
17.
go back to reference Dietz, P.F., Raman, R.: A constant update time finger search tree. Inf. Process. Lett. 52(3), 147–154 (1994)CrossRefMATH Dietz, P.F., Raman, R.: A constant update time finger search tree. Inf. Process. Lett. 52(3), 147–154 (1994)CrossRefMATH
18.
go back to reference Farach, M., Muthukrishnan, S.: Perfect hashing for strings: formalization and algorithms. In: Proceeding of the 7th CPM, pp. 130–140. Springer (1996) Farach, M., Muthukrishnan, S.: Perfect hashing for strings: formalization and algorithms. In: Proceeding of the 7th CPM, pp. 130–140. Springer (1996)
19.
go back to reference Fleischer, R.: A simple balanced search tree with o(1) worst-case update time. Int. J. Found. Comput. Sci. 7(2), 137–150 (1996)CrossRefMATH Fleischer, R.: A simple balanced search tree with o(1) worst-case update time. Int. J. Found. Comput. Sci. 7(2), 137–150 (1996)CrossRefMATH
20.
go back to reference Gage, P.: A new algorithm for data compression. The C Users J. 12(2), 23–38 (1994) Gage, P.: A new algorithm for data compression. The C Users J. 12(2), 23–38 (1994)
21.
go back to reference Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Proceeding of the 6th LATA, pp. 240–251 (2012) Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Proceeding of the 6th LATA, pp. 240–251 (2012)
22.
go back to reference Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Proceeding of the 11th LATIN, pp. 731–742. Springer (2014) Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Proceeding of the 11th LATIN, pp. 731–742. Springer (2014)
23.
go back to reference Gagie, T., Gawrychowski, P., Puglisi, S.J.: Approximate pattern matching in lz77-compressed texts. J. Discrete Algorithms 32, 64–68 (2015)MathSciNetCrossRefMATH Gagie, T., Gawrychowski, P., Puglisi, S.J.: Approximate pattern matching in lz77-compressed texts. J. Discrete Algorithms 32, 64–68 (2015)MathSciNetCrossRefMATH
24.
go back to reference Gagie, T., Hoobin, C., Puglisi, S.J.: Block graphs in practice. In: Proceeding of the ICABD, pp. 30–36 (2014) Gagie, T., Hoobin, C., Puglisi, S.J.: Block graphs in practice. In: Proceeding of the ICABD, pp. 30–36 (2014)
25.
go back to reference Ga̧sieniec, L., Kolpakov, R., Potapov, I., Sant, P.: Real-time traversal in grammar-based compressed files. In: Proceeding of the 15th DCC, p. 458 (2005) Ga̧sieniec, L., Kolpakov, R., Potapov, I., Sant, P.: Real-time traversal in grammar-based compressed files. In: Proceeding of the 15th DCC, p. 458 (2005)
26.
go back to reference Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Proceeding of the 26th CPM, pp. 219–230. Springer (2015) Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Proceeding of the 26th CPM, pp. 219–230. Springer (2015)
27.
go back to reference Guibas, L.J., McCreight, E.M., Plass, M.F., Roberts, J.R.: A new representation for linear lists. In: Proceeding of the 9Th STOC, pp. 49–60 (1977) Guibas, L.J., McCreight, E.M., Plass, M.F., Roberts, J.R.: A new representation for linear lists. In: Proceeding of the 9Th STOC, pp. 49–60 (1977)
28.
go back to reference I, T., Matsubara, W., Shimohira, K., Inenaga, S., Bannai, H., Takeda, M., Narisawa, K., Shinohara, A.: Detecting regularities on grammar-compressed strings. Inform. Comput. 240, 74–89 (2015)MathSciNetCrossRefMATH I, T., Matsubara, W., Shimohira, K., Inenaga, S., Bannai, H., Takeda, M., Narisawa, K., Shinohara, A.: Detecting regularities on grammar-compressed strings. Inform. Comput. 240, 74–89 (2015)MathSciNetCrossRefMATH
30.
go back to reference Kieffer, J.C., Yang, E.H.: Grammar based codes: a new class of universal lossless source codes. IEEE Trans. Inf. Theory 46(3), 737–754 (2000)MathSciNetCrossRefMATH Kieffer, J.C., Yang, E.H.: Grammar based codes: a new class of universal lossless source codes. IEEE Trans. Inf. Theory 46(3), 737–754 (2000)MathSciNetCrossRefMATH
31.
go back to reference Kieffer, J.C., Yang, E.H., Nelson, G.J., Cosman, P.: Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory 46(5), 1227–1245 (2000)MathSciNetCrossRefMATH Kieffer, J.C., Yang, E.H., Nelson, G.J., Cosman, P.: Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory 46(5), 1227–1245 (2000)MathSciNetCrossRefMATH
32.
go back to reference Kosaraju, S.R.: Localized search in sorted lists. In: Proceeding of the 13th STOC, pp. 62–69, New York (1981) Kosaraju, S.R.: Localized search in sorted lists. In: Proceeding of the 13th STOC, pp. 62–69, New York (1981)
33.
go back to reference Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88(11), 1722–1732 (2000)CrossRef Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88(11), 1722–1732 (2000)CrossRef
34.
go back to reference Mehlhorn, K.: A new data structure for representing sorted lists. In: Proceeding of the WG, pp. 90–112 (1981) Mehlhorn, K.: A new data structure for representing sorted lists. In: Proceeding of the WG, pp. 90–112 (1981)
35.
go back to reference Navarro, G., Ordónez, A.: Grammar compressed sequences with rank/select support. In: 21St SPIRE, pp. 31–44. Springer (2014) Navarro, G., Ordónez, A.: Grammar compressed sequences with rank/select support. In: 21St SPIRE, pp. 31–44. Springer (2014)
36.
go back to reference Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7, 67–82 (1997)CrossRefMATH Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7, 67–82 (1997)CrossRefMATH
37.
go back to reference Nishimoto, T., I, T., Inenaga, S., Bannai, H., Takeda, M.: Fully dynamic data structure for LCE queries in compressed space. In: Proceeding of the 41st MFCS, pp. 72:1–72:15 (2016) Nishimoto, T., I, T., Inenaga, S., Bannai, H., Takeda, M.: Fully dynamic data structure for LCE queries in compressed space. In: Proceeding of the 41st MFCS, pp. 72:1–72:15 (2016)
38.
go back to reference Pugh, W.: Skip lists: A probabilistic alternative to balanced trees. Commun. ACM 33(6), 668–676 (1990)CrossRef Pugh, W.: Skip lists: A probabilistic alternative to balanced trees. Commun. ACM 33(6), 668–676 (1990)CrossRef
39.
go back to reference Rytter, W.: Application of lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)MathSciNetCrossRefMATH Rytter, W.: Application of lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)MathSciNetCrossRefMATH
41.
go back to reference Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Byte pair encoding: a text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Dept. of Informatics Kyushu University (1999) Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Byte pair encoding: a text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Dept. of Informatics Kyushu University (1999)
43.
go back to reference Tanaka, T., Tomohiro, I., Inenaga, S., Bannai, H., Takeda, M.: Computing convolution on grammar-compressed text. In: Proceeding of the 23rd DCC, pp. 451–460 (2013) Tanaka, T., Tomohiro, I., Inenaga, S., Bannai, H., Takeda, M.: Computing convolution on grammar-compressed text. In: Proceeding of the 23rd DCC, pp. 451–460 (2013)
44.
go back to reference Tomohiro, I., Nishimoto, T., Inenaga, S., Bannai, H., Takeda, M.: Compressed automata for dictionary matching. Theor. Comput. Sci. 578, 30–41 (2015)MathSciNetCrossRefMATH Tomohiro, I., Nishimoto, T., Inenaga, S., Bannai, H., Takeda, M.: Compressed automata for dictionary matching. Theor. Comput. Sci. 578, 30–41 (2015)MathSciNetCrossRefMATH
45.
go back to reference van Emde Boas, P., Kaas, R., Zijlstra, E.: Design and implementation of an efficient priority queue. Theory Comput. Syst. 10(1), 99–127 (1976)MathSciNetMATH van Emde Boas, P., Kaas, R., Zijlstra, E.: Design and implementation of an efficient priority queue. Theory Comput. Syst. 10(1), 99–127 (1976)MathSciNetMATH
46.
go back to reference Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Proceeding of the 24th CPM, pp. 247–258 (2013) Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Proceeding of the 24th CPM, pp. 247–258 (2013)
47.
go back to reference Welch, T.A.: A technique for high-performance data compression. IEEE Computer 17(6), 8–19 (1984)CrossRef Welch, T.A.: A technique for high-performance data compression. IEEE Computer 17(6), 8–19 (1984)CrossRef
48.
go back to reference Yang, E.H., Kieffer, J.C.: Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform – part one: without context models. IEEE Trans. Inf. Theory 46(3), 755–754 (2000)CrossRefMATH Yang, E.H., Kieffer, J.C.: Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform – part one: without context models. IEEE Trans. Inf. Theory 46(3), 755–754 (2000)CrossRefMATH
49.
50.
Metadata
Title
Finger Search in Grammar-Compressed Strings
Authors
Philip Bille
Anders Roy Christiansen
Patrick Hagge Cording
Inge Li Gørtz
Publication date
11-01-2018
Publisher
Springer US
Published in
Theory of Computing Systems / Issue 8/2018
Print ISSN: 1432-4350
Electronic ISSN: 1433-0490
DOI
https://doi.org/10.1007/s00224-017-9839-9

Other articles of this Issue 8/2018

Theory of Computing Systems 8/2018 Go to the issue

Premium Partner