Abstract
In this article, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs)—that is, context-free grammars generating exactly one string; the term fully means that both the pattern and the text are given in the compressed form. The problem is approached using a recently developed technique of local recompression: the SLPs are refactored so that substrings of the pattern and text are encoded in both SLPs in the same way. To this end, the SLPs are locally decompressed and then recompressed in a uniform way.
This technique yields an O((n + m) log M) algorithm for compressed pattern matching, assuming that M fits in O(1) machine words, where n (m) is the size of the compressed representation of the text (pattern, respectively), and M is the size of the decompressed pattern. If only m + n fits in O(1) machine words, the running time increases to O((n + m) log M log (n + m)). The previous best algorithm due to Lifshits has O(n2m) running time.
- Stephen Alstrup, Gerth Stolting Brodal, and Theis Rauhe. 2000. Pattern matching in dynamic texts. In Proceedings of SODA. 819--828. DOI:http://dx.doi.org/10.1145/338219.338645 Google ScholarDigital Library
- Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. 2005. The smallest grammar problem. IEEE Transactions on Information Theory 51, 7, 2554--2576. DOI:http://dx.doi.org/10.1109/TIT.2005.850116 Google ScholarDigital Library
- Leszek Gąsieniec, Marek Karpiński, Wojciech Plandowski, and Wojciech Rytter. 1996a. Efficient algorithms for Lempel-Ziv encoding. In Algorithm Theory—SWAT '96. Lecture Notes in Computer Science, Vol. 1097. Springer, 392--403. DOI:http://dx.doi.org/10.1007/3-540-61422-2_148Google Scholar
- Leszek Gąsieniec, Marek Karpiński, Wojciech Plandowski, and Wojciech Rytter. 1996b. Randomized efficient algorithms for compressed strings: The finger-print approach. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 1075. Springer, 39--49. DOI:http://dx.doi.org/10.1007/3-540-61258-0_3 Google ScholarDigital Library
- Leszek Gąsieniec and Wojciech Rytter. 1999. Almost optimal fully LZW-compressed pattern matching. In Proceedings of DCC. IEEE, Los Alamitos, CA, 316--325. Google ScholarDigital Library
- Paweł Gawrychowski. 2011a. Personal communication.Google Scholar
- Paweł Gawrychowski. 2011b. Pattern matching in Lempel-Ziv compressed strings: Fast, simple, and deterministic. In Algorithms—ESA 2011. Lecture Notes in Computer Science, Vol. 6942. Springer, 421--432. DOI:http://dx.doi.org/10.1007/978-3-642-23719-5_36 Google ScholarDigital Library
- Paweł Gawrychowski. 2012a. Simple and efficient LZW-compressed multiple pattern matching. In Combinatorial Pattern MatchingLecture Notes in Computer Science, Vol. 7354. Springer, 232--242. DOI:http://dx.doi.org/10.1007/978-3-642-31265-6_19 Google ScholarDigital Library
- Paweł Gawrychowski. 2012b. Tying up the loose ends in fully LZW-compressed pattern matching. In Proceedings of STACS. Leibniz International Proceedings in Informatics, Vol. 14. Schloss Dagstuhl, 624--635. DOI:http://dx.doi.org/10.4230/LIPIcs.STACS.2012.624Google Scholar
- Paweł Gawrychowski. 2013. Optimal pattern matching in LZW compressed strings. ACM Transactions on Algorithms 9, 3, 25. DOI:http://dx.doi.org/10.1145/2483699.2483705 Google ScholarDigital Library
- Masahiro Hirao, Ayumi Shinohara, Masayuki Takeda, and Setsuo Arikawa. 2000. Fully compressed pattern matching algorithm for balanced straight-line programs. In Proceedings of SPIRE. 132--138. Google ScholarDigital Library
- Artur Jeż. 2014. Compressed membership for NFA (DFA) with compressed labels is in NP (P). Theory of Computing Systems 55, 4, 685--718. Google ScholarDigital Library
- Artur Jeż. 2013a. Approximation of grammar-based compression via recompression. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 7922. Springer, 165--176. DOI:http://dx.doi.org/10.1007/978-3-642-38905-4_17 full version at http://arxiv.org/abs/1301.5842.Google Scholar
- Artur Jeż. 2013b. One-variable word equations in linear time. In Automata, Languages, and Programming. Lecture Notes in Computer Science, Vol. 7966. Springer, 324--335. DOI:http://dx.doi.org/10.1007/978-3-642-39212-2_30 full version accepted to Algorithmica DOI:http://dx.doi.org/10.1007/s00453-014-9931-3. Google ScholarDigital Library
- Artur Jeż. 2013c. Recompression: A simple and powerful technique for word equations. In Proceedings of STACS. Leibniz International Proceedings in Informatics, Vol. 20. Schloss Dagstuhl, 233--244. DOI:http://dx.doi.org/10.4230/LIPIcs.STACS.2013.233Google Scholar
- Artur Jeż. 2014. Context unification is in PSPACE. In Automata, Languages, and Programming. Lecture Notes in Computer Science, Vol. 8573. Springer, 244--255. Available at http://arxiv.org/abs/1310.4367.Google Scholar
- Artur Jeż and Markus Lohrey. 2014. Approximation of smallest linear tree grammar. In Proceedings of STACS. Leibniz International Proceedings in Informatics, Vol. 25. Schloss Dagstuhl, 445--457.Google Scholar
- Juha Kärkkäinen, Pekka Mikkola, and Dominik Kempa. 2012. Grammar precompression speeds up Burrows-Wheeler compression. In String Processing and Information Retrieval. Lecture Notes in Computer Science, Vol. 7608. Springer, 330--335. DOI:http://dx.doi.org/10.1007/978-3-642-34109-0_34 Google ScholarDigital Library
- Marek Karpiński, Wojciech Rytter, and Ayumi Shinohara. 1995. Pattern-matching for strings with short descriptions. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 937. Springer, 205--214. DOI:http://dx.doi.org/10.1007/3-540-60044-2_44Google Scholar
- N. Jesper Larsson and Alistair Moffat. 1999. Offline dictionary-based compression. In Proceedings of the Data Compression Conference. IEEE, Los Alamitos, CA, 296--305. DOI:http://dx.doi.org/10.1109/DCC.1999.755679 Google ScholarDigital Library
- Yury Lifshits. 2007. Processing compressed texts: A tractability border. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 4580. Springer, 228--240. DOI:http://dx.doi.org/10.1007/978-3-540-73437-6_24 Google ScholarDigital Library
- Markus Lohrey and Christian Mathissen. 2011. Compressed membership in automata with compressed labels. In Computer Science—Theory and Applications. Lecture Notes in Computer Science, Vol. 6651. Springer, 275--288. DOI:http://dx.doi.org/10.1007/978-3-642-20712-9_21 Google ScholarDigital Library
- Kurt Mehlhorn, Rajamani Sundar, and Christian Uhrig. 1997. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17, 2, 183--198. DOI:http://dx.doi.org/10.1007/BF02522825Google ScholarCross Ref
- Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press. Google ScholarDigital Library
- Masamichi Miyazaki, Ayumi Shinohara, and Masayuki Takeda. 1997. An improved pattern matching algorithm for strings in terms of straight-line programs. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 1264. Springer, 1--11. DOI:http://dx.doi.org/10.1007/3-540-63220-4_45 Google ScholarDigital Library
- Craig G. Nevill-Manning and Ian H. Witten. 1997. Identifying hierarchical strcture in sequences: A linear-time algorithm. Journal of Artificial Intelligence Research 7, 67--82. DOI:http://dx.doi.org/10.1613/jair.374 Google ScholarDigital Library
- Wojciech Plandowski. 1994. Testing equivalence of morphisms on context-free languages. In Algorithms—ESA '94. Lecture Notes in Computer Science, Vol. 855. Springer, 460--470. DOI:http://dx.doi.org/10.1007/BFb0049431 Google ScholarDigital Library
- Wojciech Plandowski and Wojciech Rytter. 1999. Complexity of language recognition problems for compressed words. In Jewels Are Forever, Juhani Karhumäki, Hermann A. Maurer, Gheorghe Paun, and Grzegorz Rozenberg (Eds.). Springer, 262--272. Google ScholarDigital Library
- Wojciech Rytter. 2003. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science 302, 1-3, 211--222. DOI:http://dx.doi.org/10.1016/ S0304-3975(02)00777-6 Google ScholarDigital Library
- Hiroshi Sakamoto. 2005. A fully linear-time approximation algorithm for grammar-based compression. Journal of Discrete Algorithms 3, 2-4, 416--430. DOI:http://dx.doi.org/10.1016/j.jda.2004.08.016Google ScholarCross Ref
Index Terms
- Faster Fully Compressed Pattern Matching by Recompression
Recommendations
Faster fully compressed pattern matching by recompression
ICALP'12: Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part IIn this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e. a context-free grammar generating exactly one string; the term fully means that both the pattern and the text ...
Compressed Parameterized Pattern Matching
DCC '13: Proceedings of the 2013 Data Compression ConferenceTraditional pattern matching between strings, from the alphabet $\Sigma$, is well defined for both uncompressed and compressed sequences. Prior to this work, parameterized pattern matching (p-matching) was defined predominately by the matching between ...
Compressed parameterized pattern matching
Pattern matching between traditional strings is well-defined for both uncompressed and compressed sequences. Prior to this work, parameterized pattern matching (p-matching) was defined predominately by the matching between uncompressed parameterized ...
Comments