skip to main content
research-article

Faster Fully Compressed Pattern Matching by Recompression

Authors Info & Claims
Published:13 January 2015Publication History
Skip Abstract Section

Abstract

In this article, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs)—that is, context-free grammars generating exactly one string; the term fully means that both the pattern and the text are given in the compressed form. The problem is approached using a recently developed technique of local recompression: the SLPs are refactored so that substrings of the pattern and text are encoded in both SLPs in the same way. To this end, the SLPs are locally decompressed and then recompressed in a uniform way.

This technique yields an O((n + m) log M) algorithm for compressed pattern matching, assuming that M fits in O(1) machine words, where n (m) is the size of the compressed representation of the text (pattern, respectively), and M is the size of the decompressed pattern. If only m + n fits in O(1) machine words, the running time increases to O((n + m) log M log (n + m)). The previous best algorithm due to Lifshits has O(n2m) running time.

References

  1. Stephen Alstrup, Gerth Stolting Brodal, and Theis Rauhe. 2000. Pattern matching in dynamic texts. In Proceedings of SODA. 819--828. DOI:http://dx.doi.org/10.1145/338219.338645 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. 2005. The smallest grammar problem. IEEE Transactions on Information Theory 51, 7, 2554--2576. DOI:http://dx.doi.org/10.1109/TIT.2005.850116 Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Leszek Gąsieniec, Marek Karpiński, Wojciech Plandowski, and Wojciech Rytter. 1996a. Efficient algorithms for Lempel-Ziv encoding. In Algorithm Theory—SWAT '96. Lecture Notes in Computer Science, Vol. 1097. Springer, 392--403. DOI:http://dx.doi.org/10.1007/3-540-61422-2_148Google ScholarGoogle Scholar
  4. Leszek Gąsieniec, Marek Karpiński, Wojciech Plandowski, and Wojciech Rytter. 1996b. Randomized efficient algorithms for compressed strings: The finger-print approach. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 1075. Springer, 39--49. DOI:http://dx.doi.org/10.1007/3-540-61258-0_3 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Leszek Gąsieniec and Wojciech Rytter. 1999. Almost optimal fully LZW-compressed pattern matching. In Proceedings of DCC. IEEE, Los Alamitos, CA, 316--325. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Paweł Gawrychowski. 2011a. Personal communication.Google ScholarGoogle Scholar
  7. Paweł Gawrychowski. 2011b. Pattern matching in Lempel-Ziv compressed strings: Fast, simple, and deterministic. In Algorithms—ESA 2011. Lecture Notes in Computer Science, Vol. 6942. Springer, 421--432. DOI:http://dx.doi.org/10.1007/978-3-642-23719-5_36 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Paweł Gawrychowski. 2012a. Simple and efficient LZW-compressed multiple pattern matching. In Combinatorial Pattern MatchingLecture Notes in Computer Science, Vol. 7354. Springer, 232--242. DOI:http://dx.doi.org/10.1007/978-3-642-31265-6_19 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Paweł Gawrychowski. 2012b. Tying up the loose ends in fully LZW-compressed pattern matching. In Proceedings of STACS. Leibniz International Proceedings in Informatics, Vol. 14. Schloss Dagstuhl, 624--635. DOI:http://dx.doi.org/10.4230/LIPIcs.STACS.2012.624Google ScholarGoogle Scholar
  10. Paweł Gawrychowski. 2013. Optimal pattern matching in LZW compressed strings. ACM Transactions on Algorithms 9, 3, 25. DOI:http://dx.doi.org/10.1145/2483699.2483705 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Masahiro Hirao, Ayumi Shinohara, Masayuki Takeda, and Setsuo Arikawa. 2000. Fully compressed pattern matching algorithm for balanced straight-line programs. In Proceedings of SPIRE. 132--138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Artur Jeż. 2014. Compressed membership for NFA (DFA) with compressed labels is in NP (P). Theory of Computing Systems 55, 4, 685--718. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Artur Jeż. 2013a. Approximation of grammar-based compression via recompression. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 7922. Springer, 165--176. DOI:http://dx.doi.org/10.1007/978-3-642-38905-4_17 full version at http://arxiv.org/abs/1301.5842.Google ScholarGoogle Scholar
  14. Artur Jeż. 2013b. One-variable word equations in linear time. In Automata, Languages, and Programming. Lecture Notes in Computer Science, Vol. 7966. Springer, 324--335. DOI:http://dx.doi.org/10.1007/978-3-642-39212-2_30 full version accepted to Algorithmica DOI:http://dx.doi.org/10.1007/s00453-014-9931-3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Artur Jeż. 2013c. Recompression: A simple and powerful technique for word equations. In Proceedings of STACS. Leibniz International Proceedings in Informatics, Vol. 20. Schloss Dagstuhl, 233--244. DOI:http://dx.doi.org/10.4230/LIPIcs.STACS.2013.233Google ScholarGoogle Scholar
  16. Artur Jeż. 2014. Context unification is in PSPACE. In Automata, Languages, and Programming. Lecture Notes in Computer Science, Vol. 8573. Springer, 244--255. Available at http://arxiv.org/abs/1310.4367.Google ScholarGoogle Scholar
  17. Artur Jeż and Markus Lohrey. 2014. Approximation of smallest linear tree grammar. In Proceedings of STACS. Leibniz International Proceedings in Informatics, Vol. 25. Schloss Dagstuhl, 445--457.Google ScholarGoogle Scholar
  18. Juha Kärkkäinen, Pekka Mikkola, and Dominik Kempa. 2012. Grammar precompression speeds up Burrows-Wheeler compression. In String Processing and Information Retrieval. Lecture Notes in Computer Science, Vol. 7608. Springer, 330--335. DOI:http://dx.doi.org/10.1007/978-3-642-34109-0_34 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Marek Karpiński, Wojciech Rytter, and Ayumi Shinohara. 1995. Pattern-matching for strings with short descriptions. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 937. Springer, 205--214. DOI:http://dx.doi.org/10.1007/3-540-60044-2_44Google ScholarGoogle Scholar
  20. N. Jesper Larsson and Alistair Moffat. 1999. Offline dictionary-based compression. In Proceedings of the Data Compression Conference. IEEE, Los Alamitos, CA, 296--305. DOI:http://dx.doi.org/10.1109/DCC.1999.755679 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Yury Lifshits. 2007. Processing compressed texts: A tractability border. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 4580. Springer, 228--240. DOI:http://dx.doi.org/10.1007/978-3-540-73437-6_24 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Markus Lohrey and Christian Mathissen. 2011. Compressed membership in automata with compressed labels. In Computer Science—Theory and Applications. Lecture Notes in Computer Science, Vol. 6651. Springer, 275--288. DOI:http://dx.doi.org/10.1007/978-3-642-20712-9_21 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Kurt Mehlhorn, Rajamani Sundar, and Christian Uhrig. 1997. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17, 2, 183--198. DOI:http://dx.doi.org/10.1007/BF02522825Google ScholarGoogle ScholarCross RefCross Ref
  24. Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Masamichi Miyazaki, Ayumi Shinohara, and Masayuki Takeda. 1997. An improved pattern matching algorithm for strings in terms of straight-line programs. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 1264. Springer, 1--11. DOI:http://dx.doi.org/10.1007/3-540-63220-4_45 Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Craig G. Nevill-Manning and Ian H. Witten. 1997. Identifying hierarchical strcture in sequences: A linear-time algorithm. Journal of Artificial Intelligence Research 7, 67--82. DOI:http://dx.doi.org/10.1613/jair.374 Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Wojciech Plandowski. 1994. Testing equivalence of morphisms on context-free languages. In Algorithms—ESA '94. Lecture Notes in Computer Science, Vol. 855. Springer, 460--470. DOI:http://dx.doi.org/10.1007/BFb0049431 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Wojciech Plandowski and Wojciech Rytter. 1999. Complexity of language recognition problems for compressed words. In Jewels Are Forever, Juhani Karhumäki, Hermann A. Maurer, Gheorghe Paun, and Grzegorz Rozenberg (Eds.). Springer, 262--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Wojciech Rytter. 2003. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science 302, 1-3, 211--222. DOI:http://dx.doi.org/10.1016/ S0304-3975(02)00777-6 Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Hiroshi Sakamoto. 2005. A fully linear-time approximation algorithm for grammar-based compression. Journal of Discrete Algorithms 3, 2-4, 416--430. DOI:http://dx.doi.org/10.1016/j.jda.2004.08.016Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Faster Fully Compressed Pattern Matching by Recompression

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Algorithms
          ACM Transactions on Algorithms  Volume 11, Issue 3
          January 2015
          269 pages
          ISSN:1549-6325
          EISSN:1549-6333
          DOI:10.1145/2721890
          Issue’s Table of Contents

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 13 January 2015
          • Accepted: 1 June 2014
          • Revised: 1 March 2014
          • Received: 1 March 2013
          Published in talg Volume 11, Issue 3

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader