skip to main content
research-article

A Linear-Time Algorithm for Seeds Computation

Published:21 April 2020Publication History
Skip Abstract Section

Abstract

A seed in a word is a relaxed version of a period in which the occurrences of the repeating subword may overlap. Our first contribution is a linear-time algorithm computing a linear-size representation of all seeds of a word (the number of seeds might be quadratic). In particular, one can easily derive the shortest seed and the number of seeds from our representation. Thus, we solve an open problem stated in a survey by Smyth from 2000 and improve upon a previous O(n log n)-time algorithm by Iliopoulos et al. from 1996. Our approach is based on combinatorial relations between seeds and subword complexity (used here for the first time in the context of seeds).

In previous papers, compact representations of seeds consisted of two independent parts operating on the suffix tree of the input word and the suffix tree of its reverse, respectively. Our second contribution is a novel and significantly simpler representation of all seeds that avoids dealing with the suffix tree of the reversed word. This result is also of independent interest from a combinatorial point of view.

A preliminary version of this work, with a much more complex algorithm constructing a representation of seeds on two suffix trees, was presented at the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’12).

References

  1. Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi. 2019. Computing the antiperiod(s) of a string. In 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, N. Pisanti and S. P. Pissis (Eds.), Vol. 128. Leibniz International Proceedings in Informatics. Schloss Dagstuhl--Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Article 32, 11 pages. DOI:https://doi.org/10.4230/LIPIcs.CPM.2019.32Google ScholarGoogle Scholar
  2. Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. 2019. Quasi-linear-time algorithm for longest common circular factor. In 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, N. Pisanti and S. P. Pissis (Eds.), Vol. 128. Leibniz International Proceedings in Informatics. Schloss Dagstuhl--Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Article 25, 14 pages. DOI:https://doi.org/10.4230/LIPIcs.CPM.2019.25Google ScholarGoogle Scholar
  3. Amihood Amir, Gad M. Landau, Moshe Lewenstein, and Dina Sokol. 2007. Dynamic text and static pattern matching. ACM Transactions on Algorithms 3, 2 (2007), 19. DOI:https://doi.org/10.1145/1240233.1240242Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat. 2019. Can we recover the cover? Algorithmica 81, 7 (2019), 2857--2875. DOI:https://doi.org/10.1007/s00453-019-00559-8Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. 2019. Approximate cover of strings. Theoretical Computer Science 793 (2019), 59--69. DOI:https://doi.org/10.1016/j.tcs.2019.05.020Google ScholarGoogle ScholarCross RefCross Ref
  6. Amihood Amir, Avivit Levy, and Ely Porat. 2018. Quasi-periodicity under mismatch errors. In 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018, G. Navarro, D. Sankoff, and B. Zhu (Eds.), Vol. 105. Leibniz International Proceedings in Informatics. Schloss Dagstuhl--Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Article 4, 15 pages. DOI:https://doi.org/10.4230/LIPIcs.CPM.2018.4Google ScholarGoogle Scholar
  7. Alberto Apostolico and Andrzej Ehrenfeucht. 1993. Efficient detection of quasiperiodicities in strings. Theoretical Computer Science 119, 2 (1993), 247--265. DOI:https://doi.org/10.1016/0304-3975(93)90159-QGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  8. Alberto Apostolico, Martin Farach, and Costas S. Iliopoulos. 1991. Optimal superprimitivity testing for strings. Information Processing Letters 39, 1 (1991), 17--20. DOI:https://doi.org/10.1016/0020-0190(91)90056-NGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  9. Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, and Jakub Radoszewski. 2020. Indexing weighted sequences: Neat and efficient. Information and Computation 270 (2020), 104462. DOI:https://doi.org/10.1016/j.ic.2019.104462Google ScholarGoogle ScholarCross RefCross Ref
  10. Omer Berkman, Costas S. Iliopoulos, and Kunsoo Park. 1995. The subtree max gap problem with application to parallel string covering. Information and Computation 123, 1 (1995), 127--137. DOI:https://doi.org/10.1006/inco.1995.1162Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Omer Berkman, Baruch Schieber, and Uzi Vishkin. 1993. Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values. Journal of Algorithms 14, 3 (1993), 344--370. DOI:https://doi.org/10.1006/jagm.1993.1018Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dany Breslauer. 1992. An on-line string superprimitivity test. Information Processing Letters 44, 6 (1992), 345--347. DOI:https://doi.org/10.1016/0020-0190(92)90111-8Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gerth Stølting Brodal and Christian N. S. Pedersen. 2000. Finding maximal quasiperiodicities in strings. In 11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000 (LNCS), Raffaele Giancarlo and David Sankoff (Eds.), Vol. 1848. Springer, 397--411. https://doi.org/10.1007/3-540-45123-4_33Google ScholarGoogle Scholar
  14. Manolis Christodoulakis, Costas S. Iliopoulos, Kunsoo Park, and Jeong Seop Sim. 2005. Approximate seeds of strings. Journal of Automata, Languages and Combinatorics 10, 5–6 (2005), 609--626.Google ScholarGoogle Scholar
  15. Michalis Christou, Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, and Tomasz Waleń. 2013. Efficient seed computation revisited. Theoretical Computer Science 483 (2013), 171--181. DOI:https://doi.org/10.1016/j.tcs.2011.12.078Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Richard Cole, Costas S. Iliopoulos, Manal Mohamed, William F. Smyth, and Lu Yang. 2005. The complexity of the minimum k-cover problem. Journal of Automata, Languages and Combinatorics 10, 5-6 (2005), 641--653.Google ScholarGoogle Scholar
  17. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. 2007. Algorithms on Strings. Cambridge University Press. DOI:https://doi.org/10.1017/cbo9780511546853Google ScholarGoogle Scholar
  18. Maxime Crochemore and Wojciech Rytter. 2003. Jewels of Stringology. World Scientific. DOI:https://doi.org/10.1142/4838Google ScholarGoogle Scholar
  19. Patryk Czajka and Jakub Radoszewski. 2019. Experimental evaluation of algorithms for computing quasiperiods. arxiv:1909.11336.Google ScholarGoogle Scholar
  20. Martin Farach and S. Muthukrishnan. 1996. Perfect hashing for strings: Formalization and algorithms. In 7th Annual Symposium on Combinatorial Pattern Matching, CPM 1996 (LNCS), Daniel S. Hirschberg and Eugene W. Myers (Eds.), Vol. 1075. Springer, 130--140. https://doi.org/10.1007/3-540-61258-0_11Google ScholarGoogle Scholar
  21. Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. 2000. On the sorting-complexity of suffix tree construction. Journal of the ACM 47, 6 (2000), 987--1011. DOI:https://doi.org/10.1145/355541.355547Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Nathan J. Fine and Herbert S. Wilf. 1965. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society 16, 1 (1965), 109--114. DOI:https://doi.org/10.2307/2034009Google ScholarGoogle ScholarCross RefCross Ref
  23. Tomás Flouri, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Simon J. Puglisi, William F. Smyth, and Wojciech Tyczyński. 2013. Enhanced string covering. Theoretical Computer Science 506 (2013), 102--114. DOI:https://doi.org/10.1016/j.tcs.2013.08.013Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Harold N. Gabow and Roberd E. Tarjan. 1985. A linear-time algorithm for a special case of disjoint set union. Journal of Computer and System Sciences 30, 2 (1985), 209--221. DOI:https://doi.org/10.1016/0022-0000(85)90014-5Google ScholarGoogle ScholarCross RefCross Ref
  25. Paweł Gawrychowski, Moshe Lewenstein, and Patrick K. Nicholson. 2014. Weighted ancestors in suffix trees. In 22th Annual European Symposium on Algorithms, ESA 2014 (LNCS), Andreas S. Schulz and Dorothea Wagner (Eds.), Vol. 8737. Springer, 455--466. https://doi.org/10.1007/978-3-662-44777-2_38Google ScholarGoogle Scholar
  26. Qing Guo, Hui Zhang, and Costas S. Iliopoulos. 2006. Computing the λ-seeds of a string. In 2nd International Conference on Algorithmic Aspects in Information and Management, AAIM 2006 (LNCS), Siu-Wing Cheng and Chung Keung Poon (Eds.), Vol. 4041. Springer, 303--313. https://doi.org/10.1007/11775096_28Google ScholarGoogle Scholar
  27. Qing Guo, Hui Zhang, and Costas S. Iliopoulos. 2006. Computing the minimum approximate λ-cover of a string. In 13th International Conference on String Processing and Information Retrieval, SPIRE 2006 (LNCS), Fabio Crestani, Paolo Ferragina, and Mark Sanderson (Eds.), Vol. 4209. Springer, 49--60. https://doi.org/10.1007/11880561_5Google ScholarGoogle Scholar
  28. Qing Guo, Hui Zhang, and Costas S. Iliopoulos. 2007. Computing the λ-covers of a string. Information Sciences 177, 19 (2007), 3957--3967. DOI:https://doi.org/10.1016/j.ins.2007.02.020Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Ondřej Guth. 2019. On approximate enhanced covers under Hamming distance. Discrete Applied Mathematics 274 (2019), 67--80. DOI:https://doi.org/10.1016/j.dam.2019.01.015Google ScholarGoogle ScholarCross RefCross Ref
  30. Ondřej Guth, Borivoj Melichar, and Miroslav Balík. 2008. Searching all approximate covers and their distance using finite automata. In Proceedings of the 20th Conference on Theory and Practice of Information Technologies (ITAT’08). http://ceur-ws.org/Vol-414/paper4.pdf.Google ScholarGoogle Scholar
  31. Lucian Ilie, Sheng Yu, and Kaizhong Zhang. 2002. Repetition complexity of words. In 8th Annual International Conference on Computing and Combinatorics, COCOON 2002 (LNCS), Oscar H. Ibarra and Louxin Zhang (Eds.), Vol. 2387. Springer, 320--329. https://doi.org/10.1007/3-540-45655-4_35Google ScholarGoogle ScholarCross RefCross Ref
  32. Costas S. Iliopoulos, Manal Mohamed, and William F. Smyth. 2011. New complexity results for the k-covers problem. Information Sciences 181, 12 (2011), 2571--2575. DOI:https://doi.org/10.1016/j.ins.2011.02.009Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Costas S. Iliopoulos, Dennis W. G. Moore, and Kunsoo Park. 1996. Covering a string. Algorithmica 16, 3 (1996), 288--297. DOI:https://doi.org/10.1007/BF01955677Google ScholarGoogle ScholarCross RefCross Ref
  34. Costas S. Iliopoulos and Laurent Mouchard. 1999. Quasiperiodicity: From detection to normal forms. Journal of Automata, Languages and Combinatorics 4, 3 (1999), 213--228.Google ScholarGoogle Scholar
  35. Costas S. Iliopoulos and William F. Smyth. 1998. An on-line algorithm of computing a minimum set of k-covers of a string. In Proceedings of the 9th Australasian Workshop on Combinatorial Algorithms (AWOCA’98). 97--106.Google ScholarGoogle Scholar
  36. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. 2006. Linear work suffix array construction. Journal of the ACM 53, 6 (2006), 918--936. DOI:https://doi.org/10.1145/1217856.1217858Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. 2012. A linear time algorithm for seeds computation. In 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'12), Yuval Rabani (Ed.). SIAM, 1095--1112. https://doi.org/10.1137/1.9781611973099Google ScholarGoogle ScholarCross RefCross Ref
  38. Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. 2019. Towards a definitive measure of repetitiveness. arxiv:1910.02151.Google ScholarGoogle Scholar
  39. Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. 2018. Efficient algorithms for shortest partial seeds in words. Theoretical Computer Science 710 (2018), 139--147. DOI:https://doi.org/10.1016/j.tcs.2016.11.035Google ScholarGoogle ScholarCross RefCross Ref
  40. Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. 2015. Fast algorithm for partial covers in words. Algorithmica 73, 1 (2015), 217--233. DOI:https://doi.org/10.1007/s00453-014-9915-3Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Roman M. Kolpakov and Gregory Kucherov. 1999. Finding maximal repetitions in a word in linear time. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS’99). IEEE, Los Alamitos, CA, 596--604. DOI:https://doi.org/10.1109/SFFCS.1999.814634Google ScholarGoogle Scholar
  42. Yin Li and William F. Smyth. 2002. Computing the cover array in linear time. Algorithmica 32, 1 (2002), 95--106. DOI:https://doi.org/10.1007/s00453-001-0062-2Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Lothaire. 2002. Algebraic Combinatorics on Words. Encyclopedia of Mathematics and Its Applications (1st edition). Cambridge University Press. DOI:https://doi.org/10.1017/cbo9781107326019Google ScholarGoogle Scholar
  44. M. Lothaire. 2005. Applied Combinatorics on Words. Encyclopedia of Mathematics and Its Applications (1st edition). Cambridge University Press. DOI:https://doi.org/10.1017/cbo9781107341005Google ScholarGoogle Scholar
  45. Udi Manber and Eugene W. Myers. 1993. Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22, 5 (1993), 935--948. DOI:https://doi.org/10.1137/0222058Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Dennis W. G. Moore and William F. Smyth. 1994. An optimal algorithm to compute all the covers of a string. Information Processing Letters 50, 5 (1994), 239--246. DOI:https://doi.org/10.1016/0020-0190(94)00045-XGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  47. Dennis W. G. Moore and William F. Smyth. 1995. A correction to “An Optimal Algorithm to Compute All the Covers of a String”. Information Processing Letters 54, 2 (1995), 101--103. DOI:https://doi.org/10.1016/0020-0190(94)00235-QGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  48. James H. Morris Jr. and Vaughan R. Pratt. 1970. A Linear Pattern-Matching Algorithm. Technical Report 40. Department of Computer Science, University of California, Berkeley.Google ScholarGoogle Scholar
  49. Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, and Adam D. Smith. 2013. Sublinear algorithms for approximating string compressibility. Algorithmica 65, 3 (2013), 685--709. DOI:https://doi.org/10.1007/s00453-012-9618-6Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Jeong Seop Sim, Kunsoo Park, Sung-Ryul Kim, and Jee-Soo Lee. 2002. Finding approximate covers of strings. Journal of KIISE: Computer Systems and Theory 29, 1 (2002), 16--21. http://www.koreascience.or.kr/article/ArticleFullRecord.jsp?cn=JBGHG6_2002_v29n1_16.Google ScholarGoogle Scholar
  51. William F. Smyth. 2000. Repetitive perhaps, but certainly not boring. Theoretical Computer Science 249, 2 (2000), 343--355. DOI:https://doi.org/10.1016/S0304-3975(00)00067-0Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Peter Weiner. 1973. Linear pattern matching algorithms. In Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT’73). IEEE, Los Alamitos, CA, 1--11. DOI:https://doi.org/10.1109/SWAT.1973.13Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A Linear-Time Algorithm for Seeds Computation

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Algorithms
      ACM Transactions on Algorithms  Volume 16, Issue 2
      April 2020
      372 pages
      ISSN:1549-6325
      EISSN:1549-6333
      DOI:10.1145/3386689
      Issue’s Table of Contents

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 April 2020
      • Accepted: 1 January 2020
      • Revised: 1 October 2019
      • Received: 1 March 2019
      Published in talg Volume 16, Issue 2

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format .

    View HTML Format