Abstract
A seed in a word is a relaxed version of a period in which the occurrences of the repeating subword may overlap. Our first contribution is a linear-time algorithm computing a linear-size representation of all seeds of a word (the number of seeds might be quadratic). In particular, one can easily derive the shortest seed and the number of seeds from our representation. Thus, we solve an open problem stated in a survey by Smyth from 2000 and improve upon a previous O(n log n)-time algorithm by Iliopoulos et al. from 1996. Our approach is based on combinatorial relations between seeds and subword complexity (used here for the first time in the context of seeds).
In previous papers, compact representations of seeds consisted of two independent parts operating on the suffix tree of the input word and the suffix tree of its reverse, respectively. Our second contribution is a novel and significantly simpler representation of all seeds that avoids dealing with the suffix tree of the reversed word. This result is also of independent interest from a combinatorial point of view.
A preliminary version of this work, with a much more complex algorithm constructing a representation of seeds on two suffix trees, was presented at the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’12).
- Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi. 2019. Computing the antiperiod(s) of a string. In 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, N. Pisanti and S. P. Pissis (Eds.), Vol. 128. Leibniz International Proceedings in Informatics. Schloss Dagstuhl--Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Article 32, 11 pages. DOI:https://doi.org/10.4230/LIPIcs.CPM.2019.32Google Scholar
- Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. 2019. Quasi-linear-time algorithm for longest common circular factor. In 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, N. Pisanti and S. P. Pissis (Eds.), Vol. 128. Leibniz International Proceedings in Informatics. Schloss Dagstuhl--Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Article 25, 14 pages. DOI:https://doi.org/10.4230/LIPIcs.CPM.2019.25Google Scholar
- Amihood Amir, Gad M. Landau, Moshe Lewenstein, and Dina Sokol. 2007. Dynamic text and static pattern matching. ACM Transactions on Algorithms 3, 2 (2007), 19. DOI:https://doi.org/10.1145/1240233.1240242Google ScholarDigital Library
- Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat. 2019. Can we recover the cover? Algorithmica 81, 7 (2019), 2857--2875. DOI:https://doi.org/10.1007/s00453-019-00559-8Google ScholarDigital Library
- Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. 2019. Approximate cover of strings. Theoretical Computer Science 793 (2019), 59--69. DOI:https://doi.org/10.1016/j.tcs.2019.05.020Google ScholarCross Ref
- Amihood Amir, Avivit Levy, and Ely Porat. 2018. Quasi-periodicity under mismatch errors. In 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018, G. Navarro, D. Sankoff, and B. Zhu (Eds.), Vol. 105. Leibniz International Proceedings in Informatics. Schloss Dagstuhl--Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Article 4, 15 pages. DOI:https://doi.org/10.4230/LIPIcs.CPM.2018.4Google Scholar
- Alberto Apostolico and Andrzej Ehrenfeucht. 1993. Efficient detection of quasiperiodicities in strings. Theoretical Computer Science 119, 2 (1993), 247--265. DOI:https://doi.org/10.1016/0304-3975(93)90159-QGoogle ScholarDigital Library
- Alberto Apostolico, Martin Farach, and Costas S. Iliopoulos. 1991. Optimal superprimitivity testing for strings. Information Processing Letters 39, 1 (1991), 17--20. DOI:https://doi.org/10.1016/0020-0190(91)90056-NGoogle ScholarDigital Library
- Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, and Jakub Radoszewski. 2020. Indexing weighted sequences: Neat and efficient. Information and Computation 270 (2020), 104462. DOI:https://doi.org/10.1016/j.ic.2019.104462Google ScholarCross Ref
- Omer Berkman, Costas S. Iliopoulos, and Kunsoo Park. 1995. The subtree max gap problem with application to parallel string covering. Information and Computation 123, 1 (1995), 127--137. DOI:https://doi.org/10.1006/inco.1995.1162Google ScholarDigital Library
- Omer Berkman, Baruch Schieber, and Uzi Vishkin. 1993. Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values. Journal of Algorithms 14, 3 (1993), 344--370. DOI:https://doi.org/10.1006/jagm.1993.1018Google ScholarDigital Library
- Dany Breslauer. 1992. An on-line string superprimitivity test. Information Processing Letters 44, 6 (1992), 345--347. DOI:https://doi.org/10.1016/0020-0190(92)90111-8Google ScholarDigital Library
- Gerth Stølting Brodal and Christian N. S. Pedersen. 2000. Finding maximal quasiperiodicities in strings. In 11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000 (LNCS), Raffaele Giancarlo and David Sankoff (Eds.), Vol. 1848. Springer, 397--411. https://doi.org/10.1007/3-540-45123-4_33Google Scholar
- Manolis Christodoulakis, Costas S. Iliopoulos, Kunsoo Park, and Jeong Seop Sim. 2005. Approximate seeds of strings. Journal of Automata, Languages and Combinatorics 10, 5–6 (2005), 609--626.Google Scholar
- Michalis Christou, Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, and Tomasz Waleń. 2013. Efficient seed computation revisited. Theoretical Computer Science 483 (2013), 171--181. DOI:https://doi.org/10.1016/j.tcs.2011.12.078Google ScholarDigital Library
- Richard Cole, Costas S. Iliopoulos, Manal Mohamed, William F. Smyth, and Lu Yang. 2005. The complexity of the minimum k-cover problem. Journal of Automata, Languages and Combinatorics 10, 5-6 (2005), 641--653.Google Scholar
- Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. 2007. Algorithms on Strings. Cambridge University Press. DOI:https://doi.org/10.1017/cbo9780511546853Google Scholar
- Maxime Crochemore and Wojciech Rytter. 2003. Jewels of Stringology. World Scientific. DOI:https://doi.org/10.1142/4838Google Scholar
- Patryk Czajka and Jakub Radoszewski. 2019. Experimental evaluation of algorithms for computing quasiperiods. arxiv:1909.11336.Google Scholar
- Martin Farach and S. Muthukrishnan. 1996. Perfect hashing for strings: Formalization and algorithms. In 7th Annual Symposium on Combinatorial Pattern Matching, CPM 1996 (LNCS), Daniel S. Hirschberg and Eugene W. Myers (Eds.), Vol. 1075. Springer, 130--140. https://doi.org/10.1007/3-540-61258-0_11Google Scholar
- Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. 2000. On the sorting-complexity of suffix tree construction. Journal of the ACM 47, 6 (2000), 987--1011. DOI:https://doi.org/10.1145/355541.355547Google ScholarDigital Library
- Nathan J. Fine and Herbert S. Wilf. 1965. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society 16, 1 (1965), 109--114. DOI:https://doi.org/10.2307/2034009Google ScholarCross Ref
- Tomás Flouri, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Simon J. Puglisi, William F. Smyth, and Wojciech Tyczyński. 2013. Enhanced string covering. Theoretical Computer Science 506 (2013), 102--114. DOI:https://doi.org/10.1016/j.tcs.2013.08.013Google ScholarDigital Library
- Harold N. Gabow and Roberd E. Tarjan. 1985. A linear-time algorithm for a special case of disjoint set union. Journal of Computer and System Sciences 30, 2 (1985), 209--221. DOI:https://doi.org/10.1016/0022-0000(85)90014-5Google ScholarCross Ref
- Paweł Gawrychowski, Moshe Lewenstein, and Patrick K. Nicholson. 2014. Weighted ancestors in suffix trees. In 22th Annual European Symposium on Algorithms, ESA 2014 (LNCS), Andreas S. Schulz and Dorothea Wagner (Eds.), Vol. 8737. Springer, 455--466. https://doi.org/10.1007/978-3-662-44777-2_38Google Scholar
- Qing Guo, Hui Zhang, and Costas S. Iliopoulos. 2006. Computing the λ-seeds of a string. In 2nd International Conference on Algorithmic Aspects in Information and Management, AAIM 2006 (LNCS), Siu-Wing Cheng and Chung Keung Poon (Eds.), Vol. 4041. Springer, 303--313. https://doi.org/10.1007/11775096_28Google Scholar
- Qing Guo, Hui Zhang, and Costas S. Iliopoulos. 2006. Computing the minimum approximate λ-cover of a string. In 13th International Conference on String Processing and Information Retrieval, SPIRE 2006 (LNCS), Fabio Crestani, Paolo Ferragina, and Mark Sanderson (Eds.), Vol. 4209. Springer, 49--60. https://doi.org/10.1007/11880561_5Google Scholar
- Qing Guo, Hui Zhang, and Costas S. Iliopoulos. 2007. Computing the λ-covers of a string. Information Sciences 177, 19 (2007), 3957--3967. DOI:https://doi.org/10.1016/j.ins.2007.02.020Google ScholarDigital Library
- Ondřej Guth. 2019. On approximate enhanced covers under Hamming distance. Discrete Applied Mathematics 274 (2019), 67--80. DOI:https://doi.org/10.1016/j.dam.2019.01.015Google ScholarCross Ref
- Ondřej Guth, Borivoj Melichar, and Miroslav Balík. 2008. Searching all approximate covers and their distance using finite automata. In Proceedings of the 20th Conference on Theory and Practice of Information Technologies (ITAT’08). http://ceur-ws.org/Vol-414/paper4.pdf.Google Scholar
- Lucian Ilie, Sheng Yu, and Kaizhong Zhang. 2002. Repetition complexity of words. In 8th Annual International Conference on Computing and Combinatorics, COCOON 2002 (LNCS), Oscar H. Ibarra and Louxin Zhang (Eds.), Vol. 2387. Springer, 320--329. https://doi.org/10.1007/3-540-45655-4_35Google ScholarCross Ref
- Costas S. Iliopoulos, Manal Mohamed, and William F. Smyth. 2011. New complexity results for the k-covers problem. Information Sciences 181, 12 (2011), 2571--2575. DOI:https://doi.org/10.1016/j.ins.2011.02.009Google ScholarDigital Library
- Costas S. Iliopoulos, Dennis W. G. Moore, and Kunsoo Park. 1996. Covering a string. Algorithmica 16, 3 (1996), 288--297. DOI:https://doi.org/10.1007/BF01955677Google ScholarCross Ref
- Costas S. Iliopoulos and Laurent Mouchard. 1999. Quasiperiodicity: From detection to normal forms. Journal of Automata, Languages and Combinatorics 4, 3 (1999), 213--228.Google Scholar
- Costas S. Iliopoulos and William F. Smyth. 1998. An on-line algorithm of computing a minimum set of k-covers of a string. In Proceedings of the 9th Australasian Workshop on Combinatorial Algorithms (AWOCA’98). 97--106.Google Scholar
- Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. 2006. Linear work suffix array construction. Journal of the ACM 53, 6 (2006), 918--936. DOI:https://doi.org/10.1145/1217856.1217858Google ScholarDigital Library
- Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. 2012. A linear time algorithm for seeds computation. In 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'12), Yuval Rabani (Ed.). SIAM, 1095--1112. https://doi.org/10.1137/1.9781611973099Google ScholarCross Ref
- Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. 2019. Towards a definitive measure of repetitiveness. arxiv:1910.02151.Google Scholar
- Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. 2018. Efficient algorithms for shortest partial seeds in words. Theoretical Computer Science 710 (2018), 139--147. DOI:https://doi.org/10.1016/j.tcs.2016.11.035Google ScholarCross Ref
- Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. 2015. Fast algorithm for partial covers in words. Algorithmica 73, 1 (2015), 217--233. DOI:https://doi.org/10.1007/s00453-014-9915-3Google ScholarDigital Library
- Roman M. Kolpakov and Gregory Kucherov. 1999. Finding maximal repetitions in a word in linear time. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS’99). IEEE, Los Alamitos, CA, 596--604. DOI:https://doi.org/10.1109/SFFCS.1999.814634Google Scholar
- Yin Li and William F. Smyth. 2002. Computing the cover array in linear time. Algorithmica 32, 1 (2002), 95--106. DOI:https://doi.org/10.1007/s00453-001-0062-2Google ScholarDigital Library
- M. Lothaire. 2002. Algebraic Combinatorics on Words. Encyclopedia of Mathematics and Its Applications (1st edition). Cambridge University Press. DOI:https://doi.org/10.1017/cbo9781107326019Google Scholar
- M. Lothaire. 2005. Applied Combinatorics on Words. Encyclopedia of Mathematics and Its Applications (1st edition). Cambridge University Press. DOI:https://doi.org/10.1017/cbo9781107341005Google Scholar
- Udi Manber and Eugene W. Myers. 1993. Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22, 5 (1993), 935--948. DOI:https://doi.org/10.1137/0222058Google ScholarDigital Library
- Dennis W. G. Moore and William F. Smyth. 1994. An optimal algorithm to compute all the covers of a string. Information Processing Letters 50, 5 (1994), 239--246. DOI:https://doi.org/10.1016/0020-0190(94)00045-XGoogle ScholarDigital Library
- Dennis W. G. Moore and William F. Smyth. 1995. A correction to “An Optimal Algorithm to Compute All the Covers of a String”. Information Processing Letters 54, 2 (1995), 101--103. DOI:https://doi.org/10.1016/0020-0190(94)00235-QGoogle ScholarDigital Library
- James H. Morris Jr. and Vaughan R. Pratt. 1970. A Linear Pattern-Matching Algorithm. Technical Report 40. Department of Computer Science, University of California, Berkeley.Google Scholar
- Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, and Adam D. Smith. 2013. Sublinear algorithms for approximating string compressibility. Algorithmica 65, 3 (2013), 685--709. DOI:https://doi.org/10.1007/s00453-012-9618-6Google ScholarDigital Library
- Jeong Seop Sim, Kunsoo Park, Sung-Ryul Kim, and Jee-Soo Lee. 2002. Finding approximate covers of strings. Journal of KIISE: Computer Systems and Theory 29, 1 (2002), 16--21. http://www.koreascience.or.kr/article/ArticleFullRecord.jsp?cn=JBGHG6_2002_v29n1_16.Google Scholar
- William F. Smyth. 2000. Repetitive perhaps, but certainly not boring. Theoretical Computer Science 249, 2 (2000), 343--355. DOI:https://doi.org/10.1016/S0304-3975(00)00067-0Google ScholarDigital Library
- Peter Weiner. 1973. Linear pattern matching algorithms. In Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT’73). IEEE, Los Alamitos, CA, 1--11. DOI:https://doi.org/10.1109/SWAT.1973.13Google ScholarDigital Library
Index Terms
- A Linear-Time Algorithm for Seeds Computation
Recommendations
Linear-time String Indexing and Analysis in Small Space
The field of succinct data structures has flourished over the past 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications ...
On left and right seeds of a string
We consider the problem of finding the repetitive structure of a given string y of length n. A factor u of y is a cover of y, if every letter of y lies within some occurrence of u in y. A string v is a seed of y, if it is a cover of a superstring of y. ...
Linear-size suffix tries
Suffix trees are highly regarded data structures for text indexing and string algorithms MCreight 76, Weiner 73. For any given string w of length n = | w | , a suffix tree for w takes O ( n ) nodes and links. It is often presented as a compacted version ...
Comments