Skip to main content
Top

2020 | OriginalPaper | Chapter

Minimal Unique Substrings and Minimal Absent Words in a Sliding Window

Authors : Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Published in: SOFSEM 2020: Theory and Practice of Computer Science

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an \(O(n\log \sigma )\)-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where \(\sigma \) is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in Crochemore et al. (2017).

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
At least one of w[2..] and \(w[..|w|\, - \,1]\) is absent from T[i..j], because \(w \not \in \mathsf {MAW}\) (T[i..j]).
 
Literature
1.
go back to reference Chairungsee, S., Crochemore, M.: Using minimal absent words to build phylogeny. Theor. Comput. Sci. 450, 109–116 (2012)MathSciNetCrossRef Chairungsee, S., Crochemore, M.: Using minimal absent words to build phylogeny. Theor. Comput. Sci. 450, 109–116 (2012)MathSciNetCrossRef
2.
go back to reference Cleary, J.G., Witten, I.H.: Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 32(4), 396–402 (1984)CrossRef Cleary, J.G., Witten, I.H.: Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 32(4), 396–402 (1984)CrossRef
3.
go back to reference Crochemore, M., Mignosi, F., Restivo, A., Salemi, S.: Data compression using antidictionaries. Proc. IEEE 88(11), 1756–1768 (2000)CrossRef Crochemore, M., Mignosi, F., Restivo, A., Salemi, S.: Data compression using antidictionaries. Proc. IEEE 88(11), 1756–1768 (2000)CrossRef
4.
5.
go back to reference Gräf, S.: Optimized design and assessment of whole genome tiling arrays. Bioinformatics 23(13), i195–i204 (2007)CrossRef Gräf, S.: Optimized design and assessment of whole genome tiling arrays. Bioinformatics 23(13), i195–i204 (2007)CrossRef
7.
go back to reference Larsson, N.J.: Extended application of suffix trees to data compression. In: DCC 1996, pp. 190–199 (1996) Larsson, N.J.: Extended application of suffix trees to data compression. In: DCC 1996, pp. 190–199 (1996)
8.
go back to reference Mieno, T., Inenaga, S., Bannai, H., Takeda, M.: Shortest unique substring queries on run-length encoded strings. In: MFCS 2016, pp. 69:1–69:11 (2016) Mieno, T., Inenaga, S., Bannai, H., Takeda, M.: Shortest unique substring queries on run-length encoded strings. In: MFCS 2016, pp. 69:1–69:11 (2016)
9.
go back to reference Mieno, T., et al.: Minimal unique substrings and minimal absent words in a sliding window. CoRR abs/1909.02804 (2019) Mieno, T., et al.: Minimal unique substrings and minimal absent words in a sliding window. CoRR abs/1909.02804 (2019)
10.
go back to reference Ota, T., Fukae, H., Morita, H.: Dynamic construction of an antidictionary with linear complexity. Theor. Comput. Sci. 526, 108–119 (2014)MathSciNetCrossRef Ota, T., Fukae, H., Morita, H.: Dynamic construction of an antidictionary with linear complexity. Theor. Comput. Sci. 526, 108–119 (2014)MathSciNetCrossRef
11.
go back to reference Pei, J., Wu, W.C., Yeh, M.: On shortest unique substring queries. In: ICDE 2013, pp. 937–948 (2013) Pei, J., Wu, W.C., Yeh, M.: On shortest unique substring queries. In: ICDE 2013, pp. 937–948 (2013)
12.
go back to reference Senft, M.: Suffix tree for a sliding window: an overview. In: WDS, vol. 5, pp. 41–46 (2005) Senft, M.: Suffix tree for a sliding window: an overview. In: WDS, vol. 5, pp. 41–46 (2005)
13.
go back to reference Silva, R.M., Pratas, D., Castro, L., Pinho, A.J., Ferreira, P.J.S.G.: Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics 31(15), 2421–2425 (2015)CrossRef Silva, R.M., Pratas, D., Castro, L., Pinho, A.J., Ferreira, P.J.S.G.: Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics 31(15), 2421–2425 (2015)CrossRef
16.
go back to reference Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory IT–23(3), 337–349 (1977)MathSciNetCrossRef Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory IT–23(3), 337–349 (1977)MathSciNetCrossRef
Metadata
Title
Minimal Unique Substrings and Minimal Absent Words in a Sliding Window
Authors
Takuya Mieno
Yuki Kuhara
Tooru Akagi
Yuta Fujishige
Yuto Nakashima
Shunsuke Inenaga
Hideo Bannai
Masayuki Takeda
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-38919-2_13

Premium Partner