Skip to main content

Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10472))

Abstract

An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al. in Inf Process Lett 67:111–117, 1998; Belazzougui et al. in ESA 8125:133–144, 2013; Barton et al. in BMC Bioinform 15:388, 2014). Minimal absent words are used for data compression (Crochemore et al. in Proc IEEE 88:1756–1768, 2000, Ota and Morita in Theoret Comput Sci 526:108–119, 2014) and for alignment-free sequence comparison by utilizing a metric based on minimal absent words (Chairungsee and Crochemore in Theoret Comput Sci 450:109–116, 2012). They are also used in molecular biology; for instance, three minimal absent words of the human genome were found to play a functional role in a coding region in Ebola virus genomes (Silva et al. in Bioinformatics 31:2421–2425, 2015). In this article we introduce a new application of minimal absent words for on-line pattern matching. Specifically, we present an algorithm that, given a pattern x and a text y, computes the distance between x and every window of size |x| on y. The running time is \(\mathcal {O}(\sigma |y|)\), where \(\sigma \) is the size of the alphabet. Along the way, we show an \(\mathcal {O}(\sigma |y|)\)-time and \(\mathcal {O}(\sigma |x|)\)-space algorithm to compute the minimal absent words of every window of size |x| on y, together with some new combinatorial insight on minimal absent words.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18, 333–340 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  2. Almirantis, Y., Charalampopoulos, P., Gao, J., Iliopoulos, C.S., Mohamed, M., Pissis, S.P., Polychronopoulos, D.: On avoided words, absent words, and their application to biological sequence analysis. Algorithms Mol. Biol. 12(1), 5:1–5:12 (2017)

    Google Scholar 

  3. Barton, C., Heliou, A., Mouchard, L., Pissis, S.P.: Linear-time computation of minimal absent words using suffix array. BMC Bioinform. 15, 11 (2014)

    Article  Google Scholar 

  4. Barton, C., Heliou, A., Mouchard, L., Pissis, S.P.: Parallelising the computation of minimal absent words. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds.) PPAM 2015. LNCS, vol. 9574, pp. 243–253. Springer, Cham (2016). doi:10.1007/978-3-319-32152-3_23

    Chapter  Google Scholar 

  5. Béal, M.-P., Mignosi, F., Restivo, A.: Minimal forbidden words and symbolic dynamics. In: Puech, C., Reischuk, R. (eds.) STACS 1996. LNCS, vol. 1046, pp. 555–566. Springer, Heidelberg (1996). doi:10.1007/3-540-60922-9_45

    Chapter  Google Scholar 

  6. Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional Burrows-wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40450-4_12

    Chapter  Google Scholar 

  7. Chairungsee, S., Crochemore, M.: Using minimal absent words to build phylogeny. Theoret. Comput. Sci. 450, 109–116 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  8. Crochemore, M., Fici, G., Mercas, R., Pissis, S.P.: Linear-time sequence comparison using minimal absent words. In: Kranakis, E., Navarro, G., Chávez, E. (eds.) LATIN 2016. LNCS, vol. 9644, pp. 334–346. Springer, Heidelberg (2016). doi:10.1007/978-3-662-49529-2_25

    Google Scholar 

  9. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007)

    Google Scholar 

  10. Crochemore, M., Mignosi, F., Restivo, A.: Automata and forbidden words. Inf. Process. Lett. 67(3), 111–117 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  11. Crochemore, M., Mignosi, F., Restivo, A., Salemi, S.: Data compression using antidictonaries. Proc. IEEE 88(11), 1756–1768 (2000)

    Article  Google Scholar 

  12. Dömölki, B.: An algorithm for syntactical analysis. Comput. Linguist. 3, 29–46 (1964)

    Google Scholar 

  13. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398. IEEE Computer Society (2000)

    Google Scholar 

  14. Fici, G.: Minimal Forbidden Words and Applications. Thèse, Université de Marne la Vallée (2006)

    Google Scholar 

  15. Fujishige, Y., Tsujimaru, Y., Inenaga, S., Bannai, H., Takeda, M.: Computing DAWGs and minimal absent words in linear time for integer alphabets. In: MFCS. LIPIcs, vol. 58, pp. 38:1–38:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016)

    Google Scholar 

  16. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)

    Google Scholar 

  17. Hampikian, G., Andersen, T.L.: Absent sequences: nullomers and primes. In: PSB, pp. 355–366. World Scientific (2007)

    Google Scholar 

  18. Heliou, A., Pissis, S.P., Puglisi, S.J.: emMAW: computing minimal absent words in external memory. Bioinformatics (2017)

    Google Scholar 

  19. Herold, J., Kurtz, S., Giegerich, R.: Efficient computation of absent words in genomic sequences. BMC Bioinform. 9, 167 (2008)

    Article  Google Scholar 

  20. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  21. Kucherov, G., Salikhov, K., Tsur, D.: Approximate string matching using a bidirectional index. Theoret. Comput. Sci. 638, 145–158 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  22. Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27–2, 557–582 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  23. Mignosi, F., Restivo, A., Sciortino, M.: Words and forbidden factors. Theoret. Comput. Sci. 273(1–2), 99–117 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  24. Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  25. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

    Article  Google Scholar 

  26. Navarro, G., Baeza-Yates, R.A., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001)

    Google Scholar 

  27. Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings: Practical On-line Search Algorithms for Texts and Biological Sequences. Cambridge University Press, Cambridge (2008)

    Google Scholar 

  28. Ota, T., Fukae, H., Morita, H.: Dynamic construction of an antidictionary with linear complexity. Theor. Comput. Sci. 526, 108–119 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  29. Ota, T., Morita, H.: On a universal antidictionary coding for stationary ergodic sources with finite alphabet. In: ISITA, pp. 294–298. IEEE (2014)

    Google Scholar 

  30. Rahman, M.S., Alatabbi, A., Athar, T., Crochemore, M., Rahman, M.S.: Absent words and the (dis)similarity analysis of DNA sequences: an experimental study. BMC Bioinform. Notes 9(1), 1–8 (2016)

    Google Scholar 

  31. Senft, M.: Suffix tree for a sliding window: an overview. In: WDS, pp. 41–46. Matfyzpress (2005)

    Google Scholar 

  32. Silva, R.M., Pratas, D., Castro, L., Pinho, A.J., Ferreira, P.J.S.G.: Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics 31(15), 2421–2425 (2015)

    Article  Google Scholar 

  33. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  34. Wu, Z., Jiang, T., Su, W.: Efficient computation of shortest absent words in a genomic sequence. Inf. Process. Lett. 110(14–15), 596–601 (2010)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alice Héliou .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer-Verlag GmbH Germany

About this paper

Cite this paper

Crochemore, M., Héliou, A., Kucherov, G., Mouchard, L., Pissis, S.P., Ramusat, Y. (2017). Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching. In: Klasing, R., Zeitoun, M. (eds) Fundamentals of Computation Theory. FCT 2017. Lecture Notes in Computer Science(), vol 10472. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55751-8_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-55751-8_14

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-55750-1

  • Online ISBN: 978-3-662-55751-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics