Skip to main content

2015 | OriginalPaper | Buchkapitel

Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance

verfasst von : Carl Barton, Costas S. Iliopoulos, Solon P. Pissis, William F. Smyth

Erschienen in: Combinatorial Algorithms

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with \(k\) mismatches. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), pp. 794–803. Society for Industrial and Applied Mathematics, USA (2000) Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with \(k\) mismatches. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), pp. 794–803. Society for Industrial and Applied Mathematics, USA (2000)
3.
Zurück zum Zitat Bland, W., Kucherov, G., Smyth, W.F.: Prefix table construction and conversion. In: Lecroq, T., Mouchard, L. (eds.) IWOCA 2013. LNCS, vol. 8288, pp. 41–53. Springer, Heidelberg (2013) CrossRef Bland, W., Kucherov, G., Smyth, W.F.: Prefix table construction and conversion. In: Lecroq, T., Mouchard, L. (eds.) IWOCA 2013. LNCS, vol. 8288, pp. 41–53. Springer, Heidelberg (2013) CrossRef
4.
Zurück zum Zitat Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, New York (2007) MATHCrossRef Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, New York (2007) MATHCrossRef
5.
Zurück zum Zitat Dori, S., Landau, G.M.: Construction of Aho Corasick automaton in linear time for integer alphabets. Inf. Process. Lett. 98(2), 66–72 (2006)MATHMathSciNetCrossRef Dori, S., Landau, G.M.: Construction of Aho Corasick automaton in linear time for integer alphabets. Inf. Process. Lett. 98(2), 66–72 (2006)MATHMathSciNetCrossRef
6.
Zurück zum Zitat Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011) CrossRef Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011) CrossRef
7.
Zurück zum Zitat Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011)MATHMathSciNetCrossRef Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011)MATHMathSciNetCrossRef
9.
Zurück zum Zitat Galil, Z., Giancarlo, R.: Improved string matching with \(k\) mismatches. ACM SIGACT News 17(4), 52–54 (1986)CrossRef Galil, Z., Giancarlo, R.: Improved string matching with \(k\) mismatches. ACM SIGACT News 17(4), 52–54 (1986)CrossRef
10.
Zurück zum Zitat Hall, H.S., Knight, S.R.: Higher Algebra. MacMillan, London (1950) Hall, H.S., Knight, S.R.: Higher Algebra. MacMillan, London (1950)
12.
Zurück zum Zitat Ilie, L., Navarro, G., Tinta, L.: The longest common extension problem revisited and applications to approximate string searching. J. Discrete Algorithms 8(4), 418–428 (2010)MATHMathSciNetCrossRef Ilie, L., Navarro, G., Tinta, L.: The longest common extension problem revisited and applications to approximate string searching. J. Discrete Algorithms 8(4), 418–428 (2010)MATHMathSciNetCrossRef
13.
Zurück zum Zitat Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27–2, 557–582 (1998)MathSciNetCrossRef Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27–2, 557–582 (1998)MathSciNetCrossRef
14.
Zurück zum Zitat Landau, G.M., Vishkin, U.: Efficient string matching in the presence of errors. In: IEEE (ed.) Proceedings of the 26th Annual Symposium on Foundations of Computer Science (FOCS 1985), USA, pp. 126–136. IEEE Computer Society (1985) Landau, G.M., Vishkin, U.: Efficient string matching in the presence of errors. In: IEEE (ed.) Proceedings of the 26th Annual Symposium on Foundations of Computer Science (FOCS 1985), USA, pp. 126–136. IEEE Computer Society (1985)
15.
Zurück zum Zitat Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Technical report 8 (1966) Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Technical report 8 (1966)
16.
Zurück zum Zitat Main, M.G., Lorentz, R.J.: An \(\cal O\)(n log n) algorithm for finding all repetitions in a string. J. Algs 5, 422–432 (1984)MATHMathSciNetCrossRef Main, M.G., Lorentz, R.J.: An \(\cal O\)(n log n) algorithm for finding all repetitions in a string. J. Algs 5, 422–432 (1984)MATHMathSciNetCrossRef
17.
Zurück zum Zitat Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Proceedings of the 2009 Data Compression Conference, DCC 2009, pp. 193–202, IEEE Computer Society, Washington, DC (2009) Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Proceedings of the 2009 Data Compression Conference, DCC 2009, pp. 193–202, IEEE Computer Society, Washington, DC (2009)
19.
Zurück zum Zitat Smyth, B.: Computing Patterns in Strings. Pearson Addison-Wesley, London (2003) Smyth, B.: Computing Patterns in Strings. Pearson Addison-Wesley, London (2003)
20.
Zurück zum Zitat Smyth, W.F., Wang, S.: New perspectives on the prefix array. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 133–143. Springer, Heidelberg (2008) CrossRef Smyth, W.F., Wang, S.: New perspectives on the prefix array. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 133–143. Springer, Heidelberg (2008) CrossRef
23.
Zurück zum Zitat Välimäki, N., Ladra, S., Mäkinen, V.: Approximate all-pairs suffix/prefix overlaps. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 76–87. Springer, Heidelberg (2010) CrossRef Välimäki, N., Ladra, S., Mäkinen, V.: Approximate all-pairs suffix/prefix overlaps. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 76–87. Springer, Heidelberg (2010) CrossRef
24.
Zurück zum Zitat Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)CrossRef Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)CrossRef
25.
Zurück zum Zitat Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina paired-end reAd mergeR. Bioinformatics 30(5), 614–620 (2013)CrossRef Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina paired-end reAd mergeR. Bioinformatics 30(5), 614–620 (2013)CrossRef
Metadaten
Titel
Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance
verfasst von
Carl Barton
Costas S. Iliopoulos
Solon P. Pissis
William F. Smyth
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-19315-1_5