Skip to main content

2019 | OriginalPaper | Buchkapitel

Flexible and Efficient Algorithms for Abelian Matching in Genome Sequence

verfasst von : Simone Faro, Arianna Pavone

Erschienen in: Bioinformatics and Biomedical Engineering

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Approximate matching in strings is a fundamental and challenging problem in computer science and in computational biology, and increasingly fast algorithms are highly demanded in many applications including text processing and dna sequence analysis. Recently efficient solutions to specific approximate matching problems on genomic sequences have been designed using a filtering technique, based on the general abelian matching problem, which firstly locates the set of all candidate matching positions and then perform an additional verification test on the collected positions.
The abelian pattern matching problem consists in finding all substrings of a text which are permutations of a given pattern. In this paper we present a new class of algorithms based on a new efficient fingerprint computation approach, called Heap-Counting, which turns out to be fast, flexible and easy to be implemented. We prove that, when applied for searching short patterns on a dna sequence, our solutions have a linear worst case time complexity. In addition we present an experimental evaluation which shows that our newly presented algorithms are among the most efficient and flexible solutions in practice for the abelian matching problem in dna sequences.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The Smart tool is available online at https://​smart-tool.​github.​io/​smart/​.
 
Literatur
1.
Zurück zum Zitat Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(56), 409–421 (2003)MathSciNetCrossRef Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(56), 409–421 (2003)MathSciNetCrossRef
2.
Zurück zum Zitat Baeza-Yates, R.A., Navarro, G.: New and faster filters for multiple approximate string matching. Random Struct. Algorithms 20(1), 23–49 (2002)MathSciNetCrossRef Baeza-Yates, R.A., Navarro, G.: New and faster filters for multiple approximate string matching. Random Struct. Algorithms 20(1), 23–49 (2002)MathSciNetCrossRef
5.
Zurück zum Zitat Böcker, S.: Sequencing from compomers: using mass spectrometry for DNA de novo sequencing of 200+ nt. J. Comput. Biol. 11(6), 1110–1134 (2004)CrossRef Böcker, S.: Sequencing from compomers: using mass spectrometry for DNA de novo sequencing of 200+ nt. J. Comput. Biol. 11(6), 1110–1134 (2004)CrossRef
6.
Zurück zum Zitat Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: Algorithms for jumbled pattern matching in strings. Int. J. Found. Comput. Sci. 23(2), 357–374 (2012)MathSciNetCrossRef Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: Algorithms for jumbled pattern matching in strings. Int. J. Found. Comput. Sci. 23(2), 357–374 (2012)MathSciNetCrossRef
8.
Zurück zum Zitat Cantone, D., Faro, S.: Efficient online Abelian pattern matching in strings by simulating reactive multi-automata. In: Holub, J., Zdarek, J. (eds.) Proceedings of the PSC 2014, pp. 30–42 (2014) Cantone, D., Faro, S.: Efficient online Abelian pattern matching in strings by simulating reactive multi-automata. In: Holub, J., Zdarek, J. (eds.) Proceedings of the PSC 2014, pp. 30–42 (2014)
9.
Zurück zum Zitat Chhabra, T., Ghuman, S.S., Tarhio, J.: Tuning algorithms for jumbled matching. In: Holub, J., Zdarek, J. (eds.) Proceedings of the PSC 2015, pp. 57–66 (2015) Chhabra, T., Ghuman, S.S., Tarhio, J.: Tuning algorithms for jumbled matching. In: Holub, J., Zdarek, J. (eds.) Proceedings of the PSC 2015, pp. 57–66 (2015)
11.
Zurück zum Zitat Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. J. Comput. Biol. 11(6), 1050–1060 (2004)CrossRef Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. J. Comput. Biol. 11(6), 1050–1060 (2004)CrossRef
12.
Zurück zum Zitat Faro, S., Lecroq, T., Borzì, S., Di Mauro, S., Maggio, A.: The string matching algorithms research tool. In: Proceeding of Stringology, pp. 99–111 (2016) Faro, S., Lecroq, T., Borzì, S., Di Mauro, S., Maggio, A.: The string matching algorithms research tool. In: Proceeding of Stringology, pp. 99–111 (2016)
13.
Zurück zum Zitat Ghuman, S.S., Tarhio, J.: Jumbled matching with SIMD. In: Holub, J., Zdarek, J. (eds.) Proceeding of the PSC 2016, pp. 114–124 (2016) Ghuman, S.S., Tarhio, J.: Jumbled matching with SIMD. In: Holub, J., Zdarek, J. (eds.) Proceeding of the PSC 2016, pp. 114–124 (2016)
14.
Zurück zum Zitat Ghuman, S.S.: Improved online algorithms for jumbled matching. Doctoral Dissertation 242/2017, Aalto University publication series, Aalto University, School of Science, Department of Computer Science (2017) Ghuman, S.S.: Improved online algorithms for jumbled matching. Doctoral Dissertation 242/2017, Aalto University publication series, Aalto University, School of Science, Department of Computer Science (2017)
15.
Zurück zum Zitat Grabowski, S., Faro, S., Giaquinta, E.: String matching with inversions and translocations in linear average time (most of the time). Inf. Process. Lett. 111(11), 516–520 (2011)MathSciNetCrossRef Grabowski, S., Faro, S., Giaquinta, E.: String matching with inversions and translocations in linear average time (most of the time). Inf. Process. Lett. 111(11), 516–520 (2011)MathSciNetCrossRef
16.
Zurück zum Zitat Grossi, R., Luccio, F.: Simple and efficient string matching with \(k\) mismatches. Inf. Process. Lett. 33(3), 113–120 (1989)MathSciNetCrossRef Grossi, R., Luccio, F.: Simple and efficient string matching with \(k\) mismatches. Inf. Process. Lett. 33(3), 113–120 (1989)MathSciNetCrossRef
17.
Zurück zum Zitat Horspool, R.N.: Practical fast searching in strings. Softw. Pract. Exp. 10(6), 501–506 (1980)CrossRef Horspool, R.N.: Practical fast searching in strings. Softw. Pract. Exp. 10(6), 501–506 (1980)CrossRef
18.
Zurück zum Zitat Jokinen, P., Tarhio, J., Ukkonen, E.: A comparison of approximate string matching algorithms. Softw. Pract. Exp. 26(12), 1439–1458 (1996)CrossRef Jokinen, P., Tarhio, J., Ukkonen, E.: A comparison of approximate string matching algorithms. Softw. Pract. Exp. 26(12), 1439–1458 (1996)CrossRef
19.
Zurück zum Zitat Navarro, G.: Multiple approximate string matching by counting. In: Baeza-Yates, R. (ed.) 1997 Proceeding of the 4th South American Workshop on String Processing, pp. 125–139 (1997) Navarro, G.: Multiple approximate string matching by counting. In: Baeza-Yates, R. (ed.) 1997 Proceeding of the 4th South American Workshop on String Processing, pp. 125–139 (1997)
Metadaten
Titel
Flexible and Efficient Algorithms for Abelian Matching in Genome Sequence
verfasst von
Simone Faro
Arianna Pavone
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-17938-0_28