Skip to main content

2015 | OriginalPaper | Buchkapitel

Lossless Seeds for Searching Short Patterns with High Error Rates

verfasst von : Christophe Vroland, Mikaël Salson, Hélène Touzet

Erschienen in: Combinatorial Algorithms

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We address the problem of approximate pattern matching using the Levenshtein distance. Given a text T and a pattern P, find all locations in T that differ by at most k errors from P. For that purpose, we propose a filtration algorithm that is based on a novel type of seeds, combining exact parts and parts with a fixed number of errors. Experimental tests show that the method is specifically well-suited for short patterns with a large number of errors.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Baeza-Yates, R.A., Perleberg, C.H.: Fast and practical approximate string matching. Inf. Process. Lett. 59(1), 21–27 (1996)MATHMathSciNetCrossRef Baeza-Yates, R.A., Perleberg, C.H.: Fast and practical approximate string matching. Inf. Process. Lett. 59(1), 21–27 (1996)MATHMathSciNetCrossRef
2.
Zurück zum Zitat Belazzougui, D.: Improved space-time tradeoffs for approximate full-text indexing with one edit error. Algorithmica, pp. 1–27 (2014) Belazzougui, D.: Improved space-time tradeoffs for approximate full-text indexing with one edit error. Algorithmica, pp. 1–27 (2014)
3.
Zurück zum Zitat Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional burrows-wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013) CrossRef Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional burrows-wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013) CrossRef
4.
Zurück zum Zitat Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)MATHMathSciNetCrossRef Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)MATHMathSciNetCrossRef
5.
Zurück zum Zitat Chávez, E., Navarro, G.: A metric index for approximate string matching. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 181–195. Springer, Heidelberg (2002) CrossRef Chávez, E., Navarro, G.: A metric index for approximate string matching. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 181–195. Springer, Heidelberg (2002) CrossRef
6.
Zurück zum Zitat Döring, A., Weese, D., Rausch, T., Reinert, K.: SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics 9(1), 11–19 (2008)CrossRef Döring, A., Weese, D., Rausch, T., Reinert, K.: SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics 9(1), 11–19 (2008)CrossRef
7.
Zurück zum Zitat Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: from theory to practice. J. Exp. Algorithmics (JEA) 13, 12 (2009) Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: from theory to practice. J. Exp. Algorithmics (JEA) 13, 12 (2009)
9.
Zurück zum Zitat Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Alg. (TALG) 3(2) (2007) Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Alg. (TALG) 3(2) (2007)
10.
Zurück zum Zitat Hyyrö, H.: A bit-vector algorithm for computing levenshtein and damerau edit distances. Nord. J. Comput. 10(1), 29–39 (2003)MATH Hyyrö, H.: A bit-vector algorithm for computing levenshtein and damerau edit distances. Nord. J. Comput. 10(1), 29–39 (2003)MATH
11.
Zurück zum Zitat Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Meth. 9(4), 357–359 (2012)CrossRef Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Meth. 9(4), 357–359 (2012)CrossRef
12.
Zurück zum Zitat Li, H., Durbin, R.: Fast and accurate short read alignment with burrows-wheeler transform. bioinformatics 25(14), 1754–1760 (2009). (Oxford, England)CrossRef Li, H., Durbin, R.: Fast and accurate short read alignment with burrows-wheeler transform. bioinformatics 25(14), 1754–1760 (2009). (Oxford, England)CrossRef
14.
Zurück zum Zitat Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)MATHMathSciNetCrossRef Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)MATHMathSciNetCrossRef
15.
Zurück zum Zitat Navarro, G.: A guided tour to approximate string matching. ACM comput. surv. (CSUR) 33(1), 31–88 (2001)CrossRef Navarro, G.: A guided tour to approximate string matching. ACM comput. surv. (CSUR) 33(1), 31–88 (2001)CrossRef
16.
Zurück zum Zitat Navarro, G., Baeza-Yates, R.: A hybrid indexing method for approximate string matching. J. Discrete Algorithms 1, 19–27 (2001) Navarro, G., Baeza-Yates, R.: A hybrid indexing method for approximate string matching. J. Discrete Algorithms 1, 19–27 (2001)
17.
Zurück zum Zitat Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 350–363. Springer, Heidelberg (2000) CrossRef Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 350–363. Springer, Heidelberg (2000) CrossRef
18.
Zurück zum Zitat Petri, M., Culpepper, J.S.: Efficient indexing algorithms for approximate pattern matching in text. In: Proceedings of the Seventeenth Australasian Document Computing Symposium, ADCS 2012, pp. 9–16. ACM, New York (2012) Petri, M., Culpepper, J.S.: Efficient indexing algorithms for approximate pattern matching in text. In: Proceedings of the Seventeenth Australasian Document Computing Symposium, ADCS 2012, pp. 9–16. ACM, New York (2012)
19.
Zurück zum Zitat Russo, L., Navarro, G., Oliveira, A.L., Morales, P.: Approximate string matching with compressed indexes. Algorithms 2(3), 1105–1136 (2009)MathSciNetCrossRef Russo, L., Navarro, G., Oliveira, A.L., Morales, P.: Approximate string matching with compressed indexes. Algorithms 2(3), 1105–1136 (2009)MathSciNetCrossRef
20.
Zurück zum Zitat Schbath, S., Martin, V., Zytnicki, M., Fayolle, J., Loux, V., Gibrat, J.F.: Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. J. Comput. Biol. 19(6), 796–813 (2012)MathSciNetCrossRef Schbath, S., Martin, V., Zytnicki, M., Fayolle, J., Loux, V., Gibrat, J.F.: Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. J. Comput. Biol. 19(6), 796–813 (2012)MathSciNetCrossRef
21.
Zurück zum Zitat Schnattinger, T., Ohlebusch, E., Gog, S.: Bidirectional search in a string with wavelet trees. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 40–50. Springer, Heidelberg (2010) CrossRef Schnattinger, T., Ohlebusch, E., Gog, S.: Bidirectional search in a string with wavelet trees. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 40–50. Springer, Heidelberg (2010) CrossRef
22.
Zurück zum Zitat Shah, S.A., Hansen, N.R., Garrett, R.A.: Distribution of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism. Biochem. Soc. Trans. 37(1), 23 (2009)CrossRef Shah, S.A., Hansen, N.R., Garrett, R.A.: Distribution of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism. Biochem. Soc. Trans. 37(1), 23 (2009)CrossRef
23.
Zurück zum Zitat Slater, G.S.C., Birney, E.: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 1–11 (2005)CrossRef Slater, G.S.C., Birney, E.: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 1–11 (2005)CrossRef
24.
Zurück zum Zitat Stern, A., Keren, L., Wurtzel, O., Amitai, G., Sorek, R.: Self-targeting by CRISPR: gene regulation or autoimmunity? Trends Genet. 26(8), 335–340 (2010)CrossRef Stern, A., Keren, L., Wurtzel, O., Amitai, G., Sorek, R.: Self-targeting by CRISPR: gene regulation or autoimmunity? Trends Genet. 26(8), 335–340 (2010)CrossRef
25.
Zurück zum Zitat Storz, G., Altuvia, S., Wassarman, K.M.: An abundance of RNA regulators. Annu. Rev. Biochem. 74, 199–217 (2005)CrossRef Storz, G., Altuvia, S., Wassarman, K.M.: An abundance of RNA regulators. Annu. Rev. Biochem. 74, 199–217 (2005)CrossRef
26.
Zurück zum Zitat Weese, D., Holtgrewe, M., Reinert, K.: RazerS 3: faster, fully sensitive read mapping. Bioinformatics 28(20), 2592–2599 (2012)CrossRef Weese, D., Holtgrewe, M., Reinert, K.: RazerS 3: faster, fully sensitive read mapping. Bioinformatics 28(20), 2592–2599 (2012)CrossRef
27.
Zurück zum Zitat Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)CrossRef Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)CrossRef
Metadaten
Titel
Lossless Seeds for Searching Short Patterns with High Error Rates
verfasst von
Christophe Vroland
Mikaël Salson
Hélène Touzet
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-19315-1_32

Premium Partner