Skip to main content
Top

2015 | OriginalPaper | Chapter

Lossless Seeds for Searching Short Patterns with High Error Rates

Authors : Christophe Vroland, Mikaël Salson, Hélène Touzet

Published in: Combinatorial Algorithms

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

We address the problem of approximate pattern matching using the Levenshtein distance. Given a text T and a pattern P, find all locations in T that differ by at most k errors from P. For that purpose, we propose a filtration algorithm that is based on a novel type of seeds, combining exact parts and parts with a fixed number of errors. Experimental tests show that the method is specifically well-suited for short patterns with a large number of errors.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
2.
go back to reference Belazzougui, D.: Improved space-time tradeoffs for approximate full-text indexing with one edit error. Algorithmica, pp. 1–27 (2014) Belazzougui, D.: Improved space-time tradeoffs for approximate full-text indexing with one edit error. Algorithmica, pp. 1–27 (2014)
3.
go back to reference Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional burrows-wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013) CrossRef Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional burrows-wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013) CrossRef
4.
go back to reference Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)MATHMathSciNetCrossRef Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)MATHMathSciNetCrossRef
5.
go back to reference Chávez, E., Navarro, G.: A metric index for approximate string matching. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 181–195. Springer, Heidelberg (2002) CrossRef Chávez, E., Navarro, G.: A metric index for approximate string matching. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 181–195. Springer, Heidelberg (2002) CrossRef
6.
go back to reference Döring, A., Weese, D., Rausch, T., Reinert, K.: SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics 9(1), 11–19 (2008)CrossRef Döring, A., Weese, D., Rausch, T., Reinert, K.: SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics 9(1), 11–19 (2008)CrossRef
7.
go back to reference Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: from theory to practice. J. Exp. Algorithmics (JEA) 13, 12 (2009) Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: from theory to practice. J. Exp. Algorithmics (JEA) 13, 12 (2009)
9.
go back to reference Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Alg. (TALG) 3(2) (2007) Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Alg. (TALG) 3(2) (2007)
10.
go back to reference Hyyrö, H.: A bit-vector algorithm for computing levenshtein and damerau edit distances. Nord. J. Comput. 10(1), 29–39 (2003)MATH Hyyrö, H.: A bit-vector algorithm for computing levenshtein and damerau edit distances. Nord. J. Comput. 10(1), 29–39 (2003)MATH
11.
go back to reference Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Meth. 9(4), 357–359 (2012)CrossRef Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Meth. 9(4), 357–359 (2012)CrossRef
12.
go back to reference Li, H., Durbin, R.: Fast and accurate short read alignment with burrows-wheeler transform. bioinformatics 25(14), 1754–1760 (2009). (Oxford, England)CrossRef Li, H., Durbin, R.: Fast and accurate short read alignment with burrows-wheeler transform. bioinformatics 25(14), 1754–1760 (2009). (Oxford, England)CrossRef
14.
go back to reference Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)MATHMathSciNetCrossRef Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)MATHMathSciNetCrossRef
15.
go back to reference Navarro, G.: A guided tour to approximate string matching. ACM comput. surv. (CSUR) 33(1), 31–88 (2001)CrossRef Navarro, G.: A guided tour to approximate string matching. ACM comput. surv. (CSUR) 33(1), 31–88 (2001)CrossRef
16.
go back to reference Navarro, G., Baeza-Yates, R.: A hybrid indexing method for approximate string matching. J. Discrete Algorithms 1, 19–27 (2001) Navarro, G., Baeza-Yates, R.: A hybrid indexing method for approximate string matching. J. Discrete Algorithms 1, 19–27 (2001)
17.
go back to reference Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 350–363. Springer, Heidelberg (2000) CrossRef Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 350–363. Springer, Heidelberg (2000) CrossRef
18.
go back to reference Petri, M., Culpepper, J.S.: Efficient indexing algorithms for approximate pattern matching in text. In: Proceedings of the Seventeenth Australasian Document Computing Symposium, ADCS 2012, pp. 9–16. ACM, New York (2012) Petri, M., Culpepper, J.S.: Efficient indexing algorithms for approximate pattern matching in text. In: Proceedings of the Seventeenth Australasian Document Computing Symposium, ADCS 2012, pp. 9–16. ACM, New York (2012)
19.
go back to reference Russo, L., Navarro, G., Oliveira, A.L., Morales, P.: Approximate string matching with compressed indexes. Algorithms 2(3), 1105–1136 (2009)MathSciNetCrossRef Russo, L., Navarro, G., Oliveira, A.L., Morales, P.: Approximate string matching with compressed indexes. Algorithms 2(3), 1105–1136 (2009)MathSciNetCrossRef
20.
go back to reference Schbath, S., Martin, V., Zytnicki, M., Fayolle, J., Loux, V., Gibrat, J.F.: Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. J. Comput. Biol. 19(6), 796–813 (2012)MathSciNetCrossRef Schbath, S., Martin, V., Zytnicki, M., Fayolle, J., Loux, V., Gibrat, J.F.: Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. J. Comput. Biol. 19(6), 796–813 (2012)MathSciNetCrossRef
21.
go back to reference Schnattinger, T., Ohlebusch, E., Gog, S.: Bidirectional search in a string with wavelet trees. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 40–50. Springer, Heidelberg (2010) CrossRef Schnattinger, T., Ohlebusch, E., Gog, S.: Bidirectional search in a string with wavelet trees. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 40–50. Springer, Heidelberg (2010) CrossRef
22.
go back to reference Shah, S.A., Hansen, N.R., Garrett, R.A.: Distribution of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism. Biochem. Soc. Trans. 37(1), 23 (2009)CrossRef Shah, S.A., Hansen, N.R., Garrett, R.A.: Distribution of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism. Biochem. Soc. Trans. 37(1), 23 (2009)CrossRef
23.
go back to reference Slater, G.S.C., Birney, E.: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 1–11 (2005)CrossRef Slater, G.S.C., Birney, E.: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 1–11 (2005)CrossRef
24.
go back to reference Stern, A., Keren, L., Wurtzel, O., Amitai, G., Sorek, R.: Self-targeting by CRISPR: gene regulation or autoimmunity? Trends Genet. 26(8), 335–340 (2010)CrossRef Stern, A., Keren, L., Wurtzel, O., Amitai, G., Sorek, R.: Self-targeting by CRISPR: gene regulation or autoimmunity? Trends Genet. 26(8), 335–340 (2010)CrossRef
25.
go back to reference Storz, G., Altuvia, S., Wassarman, K.M.: An abundance of RNA regulators. Annu. Rev. Biochem. 74, 199–217 (2005)CrossRef Storz, G., Altuvia, S., Wassarman, K.M.: An abundance of RNA regulators. Annu. Rev. Biochem. 74, 199–217 (2005)CrossRef
26.
go back to reference Weese, D., Holtgrewe, M., Reinert, K.: RazerS 3: faster, fully sensitive read mapping. Bioinformatics 28(20), 2592–2599 (2012)CrossRef Weese, D., Holtgrewe, M., Reinert, K.: RazerS 3: faster, fully sensitive read mapping. Bioinformatics 28(20), 2592–2599 (2012)CrossRef
27.
go back to reference Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)CrossRef Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)CrossRef
Metadata
Title
Lossless Seeds for Searching Short Patterns with High Error Rates
Authors
Christophe Vroland
Mikaël Salson
Hélène Touzet
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-19315-1_32

Premium Partner