Skip to main content
Top
Published in: The Journal of Supercomputing 5/2018

20-11-2017

New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance

Authors: ThienLuan Ho, Seung-Rohk Oh, HyunJin Kim

Published in: The Journal of Supercomputing | Issue 5/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This paper proposes new algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance. Fixed-length approximate string matching and approximate circular string matching are special cases of approximate string matching and have numerous direct applications in bioinformatics and text searching. Firstly, a counter-vector-mismatches (CVM) algorithm is proposed to solve fixed-length approximate string matching with k-mismatches. The development of CVM algorithm is based on the parallel summation of counters located in the same machine word. Secondly, a parallel counter-vector-mismatches (PCVM) algorithm is proposed to accelerate CVM algorithm in parallel. The PCVM algorithm is integrated into two-level parallelisms that exploit not only word-level parallelism but also data parallelism via parallel environments such as multi-core processors and graphics processing units (GPUs). In the particular case of adopting GPUs, a shared-mem parallel counter-vector-mismatches (PCVMsmem) scheme can be implemented from PCVM algorithm. The PCVMsmem scheme can exploit the memory model of GPUs to optimize performance of PCVM algorithm. Finally, this paper shows several methods to adopt CVM and PCVM algorithms in case the input pattern is in circular structure. In the experiments with real DNA packages, our proposed algorithms and scheme work greatly faster than previous bit-vector-mismatches and parallel bit-vector-mismatches algorithms.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv (CSUR) 33(1):31–88CrossRef Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv (CSUR) 33(1):31–88CrossRef
2.
go back to reference Kefu X, Cui W, Yue H, Guo L (2013) Bit-parallel multiple approximate string matching based on GPU. Proc Comput Sci 17:523–529CrossRef Kefu X, Cui W, Yue H, Guo L (2013) Bit-parallel multiple approximate string matching based on GPU. Proc Comput Sci 17:523–529CrossRef
3.
go back to reference Man D, Nakano K, Ito Y (2013) The approximate string matching on the hierarchical memory machine, with performance evaluation. In: Proceedings of the 7th IEEE international symposium embedded multicore socs (MCSoC). IEEE, pp 79–84 Man D, Nakano K, Ito Y (2013) The approximate string matching on the hierarchical memory machine, with performance evaluation. In: Proceedings of the 7th IEEE international symposium embedded multicore socs (MCSoC). IEEE, pp 79–84
4.
go back to reference Michailidis PD, Margaritis KG (2005) A programmable array processor architecture for flexible approximate string matching algorithms. In: 2005 International Conference on Parallel Processing Workshops (ICPPW’05). IEEE, pp 201–209 Michailidis PD, Margaritis KG (2005) A programmable array processor architecture for flexible approximate string matching algorithms. In: 2005 International Conference on Parallel Processing Workshops (ICPPW’05). IEEE, pp 201–209
5.
go back to reference Guo Longjiang, Du Shufang, Ren Meirui, Liu Yu, Li Jinbao, He Jing, Tian Ning, Li Keqin (2013) Parallel algorithm for approximate string matching with k-differences. In: Proceedings of the 8th IEEE International Conference Networking, Architecture and Storage (NAS). IEEE, pp 257–261 Guo Longjiang, Du Shufang, Ren Meirui, Liu Yu, Li Jinbao, He Jing, Tian Ning, Li Keqin (2013) Parallel algorithm for approximate string matching with k-differences. In: Proceedings of the 8th IEEE International Conference Networking, Architecture and Storage (NAS). IEEE, pp 257–261
6.
go back to reference Hyyrö H (2003) A bit-vector algorithm for computing Levenshtein and Damerau edit distances. Nord. J. Comput. 10(1):29–39MathSciNetMATH Hyyrö H (2003) A bit-vector algorithm for computing Levenshtein and Damerau edit distances. Nord. J. Comput. 10(1):29–39MathSciNetMATH
7.
go back to reference Ho TL, Seung-Rohk O, Kim HJ (2017) A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations. PLoS ONE 12(10):e0186251CrossRef Ho TL, Seung-Rohk O, Kim HJ (2017) A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations. PLoS ONE 12(10):e0186251CrossRef
8.
go back to reference Amir A, Lewenstein M, Porat E (2004) Faster algorithms for string matching with \(k\)-mismatches. Journal of Algorithms 50(2):257–275MathSciNetCrossRefMATH Amir A, Lewenstein M, Porat E (2004) Faster algorithms for string matching with \(k\)-mismatches. Journal of Algorithms 50(2):257–275MathSciNetCrossRefMATH
9.
go back to reference Barton C, Iliopoulos CS, Pissis SP (2014) Fast algorithms for approximate circular string matching. Algorithms Mol Biol 9(1):9CrossRef Barton C, Iliopoulos CS, Pissis SP (2014) Fast algorithms for approximate circular string matching. Algorithms Mol Biol 9(1):9CrossRef
10.
go back to reference Liu Y, Guo L, Li J, Ren M, Li K (2012) Parallel algorithms for approximate string matching with \(k\)-mismatches on CUDA. In: Proceedings of the 26th IEEE International Conference on Parallel and Distributed Processing Symposium Workshops & Ph.D. forum (IPDPSW). IEEE, pp 2414–2422 Liu Y, Guo L, Li J, Ren M, Li K (2012) Parallel algorithms for approximate string matching with \(k\)-mismatches on CUDA. In: Proceedings of the 26th IEEE International Conference on Parallel and Distributed Processing Symposium Workshops & Ph.D. forum (IPDPSW). IEEE, pp 2414–2422
11.
go back to reference Ho TL, Seung-Rohk O, Kim HJ (2016) Circular bit-vector-mismatches: a new approximate circular string matching with \(k\)-mismatches. IEICE Trans Fundam Electron Commun Comput Sci 99:1726–1729CrossRef Ho TL, Seung-Rohk O, Kim HJ (2016) Circular bit-vector-mismatches: a new approximate circular string matching with \(k\)-mismatches. IEICE Trans Fundam Electron Commun Comput Sci 99:1726–1729CrossRef
12.
go back to reference Iliopoulos CS, Mouchard L, Pinzon YJ (2001) The Max-Shift algorithm for approximate string matching. In: Brodal GS, Frigioni D, Marchetti-Spaccamela A (eds) Algorithm engineering. Springer, Berlin, Heidelberg, pp 13–25 Iliopoulos CS, Mouchard L, Pinzon YJ (2001) The Max-Shift algorithm for approximate string matching. In: Brodal GS, Frigioni D, Marchetti-Spaccamela A (eds) Algorithm engineering. Springer, Berlin, Heidelberg, pp 13–25
14.
go back to reference Chapman B et al (2010) A parallel algorithm for the fixed-length approximate string matching problem for high throughput sequencing technologies. Parallel Comput From Multicores GPU’s Petascale 19:150 Chapman B et al (2010) A parallel algorithm for the fixed-length approximate string matching problem for high throughput sequencing technologies. Parallel Comput From Multicores GPU’s Petascale 19:150
15.
go back to reference Crochemore M, Iliopoulos CS, Pissis SP (2010) A parallel algorithm for fixed-length approximate string-matching with \(k\)-mismatches. In: Elomaa T, Mannila H, Orponen P (eds) Algorithms and applications. Springer, Berlin, Heidelberg, pp 92–101 Crochemore M, Iliopoulos CS, Pissis SP (2010) A parallel algorithm for fixed-length approximate string-matching with \(k\)-mismatches. In: Elomaa T, Mannila H, Orponen P (eds) Algorithms and applications. Springer, Berlin, Heidelberg, pp 92–101
16.
go back to reference Pissis S, Retha A (2015) Generalised implementation for fixed-length approximate string matching under Hamming distance and applications. In: Proceedings of IEEE international workshop parallel distributed processing symposium (IPDPSW). IEEE, pp 367–374 Pissis S, Retha A (2015) Generalised implementation for fixed-length approximate string matching under Hamming distance and applications. In: Proceedings of IEEE international workshop parallel distributed processing symposium (IPDPSW). IEEE, pp 367–374
17.
go back to reference Barton C, Iliopoulos CS, Kundu R, Pissis SP, Retha A, Vayani F (2015) Accurate and efficient methods to improve multiple circular sequence alignment. In: Bampis E (ed) Experimental algorithms. Springer, Cham, Switzerland, pp 247–258 Barton C, Iliopoulos CS, Kundu R, Pissis SP, Retha A, Vayani F (2015) Accurate and efficient methods to improve multiple circular sequence alignment. In: Bampis E (ed) Experimental algorithms. Springer, Cham, Switzerland, pp 247–258
18.
go back to reference Pissis SP, Stamatakis A, Pavlidis P(2013) MoTeX: a word-based HPC tool for MoTif eXtraction. In: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, Computational Biology and Biomedical Informatics. ACM, pp 13 Pissis SP, Stamatakis A, Pavlidis P(2013) MoTeX: a word-based HPC tool for MoTif eXtraction. In: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, Computational Biology and Biomedical Informatics. ACM, pp 13
19.
go back to reference Pissis SP (2014) MoTeX-II: structured MoTif eXtraction from large-scale datasets. BMC Bioinform 15(1):235CrossRef Pissis SP (2014) MoTeX-II: structured MoTif eXtraction from large-scale datasets. BMC Bioinform 15(1):235CrossRef
24.
go back to reference Baeza-Yates R, Gonnet GH (1992) A new approach to text searching. Commun ACM 35(10):74–82CrossRef Baeza-Yates R, Gonnet GH (1992) A new approach to text searching. Commun ACM 35(10):74–82CrossRef
25.
go back to reference Grabowski S, Fredriksson K (2008) Bit-parallel string matching under Hamming distance in O(n[m/w]) worst case time. Inf Process Lett 105(5):182–187MathSciNetCrossRefMATH Grabowski S, Fredriksson K (2008) Bit-parallel string matching under Hamming distance in O(n[m/w]) worst case time. Inf Process Lett 105(5):182–187MathSciNetCrossRefMATH
26.
go back to reference Lin CH, Wang GH, Huang CC (2014) Hierarchical parallelism of bit-parallel algorithm for approximate string matching on GPUs. In: Proceedings of IEEE symposium on computer applications and communications (SCAC). IEEE, pp 76–81 Lin CH, Wang GH, Huang CC (2014) Hierarchical parallelism of bit-parallel algorithm for approximate string matching on GPUs. In: Proceedings of IEEE symposium on computer applications and communications (SCAC). IEEE, pp 76–81
27.
go back to reference Ho TL, Seung-Rohk O, Kim HJ (2016) PAC-k: a parallel Aho–Corasick string matching approach on graphic processing units using non-overlapped threads. IEICE Trans Commun 99(7):1523–1531CrossRef Ho TL, Seung-Rohk O, Kim HJ (2016) PAC-k: a parallel Aho–Corasick string matching approach on graphic processing units using non-overlapped threads. IEICE Trans Commun 99(7):1523–1531CrossRef
29.
go back to reference Fang J, Varbanescu AL, Sips H (2011) A comprehensive performance comparison of CUDA and OpenCL. In: 2011 International Conference on Parallel Processing (ICPP). IEEE, pp 216–225 Fang J, Varbanescu AL, Sips H (2011) A comprehensive performance comparison of CUDA and OpenCL. In: 2011 International Conference on Parallel Processing (ICPP). IEEE, pp 216–225
Metadata
Title
New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance
Authors
ThienLuan Ho
Seung-Rohk Oh
HyunJin Kim
Publication date
20-11-2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 5/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2192-6

Other articles of this Issue 5/2018

The Journal of Supercomputing 5/2018 Go to the issue

Premium Partner