Skip to main content
Top
Published in: Annals of Data Science 2/2018

29-07-2017

Parallel String Matching with Linear Array, Butterfly and Divide and Conquer Models

Authors: S. Viswanadha Raju, K. K. V. V. S. Reddy, Chinta Someswara Rao

Published in: Annals of Data Science | Issue 2/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

String Matching is a technique of searching a pattern in a text. It is the basic concept to extract the fruitful information from large volume of text, which is used in different applications like text processing, information retrieval, text mining, pattern recognition, DNA sequencing and data cleaning etc., . Though it is stated some of the simple mechanisms perform very well in practice, plenty of research has been published on the subject and research is still active in this area and there are ample opportunities to develop new techniques. For this purpose, this paper has proposed linear array based string matching, string matching with butterfly model and string matching with divide and conquer models for sequential and parallel environments. To assess the efficiency of the proposed models, the genome sequences of different sizes (10–100 Mb) are taken as input data set. The experimental results have shown that the proposed string matching algorithms performs very well compared to those of Brute force, KMP and Boyer moore string matching algorithms.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Fredriksson K, Grabowski S (2009) Average-optimal string matching. J Discrete Algorithms 7(4):579–594CrossRef Fredriksson K, Grabowski S (2009) Average-optimal string matching. J Discrete Algorithms 7(4):579–594CrossRef
2.
go back to reference Ilie L, Navarro G, Tinta L (2010) The longest common extension problem, revisited and applications to approximate string searching. J Discrete Algorithms 84:418–428CrossRef Ilie L, Navarro G, Tinta L (2010) The longest common extension problem, revisited and applications to approximate string searching. J Discrete Algorithms 84:418–428CrossRef
3.
go back to reference Fredriksson K, Grabowski S (2009) Average-optimal string matching. J Discrete Algorithms 7:579–594CrossRef Fredriksson K, Grabowski S (2009) Average-optimal string matching. J Discrete Algorithms 7:579–594CrossRef
4.
go back to reference Rao CS, Raju SV (2016) A novel multi pattern string matching algorithm with while shift. In: Second international conference on information and communication technology for competitive strategies, ACM, pp 50–55 Rao CS, Raju SV (2016) A novel multi pattern string matching algorithm with while shift. In: Second international conference on information and communication technology for competitive strategies, ACM, pp 50–55
5.
go back to reference Crochemore M, Rytter W (1994) Text algorithms. Oxford University Press, Inc., New York, NY Crochemore M, Rytter W (1994) Text algorithms. Oxford University Press, Inc., New York, NY
6.
go back to reference Boyer RS, Moore JS (1977) A fast string searching algorithm. Commun ACM 20(10):762–772 Boyer RS, Moore JS (1977) A fast string searching algorithm. Commun ACM 20(10):762–772
7.
go back to reference Knuth DE, Morris JH Jr, Pratt VR (1977) Fast pattern matching in strings. SIAM J Comput 6:323–350CrossRef Knuth DE, Morris JH Jr, Pratt VR (1977) Fast pattern matching in strings. SIAM J Comput 6:323–350CrossRef
8.
go back to reference Rasool A, Khare N (2012) Parallelization of KMP string matching algorithm on different SIMD architectures-multi-core and GPGPU’s. Int J Comput Appl 49(11):26–28 Rasool A, Khare N (2012) Parallelization of KMP string matching algorithm on different SIMD architectures-multi-core and GPGPU’s. Int J Comput Appl 49(11):26–28
9.
go back to reference Zhong C, Chen GL (2007) A fast determinate string matching algorithm for the network intrusion detection systems. In: Proceedings of the sixth international conference on machine learning and cybernetics, Hong Kong, pp 3173–3177 Zhong C, Chen GL (2007) A fast determinate string matching algorithm for the network intrusion detection systems. In: Proceedings of the sixth international conference on machine learning and cybernetics, Hong Kong, pp 3173–3177
10.
go back to reference Cao P, Wu S (2011) Parallel research on KMP algorithm. In: 2011 international conference on consumer electronics, communications and networks (CECNet), IEEE, pp 4252–4255 Cao P, Wu S (2011) Parallel research on KMP algorithm. In: 2011 international conference on consumer electronics, communications and networks (CECNet), IEEE, pp 4252–4255
11.
go back to reference Cheng HD, Fu KS (1987) VLSI architecture for string matching and pattern matching. Pattern Recogn 20:125–141CrossRef Cheng HD, Fu KS (1987) VLSI architecture for string matching and pattern matching. Pattern Recogn 20:125–141CrossRef
12.
go back to reference Foster MJ, Kung HT (1980) The design of special-purpose VLSI chips. Computer 1:26–38CrossRef Foster MJ, Kung HT (1980) The design of special-purpose VLSI chips. Computer 1:26–38CrossRef
13.
go back to reference de Reis CCT, Cruz O (2005) Approximate string matching algorithm using parallel methods for molecular sequence camparisons. In: IEEE, pp 140–143 de Reis CCT, Cruz O (2005) Approximate string matching algorithm using parallel methods for molecular sequence camparisons. In: IEEE, pp 140–143
14.
go back to reference Zubair M et al (2010) Text scanning approach for exact string matching. In: International conference on networking and information technology, pp 118–121 Zubair M et al (2010) Text scanning approach for exact string matching. In: International conference on networking and information technology, pp 118–121
15.
go back to reference Oh Y, Oh D, Ro WW (2013) GPU-friendly parallel genome matching with tiled access and reduced state transition table. Springer, Berlin Oh Y, Oh D, Ro WW (2013) GPU-friendly parallel genome matching with tiled access and reduced state transition table. Springer, Berlin
16.
go back to reference Tomohiro I, Inenaga S, Takeda M (2013) Palindrome pattern matching. Theor Comput Sci 483:162–170CrossRef Tomohiro I, Inenaga S, Takeda M (2013) Palindrome pattern matching. Theor Comput Sci 483:162–170CrossRef
17.
go back to reference Faro S (2016) Evaluation and improvement of fast algorithms for exact matching on genome sequences. ininternational conference on algorithms for computational biology 2016 Jun 21. Springer, Berlin Faro S (2016) Evaluation and improvement of fast algorithms for exact matching on genome sequences. ininternational conference on algorithms for computational biology 2016 Jun 21. Springer, Berlin
18.
go back to reference Memeti S, Pllana S (2016) A machine learning approach for accelerating DNA sequence analysis. Int J High Perform Comput Appl 26:1094342016654214 Memeti S, Pllana S (2016) A machine learning approach for accelerating DNA sequence analysis. Int J High Perform Comput Appl 26:1094342016654214
19.
go back to reference Rao CS, Raju SV (2016) Next generation sequencing (NGS) database for tandem repeats with multiple pattern \(2^{\circ }\)-shaft multicore string matching. Genomics Data 7:307–317CrossRef Rao CS, Raju SV (2016) Next generation sequencing (NGS) database for tandem repeats with multiple pattern \(2^{\circ }\)-shaft multicore string matching. Genomics Data 7:307–317CrossRef
20.
go back to reference Rao CS, Raju SV (2016) Similarity analysis between chromosomes of Homo sapiens and monkeys with correlation coefficient, rank correlation coefficient and cosine similarity measures. Genomics Data 7:202–209CrossRef Rao CS, Raju SV (2016) Similarity analysis between chromosomes of Homo sapiens and monkeys with correlation coefficient, rank correlation coefficient and cosine similarity measures. Genomics Data 7:202–209CrossRef
21.
go back to reference Montanari P, Bartolini I, Ciaccia P, Patella M, Ceri S, Masseroli M (2016) Pattern similarity search in genomic sequences. IEEE Trans Knowl Data Eng 28(11):3053–67CrossRef Montanari P, Bartolini I, Ciaccia P, Patella M, Ceri S, Masseroli M (2016) Pattern similarity search in genomic sequences. IEEE Trans Knowl Data Eng 28(11):3053–67CrossRef
22.
go back to reference Rao CS, Raju SV (2016) Concurrent information retrieval system (IRS) for large volume of data with multiple pattern multiple (2\(^N)\) shaft parallel string matching. Ann Data Sci 3:175–203CrossRef Rao CS, Raju SV (2016) Concurrent information retrieval system (IRS) for large volume of data with multiple pattern multiple (2\(^N)\) shaft parallel string matching. Ann Data Sci 3:175–203CrossRef
23.
go back to reference Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE (1986) Fluorescence detection in automated DNA sequence analysis. Nature 321:674–679CrossRef Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE (1986) Fluorescence detection in automated DNA sequence analysis. Nature 321:674–679CrossRef
24.
go back to reference Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 5235:467–470CrossRef Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 5235:467–470CrossRef
25.
go back to reference Abouelhoda MI, Kurtz S, Ohlebusch E (2002) The enhanced suffix array and its applications to genome analysis. In: International workshop on algorithms in bioinformatics. Springer, Berlin, pp 449–463 Abouelhoda MI, Kurtz S, Ohlebusch E (2002) The enhanced suffix array and its applications to genome analysis. In: International workshop on algorithms in bioinformatics. Springer, Berlin, pp 449–463
26.
go back to reference Someswararao C, Raju KB, Appaji SV, Raju SV, Reddy KK (2011) Recent advancements in parallel algorithms for string matching on computing models–a survey and experimental results. In: International conference on advanced computing, networking and security, Springer, Berlin, pp 270–278 Someswararao C, Raju KB, Appaji SV, Raju SV, Reddy KK (2011) Recent advancements in parallel algorithms for string matching on computing models–a survey and experimental results. In: International conference on advanced computing, networking and security, Springer, Berlin, pp 270–278
Metadata
Title
Parallel String Matching with Linear Array, Butterfly and Divide and Conquer Models
Authors
S. Viswanadha Raju
K. K. V. V. S. Reddy
Chinta Someswara Rao
Publication date
29-07-2017
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science / Issue 2/2018
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-017-0124-1

Other articles of this Issue 2/2018

Annals of Data Science 2/2018 Go to the issue