Skip to main content
Erschienen in: Annals of Data Science 4/2016

30.08.2016

N-Folded Parallel String Matching Mechanism

verfasst von: Butchi Raju Katari, S. Viswanadha Raju

Erschienen in: Annals of Data Science | Ausgabe 4/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A massive requirement of information vitalized the importance of managing enormous amount of data. It becomes a herculean task to fetch the anticipated data from large data storage as it includes text processing, text mining, pattern recognition, data cleaning etc., The need for concurrent events and coming up with high performance processing models to extract data is a challenge to the researchers. One of the solutions to this challenge is concurrent process to match string on processing models. While, some of the mechanisms do perform very well in practice. Frequent works have been published on this subject and research is still active in this area as the scope and opportunities to develop the new techniques is perennial. This paper proposes N-folded parallel string matching mechanism. This mechanism would be able to divide the input sequence files into various parts and the same would be distributed to the processors. Considering this mechanism as a model, experiments have been conducted considering chloroplast, mitochondria and different categories of plants genome sequence file as input for different sizes with seven possible patterns. The results of the experiment made evident that N-folded parallel string matching mechanism can reduce the processing time on a multi processor system.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Crochemore M, Rytter W, Crochemore M (1994) Text algorithms, vol 698. Oxford University Press, New York Crochemore M, Rytter W, Crochemore M (1994) Text algorithms, vol 698. Oxford University Press, New York
2.
Zurück zum Zitat Simon Y, Inayatullah M (2004) Improving approximate matching capabilities for meta map transfer applications. In: Proceedings of symposium on principles and practice of programming in java, pp 143–147 Simon Y, Inayatullah M (2004) Improving approximate matching capabilities for meta map transfer applications. In: Proceedings of symposium on principles and practice of programming in java, pp 143–147
3.
Zurück zum Zitat Fredriksson K, Grabowski S (2009) Average-optimal string matching. J Discret Algorithms 7(4):579–594CrossRef Fredriksson K, Grabowski S (2009) Average-optimal string matching. J Discret Algorithms 7(4):579–594CrossRef
4.
Zurück zum Zitat Luis Russo L, Navarro G, Oliveira A, Morales P (2009) Approximate string matching with compressed indexes algorithm. Algorithms 2(3):1105–1136CrossRef Luis Russo L, Navarro G, Oliveira A, Morales P (2009) Approximate string matching with compressed indexes algorithm. Algorithms 2(3):1105–1136CrossRef
5.
Zurück zum Zitat Ilie L, Navarro G, Tinta L (2010) The longest common extension problem, revisited and applications to approximate string searching. J Discret Algorithms 8(4):418–428CrossRef Ilie L, Navarro G, Tinta L (2010) The longest common extension problem, revisited and applications to approximate string searching. J Discret Algorithms 8(4):418–428CrossRef
6.
Zurück zum Zitat Viswanadha Raju S et al. (2011) Recent advancement is parallel algorithms for string matching on computing models—a survey and experimental results. LNCS, Springer, pp 270–278, ISBN: 978-3-642-29279-8 Viswanadha Raju S et al. (2011) Recent advancement is parallel algorithms for string matching on computing models—a survey and experimental results. LNCS, Springer, pp 270–278, ISBN: 978-3-642-29279-8
7.
Zurück zum Zitat Viswanadha Raju S et al. (2011) PDM data classification from STEP–an object oriented String matching approach. In: IEEE conference on application of information and communication technologies, pp 1–9. ISBN: 978-1-61284-831-0 Viswanadha Raju S et al. (2011) PDM data classification from STEP–an object oriented String matching approach. In: IEEE conference on application of information and communication technologies, pp 1–9. ISBN: 978-1-61284-831-0
8.
Zurück zum Zitat Boyer RS, Moore JS (1977) A fast string searching algorithm, Carom. Assoc Comput Mach 20(10):262–277 Boyer RS, Moore JS (1977) A fast string searching algorithm, Carom. Assoc Comput Mach 20(10):262–277
9.
Zurück zum Zitat Knuth D, Morris JH Jr, Pratt V (1977) Fast pattern matching in strings. SIAM J Comput 6(2):323–350CrossRef Knuth D, Morris JH Jr, Pratt V (1977) Fast pattern matching in strings. SIAM J Comput 6(2):323–350CrossRef
14.
Zurück zum Zitat Sunday DM (1990) A very fast substring search algorithm. Commun ACM 33(8):132–142CrossRef Sunday DM (1990) A very fast substring search algorithm. Commun ACM 33(8):132–142CrossRef
15.
Zurück zum Zitat Charras C, Lecrog T, Pehoushek JD (1998) A very fast string matching algorithm for small alphabets and long patterns. In: Combinatorial pattern matching. Springer, Berlin, pp 55–64 Charras C, Lecrog T, Pehoushek JD (1998) A very fast string matching algorithm for small alphabets and long patterns. In: Combinatorial pattern matching. Springer, Berlin, pp 55–64
16.
Zurück zum Zitat Rasool A, Khare N (2013) Performance improvement of BMH and BMHS using PDJ (possible double jump) and MValue (match value). Int J Comput Appl 72(1):1–6 Rasool A, Khare N (2013) Performance improvement of BMH and BMHS using PDJ (possible double jump) and MValue (match value). Int J Comput Appl 72(1):1–6
17.
Zurück zum Zitat Kim HJ, Lee S-W (2013) A hardware-based string matching using state transition compression for deep packet inspection. ETRI J 35(1):154–157CrossRef Kim HJ, Lee S-W (2013) A hardware-based string matching using state transition compression for deep packet inspection. ETRI J 35(1):154–157CrossRef
18.
Zurück zum Zitat Dharmapurikar S, Krishnamurthy P, Sproull TS, Lockwood JW (2004) Deep packet inspection using parallel bloom filters. IEEE Micro 24(1):52–61CrossRef Dharmapurikar S, Krishnamurthy P, Sproull TS, Lockwood JW (2004) Deep packet inspection using parallel bloom filters. IEEE Micro 24(1):52–61CrossRef
19.
Zurück zum Zitat Al-Mamory SO, Hamid A, Abdul-Razak A, Falah Z (2010) String matching enhancement for snort IDS. In: 2010 5th international conference on computer sciences and convergence information technology (ICCIT), pp 1020–1023 Al-Mamory SO, Hamid A, Abdul-Razak A, Falah Z (2010) String matching enhancement for snort IDS. In: 2010 5th international conference on computer sciences and convergence information technology (ICCIT), pp 1020–1023
20.
Zurück zum Zitat Kun B, Nai-jie G, Kun T, Xiao-hu L, Gang L (2005) A practical distributed string matching algorithm architecture and implementation. World Acad Sci Eng Technol 10:1307–6884 Kun B, Nai-jie G, Kun T, Xiao-hu L, Gang L (2005) A practical distributed string matching algorithm architecture and implementation. World Acad Sci Eng Technol 10:1307–6884
21.
Zurück zum Zitat Park JH, George KM (1999) Parallel string matching algorithms based on dataflow. In: IEEE Hawaii international conference on system sciences Park JH, George KM (1999) Parallel string matching algorithms based on dataflow. In: IEEE Hawaii international conference on system sciences
22.
Zurück zum Zitat Kawulok J (2013) Approximate string matching for searching DNA sequences. Int J Biosci Biochem Bioinform 3(2):145–148 Kawulok J (2013) Approximate string matching for searching DNA sequences. Int J Biosci Biochem Bioinform 3(2):145–148
23.
Zurück zum Zitat dos Reis CCT, Cruz O (2005) “Approximate String Matching Algorithm Using Parallel Methods for Molecular Sequence Camparisons. In IEEE data compression conference, pp 140–143 dos Reis CCT, Cruz O (2005) “Approximate String Matching Algorithm Using Parallel Methods for Molecular Sequence Camparisons. In IEEE data compression conference, pp 140–143
24.
Zurück zum Zitat Zubair M et al. (2010) Text scanning approach for exact string matching. In: International conference on networking and information technology, pp 118–121 Zubair M et al. (2010) Text scanning approach for exact string matching. In: International conference on networking and information technology, pp 118–121
25.
Zurück zum Zitat Tomohiro I, Inenaga S, Takeda M (2013) Palindrome pattern matching. Theor Comput Sci 483:162–170CrossRef Tomohiro I, Inenaga S, Takeda M (2013) Palindrome pattern matching. Theor Comput Sci 483:162–170CrossRef
26.
Zurück zum Zitat Oh Y, Oh D, Ro WW (2013) GPU-friendly parallel genome matching with tiled access and reduced state transition table. Int J Parallel Prog 41(4):526–551CrossRef Oh Y, Oh D, Ro WW (2013) GPU-friendly parallel genome matching with tiled access and reduced state transition table. Int J Parallel Prog 41(4):526–551CrossRef
27.
Zurück zum Zitat Nolte J, Horton P (2001) Parallel sequence matching with TACO’s distributed object groups—a case study from molecular biology. Clust Comput 4(1):71–77CrossRef Nolte J, Horton P (2001) Parallel sequence matching with TACO’s distributed object groups—a case study from molecular biology. Clust Comput 4(1):71–77CrossRef
28.
Zurück zum Zitat Tseng K-K, Lin Y-D, Lee T-H, Lai Y-C (2005) A parallel automaton string matching with pre-hashing and root-indexing techniques for content filtering coprocessor. In: IEEE international conference on application-specific systems, pp 113–118 Tseng K-K, Lin Y-D, Lee T-H, Lai Y-C (2005) A parallel automaton string matching with pre-hashing and root-indexing techniques for content filtering coprocessor. In: IEEE international conference on application-specific systems, pp 113–118
29.
Zurück zum Zitat Dandass YS, Burgess SC, Lawrence M, Bridges SM (2008) Accelerating string set matching in FPGA hardware for bioinformatics research. BMC Bioinform 9(1):1–11CrossRef Dandass YS, Burgess SC, Lawrence M, Bridges SM (2008) Accelerating string set matching in FPGA hardware for bioinformatics research. BMC Bioinform 9(1):1–11CrossRef
30.
Zurück zum Zitat Hart SN, Sarangi V, Moore R, Baheti S, Bhavsar JD, Couch FJ, Kocher JP (2013) SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations. PLoS One 8(12):e83356CrossRef Hart SN, Sarangi V, Moore R, Baheti S, Bhavsar JD, Couch FJ, Kocher JP (2013) SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations. PLoS One 8(12):e83356CrossRef
31.
Zurück zum Zitat Ounit R, Wanamaker S, Close TJ, Lonardi S (2015) CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genom 16(1):1CrossRef Ounit R, Wanamaker S, Close TJ, Lonardi S (2015) CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genom 16(1):1CrossRef
32.
Zurück zum Zitat Alam KK, Chang JL, Burke DH (2015) FASTaptamer: a bioinformatic toolkit for high-throughput sequence analysis of combinatorial selections. Mol Therapy 4(3):e230 Alam KK, Chang JL, Burke DH (2015) FASTaptamer: a bioinformatic toolkit for high-throughput sequence analysis of combinatorial selections. Mol Therapy 4(3):e230
33.
Zurück zum Zitat Yang X, Liu D, Lv N, Zhao F, Liu F, Zou J, Chen Y, Xiao X, Wu J, Liu P, Gao J (2015) TCRklass: a new k-string–based algorithm for human and mouse TCR repertoire characterization. J Immunol 194(1):446–454CrossRef Yang X, Liu D, Lv N, Zhao F, Liu F, Zou J, Chen Y, Xiao X, Wu J, Liu P, Gao J (2015) TCRklass: a new k-string–based algorithm for human and mouse TCR repertoire characterization. J Immunol 194(1):446–454CrossRef
34.
Zurück zum Zitat Rao CS, Raju SV (2016) Next generation sequencing (NGS) database for tandem repeats with multiple pattern \(2^{\circ }\)-shaft multicore string matching. Genom Data 7:307–317CrossRef Rao CS, Raju SV (2016) Next generation sequencing (NGS) database for tandem repeats with multiple pattern \(2^{\circ }\)-shaft multicore string matching. Genom Data 7:307–317CrossRef
35.
Zurück zum Zitat Sun D, Wang X (2016) A Q-gram filter for local alignment in large genomic database. Int J Hybrid Inf Technol 9(1):221–232CrossRef Sun D, Wang X (2016) A Q-gram filter for local alignment in large genomic database. Int J Hybrid Inf Technol 9(1):221–232CrossRef
Metadaten
Titel
N-Folded Parallel String Matching Mechanism
verfasst von
Butchi Raju Katari
S. Viswanadha Raju
Publikationsdatum
30.08.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
Annals of Data Science / Ausgabe 4/2016
Print ISSN: 2198-5804
Elektronische ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-016-0086-8

Weitere Artikel der Ausgabe 4/2016

Annals of Data Science 4/2016 Zur Ausgabe

Premium Partner