Skip to main content
Top

2016 | OriginalPaper | Chapter

Improving Cytogenetic Search with GPUs Using Different String Matching Schemes

Authors : Chantana Chantrapornchai, Chidchanok Choksuchat

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Cytogenetic data involves analysis of chromosomes structure using karyotyping. Current cytogenetic data of patients in a hospital are very large. A physician needs to search and analyses these data for typical aberration. This paper presents the approach to speedup large cytogenetic data search with GPUs. It utilizes the parallel threads in GPUs which concurrently look for a typical string. The two search schemes are parallelized and their performances are compared. Different search scheme can be parallelized in the same manner. The experimental results show that the speedup up to 15 times can be achieved compared to the sequential version even for large number of strings searched. With the help of shared memory, parallel string search can be improved further by 8 %. However, the shared memory has the limited size which cannot hold large number of strings. The percentage of data transfer time can be reduced if more strings are searched; i.e. more work load per thread. Using a more optimized string matching scheme leads to lower speedup due to the overhead of precomputing of state tables and occupies more GPU memory due to these tables. Thus, due to the nature of GPUs that have many concurrent threads, separate and smaller memory, the simple algorithm for a thread may be good enough. The optimization may be focused on the GPU-related issues such as memory coalesce, thread divergence etc. to improve the speedup further. We also present the application of the GPU string search for finding typical aberrations, and extracting the relevant data from patients’ record for further analysis.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
2.
go back to reference Ficara, D., Antichi, G.: Scaling regular expression matching performance in parallel systems through sampling techniques. In: Proceedings of IEEE Global Telecommunications Conference (GLOBECOM 2011) (2011) Ficara, D., Antichi, G.: Scaling regular expression matching performance in parallel systems through sampling techniques. In: Proceedings of IEEE Global Telecommunications Conference (GLOBECOM 2011) (2011)
3.
go back to reference Mytkowicz, T., Musuvathi, M., Schulte, W.: Data-parallel finite-state machines. SIGPLAN Not. 49(4), 529–542 (2014) Mytkowicz, T., Musuvathi, M., Schulte, W.: Data-parallel finite-state machines. SIGPLAN Not. 49(4), 529–542 (2014)
4.
go back to reference Luchaup, D., Smith, R., et al.: Speculative parallel pattern matching. Trans. Info. For. Sec 6(2), 438–451 (2011)CrossRef Luchaup, D., Smith, R., et al.: Speculative parallel pattern matching. Trans. Info. For. Sec 6(2), 438–451 (2011)CrossRef
5.
go back to reference Daz-Uriarte, R., Rueda, O.M.: ADaCGH: a parallelized web-based application and R package for the analysis of aCGH data. PloS one 2(8) e737, 438–451 (2007) Daz-Uriarte, R., Rueda, O.M.: ADaCGH: a parallelized web-based application and R package for the analysis of aCGH data. PloS one 2(8) e737, 438–451 (2007)
6.
go back to reference Olshen, A., et al.: Circular binary segmentation for the analysis of array? Based DNA copy number data. Biostatistics 5(4), 557–572 (2004)CrossRefMATH Olshen, A., et al.: Circular binary segmentation for the analysis of array? Based DNA copy number data. Biostatistics 5(4), 557–572 (2004)CrossRefMATH
7.
go back to reference Hup, P., et al.: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 20(18), 3413–3422 (2004)CrossRef Hup, P., et al.: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 20(18), 3413–3422 (2004)CrossRef
8.
go back to reference Picard, F., et al.: A statistical approach for array CGH data analysis. BMC Bioinformatics 6(1), 27 (2005)CrossRef Picard, F., et al.: A statistical approach for array CGH data analysis. BMC Bioinformatics 6(1), 27 (2005)CrossRef
9.
go back to reference Fridlyand, J., et al.: Hidden Markov models approach to the analysis of array CGH data. J. Multivariate Anal. 90(1), 132–153 (2004)MathSciNetCrossRefMATH Fridlyand, J., et al.: Hidden Markov models approach to the analysis of array CGH data. J. Multivariate Anal. 90(1), 132–153 (2004)MathSciNetCrossRefMATH
10.
go back to reference Ohlebusch, E., Kurtz, S.: Space efficient computation of rare maximal exact matches between multiple sequences. J. Comput. Biol. 15(4), 357–377 (2008)MathSciNetCrossRef Ohlebusch, E., Kurtz, S.: Space efficient computation of rare maximal exact matches between multiple sequences. J. Comput. Biol. 15(4), 357–377 (2008)MathSciNetCrossRef
12.
go back to reference McKenna, A., et al.: The genome analysis toolkit: a mapreduce framework for analyzing next-generation dna sequencing data. Genome Res. 20(9), 1297–1303 (2010)CrossRef McKenna, A., et al.: The genome analysis toolkit: a mapreduce framework for analyzing next-generation dna sequencing data. Genome Res. 20(9), 1297–1303 (2010)CrossRef
13.
go back to reference Rao, C., Raju, K.B., Raju, S.V.: Parallel string matching with multi core processors-a comparative study for gene sequences. Global J. Comput. Sci. Technol. Hardware Comput. 13(1), 27–41 (2013) Rao, C., Raju, K.B., Raju, S.V.: Parallel string matching with multi core processors-a comparative study for gene sequences. Global J. Comput. Sci. Technol. Hardware Comput. 13(1), 27–41 (2013)
14.
go back to reference Prasad, R., Agarwal, S., Yadav, I., Singh, B.: A fast bit-parallel multi-patterns string matching algorithm for biological sequences. In: Proceedings of the International Symposium on Biocomputing, ISB 2010, pp. 46:1–46:4. ACM (2010) Prasad, R., Agarwal, S., Yadav, I., Singh, B.: A fast bit-parallel multi-patterns string matching algorithm for biological sequences. In: Proceedings of the International Symposium on Biocomputing, ISB 2010, pp. 46:1–46:4. ACM (2010)
15.
go back to reference Tran, T.T., Giraud, M., Varré, J.-S.: Bit-Parallel multiple pattern matching. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) PPAM 2011. LNCS, vol. 7204, pp. 292–301. Springer, Heidelberg (2012). doi:10.1007/978-3-642-31500-8_30 CrossRef Tran, T.T., Giraud, M., Varré, J.-S.: Bit-Parallel multiple pattern matching. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) PPAM 2011. LNCS, vol. 7204, pp. 292–301. Springer, Heidelberg (2012). doi:10.​1007/​978-3-642-31500-8_​30 CrossRef
16.
go back to reference Sitaridi, E.A.: Ross, K.A.: GPU-accelerated string matching for database applications. The VLDB Journal, pp. 1–22 (2015) Sitaridi, E.A.: Ross, K.A.: GPU-accelerated string matching for database applications. The VLDB Journal, pp. 1–22 (2015)
17.
go back to reference Chantrapornchai, C., Navpatanitch, S., Choksuchat, C.: Parallel patient karyotype information system using multi-threads. Appl. Med. Informatics 37(3), 39–48 (2015) Chantrapornchai, C., Navpatanitch, S., Choksuchat, C.: Parallel patient karyotype information system using multi-threads. Appl. Med. Informatics 37(3), 39–48 (2015)
18.
go back to reference Charras, C., Lecroq, T.: Handbook of Exact String Matching Algorithms. King’s College Publications, London (2004)MATH Charras, C., Lecroq, T.: Handbook of Exact String Matching Algorithms. King’s College Publications, London (2004)MATH
20.
go back to reference Dumontier, M., Callahanand, A., Cruz-Toledo, J., Ansell, P., Emonet, V., Belleau, F., Arnaud, D.: Bio2RDF release 3: a larger connected network of linked data for the life sciences. In: Proceedings of the 2014 International Conference on Posters & #38; Demonstrations Track, ISWC-PD 2014 vol. 1272, pp. 401–404 (2014). CEUR-WS.org Dumontier, M., Callahanand, A., Cruz-Toledo, J., Ansell, P., Emonet, V., Belleau, F., Arnaud, D.: Bio2RDF release 3: a larger connected network of linked data for the life sciences. In: Proceedings of the 2014 International Conference on Posters & #38; Demonstrations Track, ISWC-PD 2014 vol. 1272, pp. 401–404 (2014). CEUR-WS.​org
Metadata
Title
Improving Cytogenetic Search with GPUs Using Different String Matching Schemes
Authors
Chantana Chantrapornchai
Chidchanok Choksuchat
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49586-6_13

Premium Partner