Skip to main content
Top

2018 | OriginalPaper | Chapter

Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System

Authors : Sidath Randeni Kadupitige, Uwe Röhm

Published in: Databases Theory and Applications

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Database systems are optimised for managing large data sets, but they face difficulties making an impact to life sciences where the typical use cases involve much more complex analytical algorithms than found in traditional OLTP or OLAP scenarios. Although many database management systems (DBMS) are extensible via stored procedures to implement transactions or complex algorithms, these stored procedures are usually unable to leverage the inbuilt optimizations provided by the query engine, so other optimization avenues must be explored.
In this paper, we investigate how sequence alignment algorithms, one of the most common operations carried out on a bioinformatics or genomics database, can be efficiently implemented close to the data within an extensible database system. We investigate the use of single instruction, multiple data (SIMD) extensions to accelerate logic inside an DBMS. We also compare it to implementations of the same logic outside the DBMS.
Our implementation of an SIMD-accelerated Smith Waterman sequence-alignment algorithm shows an order of magnitude improvement on a non-accelerated version while running inside a DBMS. Our SIMD accelerated version also performs with little to no overhead inside the DBMS compared to the same logic running outside the DBMS.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Daily, J.: Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinform. 17(1), 81 (2016)MathSciNetCrossRef Daily, J.: Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinform. 17(1), 81 (2016)MathSciNetCrossRef
2.
go back to reference Delaney, K., Beauchemin, B., Cunningham, C., Kehayias, J., Randal, P.S., Nevarez, B.: Microsoft SQL Server 2012 Internals. Microsoft Press, Redmond (2013) Delaney, K., Beauchemin, B., Cunningham, C., Kehayias, J., Randal, P.S., Nevarez, B.: Microsoft SQL Server 2012 Internals. Microsoft Press, Redmond (2013)
3.
go back to reference Dorr, R.: How It Works: SQL Server 2016 SSE/AVX Support (2016) Dorr, R.: How It Works: SQL Server 2016 SSE/AVX Support (2016)
4.
go back to reference Farrar, M.: Striped smith-waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23(2), 156–161 (2006)CrossRef Farrar, M.: Striped smith-waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23(2), 156–161 (2006)CrossRef
5.
go back to reference Héman, S.: Updating compressed column stores. Ph.D. thesis, Informatics Institute (IVI) (2009) Héman, S.: Updating compressed column stores. Ph.D. thesis, Informatics Institute (IVI) (2009)
6.
go back to reference Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. PNAS 89(22), 10915–10919 (1992)CrossRef Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. PNAS 89(22), 10915–10919 (1992)CrossRef
7.
go back to reference IHGRC: Finishing the euchromatic sequence of the human genome. Nature 431(7011), 931–945 (2004)CrossRef IHGRC: Finishing the euchromatic sequence of the human genome. Nature 431(7011), 931–945 (2004)CrossRef
8.
go back to reference Larson, P., Birka, A., Hanson, E.N., Huang, W., Nowakiewicz, M., Papadimos, V.: Real-time analytical processing with SQL server. PVLDB 8(12), 1740–1751 (2015) Larson, P., Birka, A., Hanson, E.N., Huang, W., Nowakiewicz, M., Papadimos, V.: Real-time analytical processing with SQL server. PVLDB 8(12), 1740–1751 (2015)
9.
go back to reference Leturgez, L.: SIMD outside and inside Oracle 12c (2015) Leturgez, L.: SIMD outside and inside Oracle 12c (2015)
10.
go back to reference Manegold, S., Boncz, P.A., Kersten, M.L.: Optimizing database architecture for the new bottleneck: memory access. VLDB J. 9(3), 231–246 (2000)CrossRef Manegold, S., Boncz, P.A., Kersten, M.L.: Optimizing database architecture for the new bottleneck: memory access. VLDB J. 9(3), 231–246 (2000)CrossRef
11.
go back to reference Polychroniou, O., Raghavan, A., Ross, K.A.: Rethinking SIMD vectorization for in-memory databases. In: ACM SIGMOD, SIGMOD 2015, pp. 1493–1508. ACM, New York (2015) Polychroniou, O., Raghavan, A., Ross, K.A.: Rethinking SIMD vectorization for in-memory databases. In: ACM SIGMOD, SIGMOD 2015, pp. 1493–1508. ACM, New York (2015)
12.
go back to reference Rognes, T.: Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinform. 12, 221 (2011)CrossRef Rognes, T.: Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinform. 12, 221 (2011)CrossRef
13.
go back to reference Rognes, T., Seeberg, E.: Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16(8), 699–706 (2000)CrossRef Rognes, T., Seeberg, E.: Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16(8), 699–706 (2000)CrossRef
14.
go back to reference Röhm, U., Blakeley, J.A.: Data management for high-throughput genomics. In: Fourth Biennial Conference on Innovative Data Systems Research, CIDR 2009, Asilomar, CA, USA, 4–7 January 2009, Online Proceedings (2009) Röhm, U., Blakeley, J.A.: Data management for high-throughput genomics. In: Fourth Biennial Conference on Innovative Data Systems Research, CIDR 2009, Asilomar, CA, USA, 4–7 January 2009, Online Proceedings (2009)
16.
go back to reference Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)CrossRef Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)CrossRef
17.
go back to reference Sosic, M.: An SIMD dynamic programming C/C++ library. Master’s thesis, University of Zagreb (2015) Sosic, M.: An SIMD dynamic programming C/C++ library. Master’s thesis, University of Zagreb (2015)
18.
go back to reference Stonebraker, M., Brown, P., Zhang, D., Becla, J.: SciDB: a database management system for applications with complex analytics. Comput. Sci. Eng. 15(3), 54–62 (2013)CrossRef Stonebraker, M., Brown, P., Zhang, D., Becla, J.: SciDB: a database management system for applications with complex analytics. Comput. Sci. Eng. 15(3), 54–62 (2013)CrossRef
19.
go back to reference Wozniak, A.: Using video-oriented instructions to speed up sequence comparison. Comput. Appl. Biosci. 13(2), 145–150 (1997) Wozniak, A.: Using video-oriented instructions to speed up sequence comparison. Comput. Appl. Biosci. 13(2), 145–150 (1997)
20.
go back to reference Zhao, M., Lee, W.P., Garrison, E.P., Marth, G.T.: SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS ONE 8(12), e82138 (2013)CrossRef Zhao, M., Lee, W.P., Garrison, E.P., Marth, G.T.: SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS ONE 8(12), e82138 (2013)CrossRef
21.
go back to reference Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, 3–6 June 2002, pp. 145–156 (2002) Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, 3–6 June 2002, pp. 145–156 (2002)
22.
go back to reference Żukowski, M.: Balancing vectorized query execution with bandwidth-optimized storage. Ph.D. thesis, Informatics Institute (IVI) (2009) Żukowski, M.: Balancing vectorized query execution with bandwidth-optimized storage. Ph.D. thesis, Informatics Institute (IVI) (2009)
Metadata
Title
Using SIMD Instructions to Accelerate Sequence Similarity Searches Inside a Database System
Authors
Sidath Randeni Kadupitige
Uwe Röhm
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-92013-9_7

Premium Partner