Skip to main content
Top
Published in: The Journal of Supercomputing 4/2021

11-08-2020

Parallel implementation of Nussbaumer algorithm and number theoretic transform on a GPU platform: application to qTESLA

Authors: Wai-Kong Lee, Sedat Akleylek, Denis Chee-Keong Wong, Wun-She Yap, Bok-Min Goi, Seong-Oun Hwang

Published in: The Journal of Supercomputing | Issue 4/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Among the popular post-quantum schemes, lattice-based cryptosystems have received renewed interest since there are relatively simple, highly parallelizable and provably secure under a worst-case hardness assumption. However, polynomial multiplication over rings is the most time-consuming operation in most of the lattice-based cryptosystems. To further improve the performance of lattice-based cryptosystems for large scale usage, polynomial multiplication must be implemented in parallel. The polynomial multiplication can be performed using either number theoretic transform (NTT) or Nussbaumer algorithm. However, Nussbaumer algorithm is inherently serial. Meanwhile, the efficient implementation of NTT using various indexing methods on GPU platform remains unknown. In this paper, we explore the best combination of various indexing methods to implement NTT on GPU platform and the efficient way to parallelize the Nussbaumer algorithm. Our results suggest that the combination of Gentleman–Sande and Cooley–Tukey (GS-CT) indexing methods produced the best performance on RTX2060 GPU (i.e. 422,638 polynomial multiplications per second). A technique to parallelize Nussbaumer algorithm by reducing the non-coalesced global memory access to half is produced. To the best of our knowledge, this is the first GPU implementation of Nussbaumer algorithm and it outperforms the best aforementioned NTT (GS-CT) implementation by 14.5%. For illustration purpose, the proposed GPU implementation techniques are applied to qTESLA, a state-of-the-art lattice based signature scheme. We emphasize that the proposed implementation techniques are not specific to any cryptosystem; they can be easily adapted to any other lattice-based cryptosystems.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Shor P (1994) Algorithms for quantum computation: discrete logarithm and factoring. In: IEEE Proceedings of the 35th Annual Symposium on Foundations of Computer Science. IEEE, Santa Fe, pp 124–134 Shor P (1994) Algorithms for quantum computation: discrete logarithm and factoring. In: IEEE Proceedings of the 35th Annual Symposium on Foundations of Computer Science. IEEE, Santa Fe, pp 124–134
3.
go back to reference Dai W, Sunar B (2015) cuHE: a homomorphic encryption accelerator library. In: International Conference on Cryptography and Information Security in the Balkans. Springer Dai W, Sunar B (2015) cuHE: a homomorphic encryption accelerator library. In: International Conference on Cryptography and Information Security in the Balkans. Springer
4.
go back to reference Dai W, Chen D, Cheung RCC, Koc CK (2018) FFT-based McLaughlin’s montgomery exponentiation without conditional selections. IEEE Trans Comput 67(9):1301–1314MathSciNetMATH Dai W, Chen D, Cheung RCC, Koc CK (2018) FFT-based McLaughlin’s montgomery exponentiation without conditional selections. IEEE Trans Comput 67(9):1301–1314MathSciNetMATH
5.
go back to reference Feng X, Li S, Xu S (2019) RLWE-oriented high-speed polynomial multiplier utilizing multi-lane Stockham NTT algorithm. IEEE Trans Circuits Systems II Express Briefs 67(3):556–559CrossRef Feng X, Li S, Xu S (2019) RLWE-oriented high-speed polynomial multiplier utilizing multi-lane Stockham NTT algorithm. IEEE Trans Circuits Systems II Express Briefs 67(3):556–559CrossRef
6.
go back to reference Akleylek S, Tok ZY (2014) Efficient arithmetic for lattice-based cryptography on GPU using the CUDA platform. In: 22nd IEEE Signal Processing and Communications Applications Conference (SIU), Trabzon Akleylek S, Tok ZY (2014) Efficient arithmetic for lattice-based cryptography on GPU using the CUDA platform. In: 22nd IEEE Signal Processing and Communications Applications Conference (SIU), Trabzon
7.
go back to reference Akleylek S, Dagdelen O, Tok ZY (2016) On the efficiency of polynomial multiplication for lattice-based cryptography on GPUs using CUDA. In: International Conference on Cryptography and Information Security in the Balkans. Koper, pp 155–168 Akleylek S, Dagdelen O, Tok ZY (2016) On the efficiency of polynomial multiplication for lattice-based cryptography on GPUs using CUDA. In: International Conference on Cryptography and Information Security in the Balkans. Koper, pp 155–168
8.
go back to reference Lee W-K, Akleylek S, Yap W-S, Goi B-M (2019) Accelerating number theoretic transform in GPU platform for qTESLA scheme. In: 15th International Conference on Information Security Practice and Experience (ISPEC 2019), Kuala Lumpur Lee W-K, Akleylek S, Yap W-S, Goi B-M (2019) Accelerating number theoretic transform in GPU platform for qTESLA scheme. In: 15th International Conference on Information Security Practice and Experience (ISPEC 2019), Kuala Lumpur
9.
go back to reference Nussbaumer H (1980) Fast polynomial transform algorithms for digital convolution. IEEE Trans Acoust Speech Signal Process 28:205–215MathSciNetCrossRef Nussbaumer H (1980) Fast polynomial transform algorithms for digital convolution. IEEE Trans Acoust Speech Signal Process 28:205–215MathSciNetCrossRef
10.
go back to reference Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301MathSciNetCrossRef Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301MathSciNetCrossRef
11.
go back to reference Gentleman WM, Sande G (1966) Fast Fourier transforms—for fun and profit. Proc Joint Comput Conf 29:563–578 Gentleman WM, Sande G (1966) Fast Fourier transforms—for fun and profit. Proc Joint Comput Conf 29:563–578
12.
go back to reference Cochran WT, Cooley JW, Favin DL, Helms HD, Kaenel RA, Lang WW, Maling GC, Nels DE (1967) What is the fast Fourier transform? Proc IEEE 55:1664–1674CrossRef Cochran WT, Cooley JW, Favin DL, Helms HD, Kaenel RA, Lang WW, Maling GC, Nels DE (1967) What is the fast Fourier transform? Proc IEEE 55:1664–1674CrossRef
15.
go back to reference Emmart N, Weems CC (2011) High precision integer multiplication with a GPU using Strassen’s algorithm with multiple FFT sizes. Parallel Process Lett 21(3):359–375MathSciNetCrossRef Emmart N, Weems CC (2011) High precision integer multiplication with a GPU using Strassen’s algorithm with multiple FFT sizes. Parallel Process Lett 21(3):359–375MathSciNetCrossRef
16.
go back to reference Wang W, Hu Y, Chen L, Huang X, Sunar B (2013) Exploring the feasibility of fully homomorphic encryption. IEEE Trans Comput 64(3):698–706MathSciNetCrossRef Wang W, Hu Y, Chen L, Huang X, Sunar B (2013) Exploring the feasibility of fully homomorphic encryption. IEEE Trans Comput 64(3):698–706MathSciNetCrossRef
17.
go back to reference Barrett P (1986) Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor. Advances in Cryptology—CRYPTO’ 86. Lecture Notes in Computer Science, vol 263, pp 311–323 Barrett P (1986) Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor. Advances in Cryptology—CRYPTO’ 86. Lecture Notes in Computer Science, vol 263, pp 311–323
18.
go back to reference Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50CrossRef Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50CrossRef
19.
go back to reference Lee WK, Achar R, Nakhla MS (2018) Dynamic GPU parallel sparse LU factorization for fast circuit simulation. IEEE Trans Very Large Scale Integration (VLSI) Syst 26(11):2518–2529CrossRef Lee WK, Achar R, Nakhla MS (2018) Dynamic GPU parallel sparse LU factorization for fast circuit simulation. IEEE Trans Very Large Scale Integration (VLSI) Syst 26(11):2518–2529CrossRef
20.
go back to reference Emmart N, Zheng, F, Weems C (2018) Faster modular exponentiation using double precision floating point arithmetic on the GPU. In: Proceedings of the IEEE 25th Symposium on Computer Arithmetic. IEEE, Amherst, Massachusetts, pp 130–137 Emmart N, Zheng, F, Weems C (2018) Faster modular exponentiation using double precision floating point arithmetic on the GPU. In: Proceedings of the IEEE 25th Symposium on Computer Arithmetic. IEEE, Amherst, Massachusetts, pp 130–137
22.
go back to reference Du C, Bai G (2016) Efficient polynomial multiplier architecture for ring-LWE based public key cryptosystems. In: IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, Montreal, pp 1162–1165 Du C, Bai G (2016) Efficient polynomial multiplier architecture for ring-LWE based public key cryptosystems. In: IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, Montreal, pp 1162–1165
25.
go back to reference Avanzi R, Bos JW, Ducas L, Kiltz E, Lepoint T, Lyubashevsky V, Schanck JM, Schwabe P, Seiler G, Stehlé D CRYSTALS-KYBER: Algorithm Specifications and Supporting Documentation. https://pq-crystals.org/. Accessed 25 June 2020 Avanzi R, Bos JW, Ducas L, Kiltz E, Lepoint T, Lyubashevsky V, Schanck JM, Schwabe P, Seiler G, Stehlé D CRYSTALS-KYBER: Algorithm Specifications and Supporting Documentation. https://​pq-crystals.​org/​. Accessed 25 June 2020
26.
go back to reference Alkim E, Avanzi R, Bos J, Ducas L, Piedra A, Pöppelmann T, Schwabe P, Stebila D, Newhope-Algorithm Specifications and Supporting Documentation. https://newhopecrypto.org/. Accessed 25 June 2020 Alkim E, Avanzi R, Bos J, Ducas L, Piedra A, Pöppelmann T, Schwabe P, Stebila D, Newhope-Algorithm Specifications and Supporting Documentation. https://​newhopecrypto.​org/​. Accessed 25 June 2020
29.
go back to reference Chang CC, Lee WK, Liu Y, Goi BM, Phan RCW (2018) Signature gateway: offloading signature generation to IoT gateway accelerated by GPU. IEEE Internet Things J 6(3):4448–4461CrossRef Chang CC, Lee WK, Liu Y, Goi BM, Phan RCW (2018) Signature gateway: offloading signature generation to IoT gateway accelerated by GPU. IEEE Internet Things J 6(3):4448–4461CrossRef
Metadata
Title
Parallel implementation of Nussbaumer algorithm and number theoretic transform on a GPU platform: application to qTESLA
Authors
Wai-Kong Lee
Sedat Akleylek
Denis Chee-Keong Wong
Wun-She Yap
Bok-Min Goi
Seong-Oun Hwang
Publication date
11-08-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 4/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03392-x

Other articles of this Issue 4/2021

The Journal of Supercomputing 4/2021 Go to the issue

Premium Partner