2015 | OriginalPaper | Chapter

Montgomery Modular Multiplication on ARM-NEON Revisited

Authors: Hwajeong Seo, Zhe Liu, Johann Großschädl, Jongseok Choi, Howon Kim

Published in: Information Security and Cryptology - ICISC 2014

Publisher: Springer International Publishing

Abstract

Montgomery modular multiplication constitutes the “arithmetic foundation” of modern public-key cryptography with applications ranging from RSA, DSA and Diffie-Hellman over elliptic curve schemes to pairing-based cryptosystems. The increased prevalence of SIMD-type instructions in commodity processors (e.g. Intel SSE, ARM NEON) has initiated a massive body of research on vector-parallel implementations of Montgomery modular multiplication. In this paper, we introduce the Cascade Operand Scanning (COS) method to speed up multi-precision multiplication on SIMD architectures. We developed the COS technique with the goal of reducing Read-After-Write (RAW) dependencies in the propagation of carries, which also reduces the number of pipeline stalls (i.e. bubbles). The COS method operates on 32-bit words in a row-wise fashion (similar to the operand-scanning method) and does not require a “non-canonical” representation of operands with a reduced radix. We show that two COS computations can be “coarsely” integrated into an efficient vectorized variant of Montgomery multiplication, which we call the Coarsely Integrated Cascade Operand Scanning (CICOS) method. Due to our sophisticated instruction scheduling, the CICOS method reaches record-setting execution times for Montgomery modular multiplication on ARM-NEON platforms. Detailed benchmarking results obtained on ARM Cortex-A9 and Cortex-A15 processors show that the proposed CICOS method outperforms Bos et al.'s implementation from SAC 2013 by up to 57% (A9) and 40% (A15), respectively.
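For orientation, the following is a minimal Python sketch of a textbook word-level Montgomery multiplication on 32-bit words, in which each row-wise multiplication step is followed directly by a reduction step. It is not the COS or CICOS method of the paper: it is purely sequential and contains no vectorization or carry-scheduling optimizations. The names `to_words`, `from_words` and `montgomery_multiply`, and the parameter `n` (number of 32-bit words), are assumptions made for illustration.

```python
W = 32                      # word size in bits
MASK = (1 << W) - 1         # 0xffff_ffff


def to_words(x, n):
    """Split integer x into n little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_words(words):
    """Recombine little-endian 32-bit words into one integer."""
    return sum(w << (W * i) for i, w in enumerate(words))


def montgomery_multiply(a, b, m, n):
    """Return a * b * R^-1 mod m with R = 2^(32*n); m must be odd, a, b < m."""
    # m' = -m^-1 mod 2^32 (three-argument pow with -1 needs Python 3.8+)
    m_prime = (-pow(m, -1, 1 << W)) & MASK
    A, B, M = to_words(a, n), to_words(b, n), to_words(m, n)
    C = [0] * (n + 2)       # accumulator with two extra words for carries

    for i in range(n):
        # Row-wise multiplication: C += A * B[i]
        carry = 0
        for j in range(n):
            t = C[j] + A[j] * B[i] + carry
            C[j] = t & MASK
            carry = t >> W
        t = C[n] + carry
        C[n] = t & MASK
        C[n + 1] = t >> W

        # Reduction: add q * M so the lowest word becomes zero,
        # then shift the accumulator down by one word.
        q = (C[0] * m_prime) & MASK
        carry = (C[0] + q * M[0]) >> W
        for j in range(1, n):
            t = C[j] + q * M[j] + carry
            C[j - 1] = t & MASK
            carry = t >> W
        t = C[n] + carry
        C[n - 1] = t & MASK
        C[n] = C[n + 1] + (t >> W)

    result = from_words(C[: n + 1])
    # The accumulated value is below 2m, so one conditional subtraction suffices.
    return result - m if result >= m else result
```

With `n = 8` the operands are 256 bits wide, matching the eight-word operands mentioned in the notes further down. The result can be compared against `a * b * pow(R, -1, m) % m` computed directly with Python integers.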

Note that the timings in the proceedings version of Bos et al.'s paper differ from those in the version in the IACR ePrint archive at https://eprint.iacr.org/2013/519. We used the faster timings from the ePrint version for comparison with our work.

Operands \(A[0 \sim 7]\) and \(B[0 \sim 7]\) are stored in 32-bit registers. Intermediate results \(C[0 \sim 15]\) are stored in 64-bit registers. Each 64-bit register is used as two packed 32-bit registers.

In the first round, the sum lies in the range [0, 0x1_ffff_fffd], because the higher and lower bits of the intermediate results \((C[0 \sim 7])\) lie in the ranges [0, 0xffff_fffe] and [0, 0xffff_ffff], respectively. From the second round onward, the sum of the higher and lower bits lies in the range [0, 0x1_ffff_fffe], because both the higher and the lower bits lie in the range [0, 0xffff_ffff].

In the first round, the intermediate results (\(C[0\sim 7]\)) lie in the range [0, 0x1_ffff_fffd], so the multiplication and accumulation results lie in the range [0, 0xffff_ffff_ffff_fffe]. From the second round onward, the intermediate results lie in the range [0, 0x1_ffff_fffe], so the multiplication and accumulation results lie in the range [0, 0xffff_ffff_ffff_ffff], which still fits in a 64-bit register.
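The bounds quoted in the two notes above can be re-derived with plain integer arithmetic. The short Python sketch below does so under the assumption that every multiplication is a full 32-bit by 32-bit product; the function name `carry_bounds` and the dictionary keys are hypothetical and chosen only for this illustration.

```python
W = 32
WORD_MAX = (1 << W) - 1              # largest 32-bit word, 0xffff_ffff


def carry_bounds():
    """Return the worst-case values behind the ranges quoted in the notes."""
    max_product = WORD_MAX * WORD_MAX    # largest 32x32-bit product
    max_high = max_product >> W          # largest possible high half of a product
    max_low = WORD_MAX                   # a low half can take any 32-bit value

    # Sum of the higher and lower halves of an intermediate result.
    first_round_sum = max_high + max_low     # high half comes from a product
    later_round_sum = WORD_MAX + WORD_MAX    # both halves are arbitrary words

    # Multiply-accumulate: one full product added to such a sum.
    first_round_mac = max_product + first_round_sum
    later_round_mac = max_product + later_round_sum

    return {
        "max_high": max_high,
        "first_round_sum": first_round_sum,
        "later_round_sum": later_round_sum,
        "first_round_mac": first_round_mac,
        "later_round_mac": later_round_mac,
    }


for name, value in carry_bounds().items():
    # Print each bound in hexadecimal with underscore grouping.
    print(f"{name:16s} = {value:#_x}")
```

The largest multiply-accumulate value equals \(2^{64} - 1\) exactly, so under this assumption there is no headroom left in the 64-bit accumulator beyond what the notes describe.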

The NEON engine provides sixteen 128-bit registers. We assigned four registers to the operands (\(A, B\)), four to intermediate results (\(C\)) and four to temporary storage.

Operands \(A[0 \sim 7]\), \(B[0 \sim 7]\), \(M[0 \sim 7]\), \(Q[0 \sim 7]\) and \(M'\) are stored in 32-bit registers. Intermediate results \(C[0 \sim 15]\) are stored in 64-bit registers.

1. Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1987)

2. Bernstein, D.J., Schwabe, P.: NEON crypto. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 320–339. Springer, Heidelberg (2012)

3. Lin, B.: Solving sequential problems in parallel: An SIMD solution to RSA cryptography, Feb 2006. http://cache.freescale.com/files/32bit/doc/app_note/AN3057.pdf

4. Bos, J.W., Kaihara, M.E.: Montgomery multiplication on the Cell. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009, Part I. LNCS, vol. 6067, pp. 477–485. Springer, Heidelberg (2010)

5. Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 471–490. Springer, Heidelberg (2014)

6. Câmara, D., Gouvêa, C.P.L., López, J., Dahab, R.: Fast software polynomial multiplication on ARM processors using the NEON engine. In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds.) CD-ARES Workshops 2013. LNCS, vol. 8128, pp. 137–154. Springer, Heidelberg (2013)

7. Faz-Hernández, A., Longa, P., Sánchez, A.H.: Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves. In: Benaloh, J. (ed.) CT-RSA 2014. LNCS, vol. 8366, pp. 1–27. Springer, Heidelberg (2014)

8. Gueron, S., Krasnov, V.: Software implementation of modular exponentiation, using advanced vector instructions architectures. In: Özbudak, F., Rodríguez-Henríquez, F. (eds.) WAIFI 2012. LNCS, vol. 7369, pp. 119–135. Springer, Heidelberg (2012)

9. Intel Corporation: Using streaming SIMD extensions (SSE2) to perform big multiplications. Application note AP-941, July 2000. http://software.intel.com/sites/default/files/14/4f/24960

10. Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)

11. Pabbuleti, K.C., Mane, D.H., Desai, A., Albert, C., Schaumont, P.: SIMD acceleration of modular arithmetic on contemporary embedded platforms. In: 2013 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. IEEE (2013)

12. Quisquater, J.-J.: Procédé de codage selon la méthode dite RSA, par un microcontrôleur et dispositifs utilisant ce procédé. Demande de brevet français. (Dépôt numéro: 90 02274), 122 (1990)

13. Quisquater, J.-J.: Encoding system according to the so-called RSA method, by means of a microcontroller and arrangement implementing this system, 24 November 1992. US Patent 5,166,978

14. Sánchez, A.H., Rodríguez-Henríquez, F.: NEON implementation of an attribute-based encryption scheme. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) ACNS 2013. LNCS, vol. 7954, pp. 322–338. Springer, Heidelberg (2013)

Title: Montgomery Modular Multiplication on ARM-NEON Revisited
Authors: Hwajeong Seo, Zhe Liu, Johann Großschädl, Jongseok Choi, Howon Kim
Publisher: Springer International Publishing
Book: Information Security and Cryptology - ICISC 2014
Print ISBN: 978-3-319-15942-3

Electronic ISBN: 978-3-319-15943-0

Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-15943-0_20
