FPGA Implementation

An Exploration of Mechanisms for Dynamic Cryptographic Instruction Set Extension

Instruction Set Extensions (ISEs) supplement a host processor with special-purpose, typically fixed-function hardware components and instructions to utilize them. For cryptographic use-cases, this can be very effective due to the demand for non-standard or niche operations that are not supported by general-purpose architectures. However, one disadvantage of fixed-function ISEs is inflexibility, contradicting a need for “algorithm agility.” This paper explores a new approach, namely the provision of re-configurable mechanisms to support dynamic (run-time changeable) ISEs. Our results, obtained using an FPGA-based LEON3 prototype, show that this approach provides a flexible general-purpose platform for cryptographic ISEs with all known advantages of previous work, but relies on careful analysis of the associated security issues.

Philipp Grabher, Johann Großschädl, Simon Hoerder, Kimmo Järvinen, Dan Page, Stefan Tillich, Marcin Wójcik

FPGA-Based True Random Number Generation Using Circuit Metastability with Adaptive Feedback Control

The paper presents a novel and efficient method to generate true random numbers on FPGAs by inducing metastability in bi-stable circuit elements, e.g. flip-flops. Metastability is achieved by using precise programmable delay lines (PDL) that accurately equalize the signal arrival times to flip-flops. The PDLs are capable of adjusting signal propagation delays with resolutions higher than fractions of a pico second. In addition, a real time monitoring system is utilized to assure a high degree of randomness in the generated output bits, resilience against fluctuations in environmental conditions, as well as robustness against active adversarial attacks. The monitoring system employs a feedback loop that actively monitors the probability of output bits; as soon as any bias is observed in probabilities, it adjusts the delay through PDLs to return to the metastable operation region. Implementation on Xilinx Virtex 5 FPGAs and results of NIST randomness tests show the effectiveness of our approach.

Mehrdad Majzoobi, Farinaz Koushanfar, Srinivas Devadas

Generic Side-Channel Countermeasures for Reconfigurable Devices

In this work, we propose and evaluate generic hardware countermeasures against DPA attacks for recent FPGA devices. The proposed set of FPGA-specific countermeasures can be combined to resist a large variety of first-order DPA attacks, even with 100 million recorded power traces. This set includes generic and resource-efficient countermeasures for on-chip noise generation, random-data processing delays and S-box scrambling using dual-ported block memories. In particular, it is possible to build many of these countermeasures into a single IP-core or hard macro that then provides basic protection for any cryptographic implementation just by its inclusion in the design process – what is particularly useful for engineers with no or little background on security and side-channel attacks.

Tim Güneysu, Amir Moradi

AES

Improved Collision-Correlation Power Analysis on First Order Protected AES

The recent results presented by Moradi et al. on AES at CHES 2010 and Witteman et al. on square-and-multiply always RSA exponentiation at CT-RSA 2011 have shown that collision-correlation power analysis is able to recover the secret keys on embedded implementations. However, we noticed that the attack published last year by Moradi et al. is not efficient on correctly first-order protected implementations. We propose in this paper improvements on collision-correlation attacks which require less power traces than classical second-order power analysis techniques. We present here two new methods and show in practice their real efficiency on two first-order protected AES implementations. We also mention that other symmetric embedded algorithms can be targeted by our new techniques.

Christophe Clavier, Benoit Feix, Georges Gagnerot, Mylène Roussellet, Vincent Verneuil

Higher-Order Glitches Free Implementation of the AES Using Secure Multi-party Computation Protocols

Higher-order side channel attacks (HO-SCA) is a powerful technique against cryptographic implementations and the design of appropriate countermeasures is nowadays an important topic. In parallel, another class of attacks, called

glitches attacks

, have been investigated which exploit the hardware glitches phenomena occurring during the physical execution of algorithms. We introduce in this paper a circuit model that encompasses sufficient conditions to resist glitches effects. This allows us to construct the first countermeasure thwarting both glitches and HO-SCA attacks. Our new construction requires Secure Multi-Party Computation protocols and we propose to apply the one introduced by Ben’Or et al.at STOC in 1988. The adaptation of the latter protocol to the context of side channel analysis results in a completely new higher-order masking scheme, particularly interesting when addressing resistance in the presence of glitches. An application of our scheme to the AES block cipher is detailed.

Emmanuel Prouff, Thomas Roche

Protecting AES with Shamir’s Secret Sharing Scheme

Cryptographic algorithms embedded on physical devices are particularly vulnerable to Side Channel Analysis (SCA). The most common countermeasure for block cipher implementations is masking, which randomizes the variables to be protected by combining them with one or several random values. In this paper, we propose an original masking scheme based on Shamir’s Secret Sharing scheme [22] as an alternative to Boolean masking. We detail its implementation for the AES using the same tool than Rivain and Prouff in CHES 2010 [16]: multi-party computation. We then conduct a security analysis of our scheme in order to compare it to Boolean masking. Our results show that for a given amount of noise the proposed scheme - implemented to the first order - provides the same security level as 3

rd

up to 4

th

order boolean masking, together with a better efficiency.

Louis Goubin, Ange Martinelli

A Fast and Provably Secure Higher-Order Masking of AES S-Box

This paper proposes an efficient and secure higher-order masking algorithm for AES S-box that consumes the most computation time of the higher-order masked AES. During the past few years, much of the research has focused on finding higher-order masking schemes for this AES S-box, but these are still slow for embedded processors use. Our proposed higher-order masking of AES S-box is constructed based on the inversion operation over the composite field. We replace the subfield operations over the composite field into the table lookup operation, but these precomputation tables do not require much ROM space because these are the operations over

GF

(2

4

). In the implementation results, we show that the higher-order masking scheme using our masked S-box is about 2.54 (second-order masking) and 3.03 (third-order masking) times faster than the fastest method among the existing higher-order masking schemes of AES.

HeeSeok Kim, Seokhie Hong, Jongin Lim

Elliptic Curve Cryptosystems

Software Implementation of Binary Elliptic Curves: Impact of the Carry-Less Multiplier on Scalar Multiplication

The availability of a new carry-less multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and hence presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how to best employ this instruction in field multiplication and the effect on performance of doubling and halving operations. Alternate strategies for implementing inversion and half-trace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study on serial and parallel approaches for Koblitz and random curves, where parallelization strategies are implemented and compared. The contributions are illustrated with experimental results improving the state-of-the-art performance of halving and doubling-based scalar multiplication on NIST curves at the 112- and 192-bit security levels, and a new speed record for side-channel resistant scalar multiplication in a random curve at the 128-bit security level.

Jonathan Taverne, Armando Faz-Hernández, Diego F. Aranha, Francisco Rodríguez-Henríquez, Darrel Hankerson, Julio López

High-Speed High-Security Signatures

This paper shows that a $390 mass-market quad-core 2.4GHz Intel Westmere (Xeon E5620) CPU can create 108000 signatures per second and verify 71000 signatures per second on an elliptic curve at a 2

128

security level. Public keys are 32 bytes, and signatures are 64 bytes. These performance figures include strong defenses against software side-channel attacks: there is no data flow from secret keys to array indices, and there is no data flow from secret keys to branch conditions.

Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, Bo-Yin Yang

To Infinity and Beyond: Combined Attack on ECC Using Points of Low Order

We present a novel combined attack against ECC implementations that exploits specially crafted, but valid input points. The core idea is that after fault injection, these points turn into points of very low order. Using side channel information we deduce when the point at infinity occurs during the scalar multiplication, which leaks information about the secret key. In the best case, our attack breaks a simple and differential side channel analysis resistant implementation with input/output point validity and curve parameter checks using a single query.

Junfeng Fan, Benedikt Gierlichs, Frederik Vercauteren

Lattices

Random Sampling for Short Lattice Vectors on Graphics Cards

We present a GPU implementation of the Simple Sampling Reduction (SSR) algorithm that searches for short vectors in lattices. SSR makes use of the famous BKZ algorithm. It complements an exhaustive search in a suitable search region to insert random, short vectors to the lattice basis. The sampling of short vectors can be executed in parallel.

Our GPU implementation increases the number of sampled vectors per second from 5200 to more than 120,000. With this we are the first to present a parallel implementation of SSR and we make use of the computing capability of modern graphics cards to enhance the search for short vectors even more.

Michael Schneider, Norman Göttert

Extreme Enumeration on GPU and in Clouds

- How Many Dollars You Need to Break SVP Challenges -

The complexity of the Shortest Vector Problem (SVP) in lattices is directly related to the security of NTRU and the provable level of security of many recently proposed lattice-based cryptosystems. We integrate several recent algorithmic improvements for solving SVP and take first place at dimension 120 in the SVP Challenge Hall of Fame. Our implementation allows us to find a short vector at dimension 114 using 8 NVIDIA video cards in less than two days.

Specifically, our improvements to the recent Extreme Pruning in enumeration approach, proposed by Gama

et al.

in Eurocrypt 2010, include: (1) a more flexible bounding function in polynomial form; (2) code to take advantage of Clouds of commodity PCs (via the MapReduce framework); and (3) the use of NVIDIA’s Graphics Processing Units (GPUs). We may now reasonably estimate the cost of a wide range of SVP instances in U.S. dollars, as rent paid to cloud-computing service providers, which is arguably a simpler and more practical measure of complexity.

Po-Chun Kuo, Michael Schneider, Özgür Dagdelen, Jan Reichelt, Johannes Buchmann, Chen-Mou Cheng, Bo-Yin Yang

Modulus Fault Attacks against RSA-CRT Signatures

RSA-CRT fault attacks have been an active research area since their discovery by Boneh, DeMillo and Lipton in 1997. We present alternative key-recovery attacks on RSA-CRT signatures: instead of targeting one of the sub-exponentiations in RSA-CRT, we inject faults into the

public modulus

before CRT interpolation, which makes a number of countermeasures against Boneh et al.’s attack ineffective.

Our attacks are based on orthogonal lattice techniques and are very efficient in practice: depending on the fault model, between 5 to 45 faults suffice to recover the RSA factorization within a few seconds. Our simplest attack requires that the adversary knows the faulty moduli, but more sophisticated variants work even if the moduli are unknown, under reasonable fault models. All our attacks have been fully validated experimentally with fault-injection laser techniques.

Éric Brier, David Naccache, Phong Q. Nguyen, Mehdi Tibouchi

Side Channel Attacks

Breaking Mifare DESFire MF3ICD40: Power Analysis and Templates in the Real World

With the advent of side-channel analysis, implementations of mathematically secure ciphers face a new threat: by exploiting the physical characteristics of a device, adversaries are able to break algorithms such as AES or Triple-DES (3DES), for which no efficient analytical or brute-force attacks exist. In this paper, we demonstrate practical, noninvasive side-channel attacks on the Mifare DESFire MF3ICD40 contactless smartcard, a 3DES-based alternative to the cryptanalytically weak Mifare Classic [9,25]. We detail on how to recover the complete 112-bit secret key of the employed 3DES algorithm, using non-invasive power analysis and template attacks. Our methods can be put into practice at a low cost with standard equipment, thus posing a severe threat to many real-world applications that employ the DESFire MF3ICD40 smartcard.

David Oswald, Christof Paar

Information Theoretic and Security Analysis of a 65-Nanometer DDSLL AES S-Box

In a recent work from Eurocrypt 2011, Renauld et al. discussed the impact of the increased variability in nanoscale CMOS devices on their evaluation against side-channel attacks. In this paper, we complement this work by analyzing an implementation of the AES S-box, in the DDSLL dual-rail logic style, using the same 65-nanometer technology. For this purpose, we first compare the performance results of the static CMOS and dual-rail S-boxes. We show that full custom design allows to nicely mitigate the performance drawbacks that are usually reported for dual-rail circuits. Next, we evaluate the side-channel leakages of these S-boxes, using both simulations and actual measurements. We take advantage of state-of-the-art evaluation tools, and discuss the quantity and nature (e.g. linearity) of the physical information they provide. Our results show that the security improvement of the DDSLL S-box is typically in the range of one order of magnitude (in terms of “number of traces to recover the key”). They also confirm the importance of a profiled information theoretic analysis for the worst-case security evaluation of leaking devices. They finally raise the important question whether dual-rail logic styles remain a promising approach for reducing the side-channel information leakages in front of technology scaling, as hardware constraints such as balanced routing may become increasingly challenging to fulfill, as circuit sizes tend towards the nanometer scale.

Mathieu Renauld, Dina Kamel, François-Xavier Standaert, Denis Flandre

Thwarting Higher-Order Side Channel Analysis with Additive and Multiplicative Maskings

Higher-order side channel attacks is a class of powerful techniques against cryptographic implementations. Their complexity grows exponentially with the order, but for small orders (

e.g.

2 and 3) recent studies have demonstrated that they pose a serious threat in practice. In this context, it is today of great importance to design software countermeasures enabling to counteract higher-order side channel attacks for any arbitrary chosen order. At CHES 2010, Rivain and Prouff have introduced such a countermeasure for the AES. It works for any arbitrary chosen order and benefits from a formal resistance proof. Until now, it was the single one with such assets. By generalizing at any order a countermeasure introduced at ACNS 2010 by Genelle

etal.

, we propose in this paper an alternative to Rivain and Prouff’s solution. The new scheme can also be proven secure at any order and has the advantage of being at least 2 times more efficient than the existing solutions for orders 2 and 3, while maintaining the RAM consumption lower than 200 bytes.

Laurie Genelle, Emmanuel Prouff, Michaël Quisquater

Extractors against Side-Channel Attacks: Weak or Strong?

Randomness extractors are important tools in cryptography. Their goal is to compress a high-entropy source into a more uniform output. Beyond their theoretical interest, they have recently gained attention because of their use in the design and proof of leakage-resilient primitives, such as stream ciphers and pseudorandom functions. However, for these proofs of leakage resilience to be meaningful in practice, it is important to instantiate and implement the components they are based on. In this context, while numerous works have investigated the implementation properties of block ciphers such as the AES Rijndael, very little is known about the application of side-channel attacks against extractor implementations. In order to close this gap, this paper instantiates a low-cost hardware extractor and analyzes it both from a performance and from a side-channel security point of view. Our investigations lead to contrasted conclusions. On the one hand, extractors can be efficiently implemented and protected with masking. On the other hand, they provide adversaries with many more exploitable leakage samples than, e.g. block ciphers. As a result, they can ensure high security margins against standard (non-profiled) side-channel attacks and turn out to be much weaker against profiled attacks. From a methodological point of view, our analysis consequently raises the question of which attack strategies should be considered in security evaluations.

Marcel Medwed, François-Xavier Standaert

Invited Talk

Standardization Works for Security Regarding the Electromagnetic Environment

Telecommunication functions of electronic devices have been and will continue to increase. The so called smart community, a society in which more advanced communications technology is used, will enable life to be increasingly convenient. Thus, telecommunications will become more and more important. However, when such functions become unavailable for some reason, it will negatively impact society. Therefore, device robustness and information leakage are serious issues that need to be addressed. Security regarding electromagnetic waves has been extensively studied in terms of electromagnetic compatibility. In particular, high power electromagnetic phenomena and information leakage due to electromagnetic waves have been discussed in IEEE EMC TC5, ITU-T SG5 and IEC SC77C. In this presentation, an overview of the results, trends, and future works are discussed. Recently developed recommendation ITU-T K.84 (test methods and guide against information leaks through unintentional EM emissions), a leakage mechanism, and protection methods are also discussed.

Tetsuya Tominaga

Fault Attacks

Meet-in-the-Middle and Impossible Differential Fault Analysis on AES

Since the early work of Piret and Quisquater on fault attacks against AES at CHES 2003, many works have been devoted to reduce the number of faults and to improve the time complexity of this attack. This attack is very efficient as a single fault is injected on the third round before the end, and then it allows to recover the whole secret key in 2

32

in time and memory. However, since this attack, it is an open problem to know if provoking a fault at a former round of the cipher allows to recover the key. Indeed, since two rounds of AES achieve a full diffusion and adding protections against fault attack decreases the performance, some countermeasures propose to protect only the three first and last rounds. In this paper, we give an answer to this problem by showing two practical cryptographic attacks on one round earlier of AES-128 and for all keysize variants. The first attack requires 10 faults and its complexity is around 2

40

in time and memory, an improvement allows only 5 faults and its complexity in memory is reduced to 2

24

while the second one requires either 1000 or 45 faults depending on fault model and recovers the secret key in around 2

40

in time and memory.

Patrick Derbez, Pierre-Alain Fouque, Delphine Leresteux

On the Power of Fault Sensitivity Analysis and Collision Side-Channel Attacks in a Combined Setting

At CHES 2010 two powerful new attacks were presented, namely the Fault Sensitivity Analysis and the Correlation Collision Attack. This paper shows how these ideas can be combined to create even stronger attacks. Two solutions are presented; both extract leakage information by the fault sensitivity analysis method while each one applies a slightly different collision attack to deduce the secret information without the need of any hypothetical leakage model. Having a similar fault injection method, one attack utilizes the non-uniform distribution of faulty ciphertext bytes while the other one exploits the data-dependent timing characteristics of the target combination circuit. The results when attacking several AES ASIC cores of the SASEBO LSI chips in different process technologies are presented. Successfully breaking the cores protected against DPA attacks using either gate-level countermeasures or logic styles indicates the strength of the attacks.

Amir Moradi, Oliver Mischke, Christof Paar, Yang Li, Kazuo Ohta, Kazuo Sakiyama

Lightweight Symmetric Algorithms

spongent: A Lightweight Hash Function

This paper proposes

spongent

– a family of lightweight hash functions with hash sizes of 88 (for preimage resistance only), 128, 160, 224, and 256 bits based on a sponge construction instantiated with a

present

-type permutation, following the hermetic sponge strategy. Its smallest implementations in ASIC require 738, 1060, 1329, 1728, and 1950 GE, respectively. To our best knowledge, at all security levels attained, it is the hash function with the smallest footprint in hardware published so far, the parameter being highly technology dependent.

spongent

offers a lot of flexibility in terms of serialization degree and speed. We explore some of its numerous implementation trade-offs.

We furthermore present a security analysis of

spongent

. Basing the design on a

present

-type primitive provides confidence in its security with respect to the most important attacks. Several dedicated attack approaches are also investigated.

Andrey Bogdanov, Miroslav Knežević, Gregor Leander, Deniz Toz, Kerem Varıcı, Ingrid Verbauwhede

The LED Block Cipher

We present a new block cipher

LED

. While dedicated to compact hardware implementation, and offering the smallest silicon footprint among comparable block ciphers, the cipher has been designed to simultaneously tackle three additional goals. First, we explore the role of an ultra-light (in fact non-existent) key schedule. Second, we consider the resistance of ciphers, and

LED

in particular, to related-key attacks: we are able to derive simple yet interesting

AES

-like security proofs for

LED

regarding related- or single-key attacks. And third, while we provide a block cipher that is very compact in hardware, we aim to maintain a reasonable performance profile for software implementation.

Jian Guo, Thomas Peyrin, Axel Poschmann, Matt Robshaw

Piccolo: An Ultra-Lightweight Blockcipher

We propose a new 64-bit blockcipher

Piccolo

supporting 80 and 128-bit keys. Adopting several novel design and implementation techniques,

Piccolo

achieves both high security and notably compact implementation in hardware. We show that

Piccolo

offers a sufficient security level against known analyses including recent related-key differential attacks and meet-in-the-middle attacks. In our smallest implementation, the hardware requirements for the 80 and the 128-bit key mode are only 683 and 758 gate equivalents, respectively. Moreover,

Piccolo

requires only 60 additional gate equivalents to support the decryption function due to its involution structure. Furthermore, its efficiency on the energy consumption which is evaluated by energy per bit is also remarkable. Thus,

Piccolo

is one of the competitive ultra-lightweight blockciphers which are suitable for extremely constrained environments such as RFID tags and sensor nodes.

Kyoji Shibutani, Takanori Isobe, Harunaga Hiwatari, Atsushi Mitsuda, Toru Akishita, Taizo Shirai

PUFs

Lightweight and Secure PUF Key Storage Using Limits of Machine Learning

A lightweight and secure key storage scheme using silicon Physical Unclonable Functions (PUFs) is described. To derive stable PUF bits from chip manufacturing variations, a lightweight error correction code (ECC) encoder / decoder is used. With a register count of 69, this codec core does not use any traditional error correction techniques and is 75% smaller than a previous provably secure implementation, and yet achieves robust environmental performance in 65

nm

FPGA and 0.13

μ

ASIC implementations. The security of the syndrome bits uses a new security argument that relies on what

cannot

be learned from a machine learning perspective. The number of

Leaked Bits

is determined for each Syndrome Word, reducible using

Syndrome Distribution Shaping

. The design is secure from a min-entropy standpoint against a machine-learning-equipped adversary that, given a ceiling of leaked bits, has a classification error bounded by

ε

. Numerical examples are given using latest machine learning results.

Meng-Day (Mandel) Yu, David M’Raihi, Richard Sowell, Srinivas Devadas

Recyclable PUFs: Logically Reconfigurable PUFs

We introduce the concept of Logically Reconfigurable Physical Unclonable Functions (LR-PUFs). In contrast to classical Physically Unclonable Functions (PUFs) LR-PUFs can be dynamically ‘reconfigured’ after deployment such that their challenge/response behavior changes in a random manner. To this end, we amend a conventional PUF with a stateful control logic that transforms challenges and responses of the PUF. We present and evaluate two different constructions for LR-PUFs that are simple, efficient and can easily be implemented. Moreover, we introduce a formal security model for LR-PUFs and prove that both constructions are secure under reasonable assumptions. Finally, we demonstrate that LR-PUFs enable the construction of securely recyclable access tokens, such as electronic tickets: LR-PUFs enhance security against manipulation and forgery, while reconfiguration allows secure re-use of tokens for subsequent transactions.

Stefan Katzenbeisser, Ünal Koçabas, Vincent van der Leest, Ahmad-Reza Sadeghi, Geert-Jan Schrijen, Heike Schröder, Christian Wachsmann

Uniqueness Enhancement of PUF Responses Based on the Locations of Random Outputting RS Latches

Physical Unclonable Functions (PUFs) are expected to represent an important solution for secure ID generation and authentication etc. In general, PUFs are considered to be more secure the larger their output entropy. However, the entropy of conventional PUFs is lower than the output bit length, because some output bits are random numbers, which are regarded as unnecessary for ID generation and discarded. We propose a novel PUF structure based on a Butterfly PUF with multiple RS latches, which generates larger entropy by utilizing location information of the RS latches generating random numbers. More specifically, while conventional PUFs generate binary values (0/1), the proposed PUF generates ternary values (0/1/random) in order to increase entropy. We estimate the entropy of the proposed PUF. According to our experiment with 40 FPGAs, a Butterfly PUF with 128 RS latches can improve entropy from 116 bits to 192.7 bits, this being maximized when the frequency of each ternary value is equal. We also show the appropriate RS latch structure for satisfying this condition, and validate it through an FPGA experiment.

Dai Yamamoto, Kazuo Sakiyama, Mitsugu Iwamoto, Kazuo Ohta, Takao Ochiai, Masahiko Takenaka, Kouichi Itoh

MECCA: A Robust Low-Overhead PUF Using Embedded Memory Array

The generation of unique keys by Integrated Circuits (IC) has important applications in areas such as Intellectual Property (IP) counter-plagiarism and embedded security integration. To this end, Physical Unclonable Functions (PUF) have been proposed to build tamper-resistant hardware by exploiting random process variations. Existing PUFs suffer from increased overhead to the original design due to their specific functions for generating unique keys and/or routing constraints. In this paper, we propose a novel memory-cell based PUF (MECCA PUF), which performs authentication by exploiting the intrinsic process variations in read/write reliability of cells in static memories. The reliability of cells is characterized after manufacturing by inducing temporal failures, such as write and access failures in the cells using a programmable word line duty cycle controller. Since most modern designs already have considerable amount of embedded memory, the proposed approach incurs very little overhead (<1%) compared to existing PUF designs. Simulation results for 1000 chips with 10% inter-die variations show that the PUF provides large choice of challenge-response pairs with high uniqueness (49.9% average inter-die Hamming distance) and excellent reproducibility (0.85% average intra-die Hamming distance).

Aswin Raghav Krishna, Seetharam Narasimhan, Xinmu Wang, Swarup Bhunia

Public-Key Cryptosystems

FPGA Implementation of Pairings Using Residue Number System and Lazy Reduction

Recently, a lot of progress has been made in the implementation of pairings in both hardware and software. In this paper, we present two FPGA-based high speed pairing designs using the Residue Number System and lazy reduction. We show that by combining RNS, which is naturally suitable for parallel architectures, and lazy reduction, which performs one reduction for multiple multiplications, the speed of pairing computation in hardware can be largely increased. The results show that both designs achieve higher speed than previous designs. The fastest version computes an optimal ate pairing at 126-bit security level in 0.573 ms, which is 2 times faster than all previous hardware implementations at the same security level.

Ray C. C. Cheung, Sylvain Duquesne, Junfeng Fan, Nicolas Guillermin, Ingrid Verbauwhede, Gavin Xiaoxu Yao

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

This paper presents an efficient architecture for computing cryptographic

η

T

pairing for providing 128-bit security. A cryptoprocessor is proposed for Miller’s Algorithm with a new 1223-bit Karatsuba multiplier that exploits parallelism. To the best of our knowledge this is the first hardware implementation of 128-bit secure

η

T

pairing on supersingular elliptic curves over characteristic two fields. The design has been implemented on Xilinx FPGAs. The place-and-route results show that the proposed design takes only 190

μs

to complete an 128-bit secure

η

T

pairing on a Virtex-6 FPGA. The proposed cryptoprocessor achieves eight times speedup compared to the best known existing design. It also outperforms the previous designs with respect to

area

×

time

product.

Santosh Ghosh, Dipanwita Roychowdhury, Abhijit Das

Fast Multi-precision Multiplication for Public-Key Cryptography on Embedded Microprocessors

Multi-precision multiplication is one of the most fundamental operations on microprocessors to allow public-key cryptography such as RSA and Elliptic Curve Cryptography (ECC). In this paper, we present a novel multiplication technique that increases the performance of multiplication by sophisticated caching of operands. Our method significantly reduces the number of needed

load

instructions which is usually one of the most expensive operation on modern processors. We evaluate our new technique on an 8-bit ATmega128 microcontroller and compare the result with existing solutions. Our implementation needs only 2,395 clock cycles for a 160-bit multiplication which outperforms related work by a factor of 10% to 23%. The number of required

load

instructions is reduced from 167 (needed for the best known hybrid multiplication) to only 80. Our implementation scales very well even for larger Integer sizes (required for RSA) and limited register sets. It further fully complies to existing multiply-accumulate instructions that are integrated in most of the available processors.

Michael Hutter, Erich Wenger

Small Public Keys and Fast Verification for $\mathcal{M}$ ultivariate $\mathcal{Q}$ uadratic Public Key Systems

Security of public key schemes in a post-quantum world is a challenging task—as both RSA and ECC will be broken then. In this paper, we show how post-quantum signature systems based on

$\mathcal{M}$

ultivariate

$\mathcal{Q}$

uadratic (

$\mathcal{MQ}$

) polynomials can be improved up by about 9/10, and 3/5, respectively, in terms of public key size and verification time. The exact figures are 88% and 59%. This is particularly important for small-scale devices with restricted energy, memory, or computational power. In addition, we provide evidence that this reduction does not affect security and that it is also optimal in terms of possible attacks. We do so by combining the previously unrelated concepts of reduced and equivalent keys. Our new scheme is based on the so-called

Unbalanced Oil and Vinegar

class of

$\mathcal{MQ}$

-schemes. We have derived our results mathematically and verified the speed-ups through a C++ implementation.

Albrecht Petzoldt, Enrico Thomae, Stanislav Bulygin, Christopher Wolf

Hash Functions

Throughput vs. Area Trade-offs in High-Speed Architectures of Five Round 3 SHA-3 Candidates Implemented Using Xilinx and Altera FPGAs

In this paper we present a comprehensive comparison of all Round 3 SHA-3 candidates and the current standard SHA-2 from the point of view of hardware performance in modern FPGAs. Each algorithm is implemented using multiple architectures based on the concepts of folding, unrolling, and pipelining. Trade-offs between speed and area are investigated, and the best architecture from the point of view of the throughput to area ratio is identified. Finally, all algorithms are ranked based on their overall performance, and the characteristic features of each algorithm important from the point of view of its implementation in hardware are identified.

Ekawat Homsirikamol, Marcin Rogawski, Kris Gaj

Efficient Hashing Using the AES Instruction Set

In this work, we provide a software benchmark for a large range of 256-bit blockcipher-based hash functions. We instantiate the underlying blockcipher with AES, which allows us to exploit the recent AES instruction set (AES-NI). Since AES itself only outputs 128 bits, we consider double-block-length constructions, as well as (single-block-length) constructions based on

Rijndael-256

. Although we primarily target architectures supporting AES-NI, our framework has much broader applications by estimating the performance of these hash functions on any (micro-)architecture given AES-benchmark results. As far as we are aware, this is the first comprehensive performance comparison of multi-block-length hash functions in software.

Joppe W. Bos, Onur Özen, Martijn Stam

Springer Professional

Über dieses Buch

Inhaltsverzeichnis

Frontmatter

FPGA Implementation

An Exploration of Mechanisms for Dynamic Cryptographic Instruction Set Extension

FPGA-Based True Random Number Generation Using Circuit Metastability with Adaptive Feedback Control

Generic Side-Channel Countermeasures for Reconfigurable Devices

AES

Improved Collision-Correlation Power Analysis on First Order Protected AES

Higher-Order Glitches Free Implementation of the AES Using Secure Multi-party Computation Protocols

Protecting AES with Shamir’s Secret Sharing Scheme

A Fast and Provably Secure Higher-Order Masking of AES S-Box

Elliptic Curve Cryptosystems

Software Implementation of Binary Elliptic Curves: Impact of the Carry-Less Multiplier on Scalar Multiplication

High-Speed High-Security Signatures

To Infinity and Beyond: Combined Attack on ECC Using Points of Low Order

Lattices

Random Sampling for Short Lattice Vectors on Graphics Cards

Extreme Enumeration on GPU and in Clouds

Modulus Fault Attacks against RSA-CRT Signatures

Side Channel Attacks

Breaking Mifare DESFire MF3ICD40: Power Analysis and Templates in the Real World

Information Theoretic and Security Analysis of a 65-Nanometer DDSLL AES S-Box

Thwarting Higher-Order Side Channel Analysis with Additive and Multiplicative Maskings

Extractors against Side-Channel Attacks: Weak or Strong?

Invited Talk

Standardization Works for Security Regarding the Electromagnetic Environment

Fault Attacks

Meet-in-the-Middle and Impossible Differential Fault Analysis on AES

On the Power of Fault Sensitivity Analysis and Collision Side-Channel Attacks in a Combined Setting

Lightweight Symmetric Algorithms

spongent: A Lightweight Hash Function

The LED Block Cipher

Piccolo: An Ultra-Lightweight Blockcipher

PUFs

Lightweight and Secure PUF Key Storage Using Limits of Machine Learning

Recyclable PUFs: Logically Reconfigurable PUFs

Uniqueness Enhancement of PUF Responses Based on the Locations of Random Outputting RS Latches

MECCA: A Robust Low-Overhead PUF Using Embedded Memory Array

Public-Key Cryptosystems

FPGA Implementation of Pairings Using Residue Number System and Lazy Reduction

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

Fast Multi-precision Multiplication for Public-Key Cryptography on Embedded Microprocessors

Small Public Keys and Fast Verification for $\mathcal{M}$ ultivariate $\mathcal{Q}$ uadratic Public Key Systems

Hash Functions

Throughput vs. Area Trade-offs in High-Speed Architectures of Five Round 3 SHA-3 Candidates Implemented Using Xilinx and Altera FPGAs

Efficient Hashing Using the AES Instruction Set

Backmatter