Processing Techniques in Side-Channel Analysis

Frontmatter

Robust Profiling for DPA-Style Attacks

Profiled side-channel attacks are understood to be powerful when applicable: in the best case when an adversary can comprehensively characterise the leakage, the resulting model leads to attacks requiring a minimal number of leakage traces for success. Such ‘complete’ leakage models are designed to capture the scale, location and shape of the profiling traces, so that any deviation between these and the attack traces potentially produces a mismatch which renders the model unfit for purpose. This severely limits the applicability of profiled attacks in practice and so poses an interesting research challenge: how can we design profiled distinguishers that can tolerate (some) differences between profiling and attack traces?

This submission is the first to tackle the problem head on: we propose distinguishers (utilising unsupervised machine learning methods, but also a ‘down-to-earth’ method combining mean traces and PCA) and evaluate their behaviour across an extensive set of distortions that we apply to representative trace data. Our results show that the profiled distinguishers are effective and robust to distortions to a surprising extent.

Carolyn Whitnall, Elisabeth Oswald

Less is More

Dimensionality Reduction from a Theoretical Perspective

Reducing the dimensionality of the measurements is an important problem in side-channel analysis. It allows to capture multi-dimensional leakage as one single compressed sample, and therefore also helps to reduce the computational complexity. The other side of the coin with dimensionality reduction is that it may at the same time reduce the efficiency of the attack, in terms of success probability.

In this paper, we carry out a mathematical analysis of dimensionality reduction. We show that optimal attacks remain optimal after a first pass of preprocessing, which takes the form of a linear projection of the samples. We then investigate the state-of-the-art dimensionality reduction techniques, and find that asymptotically, the optimal strategy coincides with the linear discriminant analysis.

Nicolas Bruneau, Sylvain Guilley, Annelie Heuser, Damien Marion, Olivier Rioul

Blind Source Separation from Single Measurements Using Singular Spectrum Analysis

Singular Spectrum Analysis (SSA) is a powerful data decomposition/recompose technique that can be used to reduce the noise in time series. Compared to existing solutions aiming at similar purposes, such as frequency-based filtering, it benefits from easier-to-exploit intuitions, applicability in contexts where low sampling rates make standard frequency analyses challenging, and the (theoretical) possibility to separate a signal source from a noisy source even if both run at the same frequency. In this paper, we first describe how to apply SSA in the context of side-channel analysis, and then validate its interest in three different scenarios. Namely, we consider unprotected software, masked software, and unprotected hardware block cipher implementations. Our experiments confirm significant noise reductions in all three cases, leading to success rates improved accordingly. They also put forward the stronger impact of SSA in more challenging scenarios, e.g. masked implementations (because the impact of noise increases exponentially with the number of shares in this case), or noisy hardware implementations (because of the established connection between the amount of noise and the attacks’ success rate in this case). Since noise is a fundamental ingredient for most countermeasures against side-channel attacks, we conclude SSA can be an important element in the toolbox of evaluation laboratories, in order to efficiently preprocess their measurements in a black box manner.

Santos Merino Del Pozo, François-Xavier Standaert

Cryptographic Hardware Implementations

Frontmatter

Highly Efficient $$GF(2^8)$$ G F ( 2 8 ) Inversion Circuit Based on Redundant GF Arithmetic and Its Application to AES Design

This paper proposes a compact and efficient

$$GF(2^8)$$

G

F

(

2

8

)

inversion circuit design based on a combination of non-redundant and redundant Galois Field (GF) arithmetic. The proposed design utilizes redundant GF representations, called Polynomial Ring Representation (PRR) and Redundantly Represented Basis (RRB), to implement

$$GF(2^8)$$

G

F

(

2

8

)

inversion using a tower field

$$GF((2^4)^2)$$

G

F

(

2

4

)

2

)

. In addition to the redundant representations, we introduce a specific normal basis that makes it possible to map the former components for the 16th and 17th powers of input onto logic gates in an efficient manner. The latter components for

$$GF(2^4)$$

G

F

(

2

4

)

inversion and

$$GF(2^4)$$

G

F

(

2

4

)

multiplication are then implemented by PRR and RRB, respectively. The flexibility of the redundant representations provides efficient mappings from/to the

$$GF(2^8)$$

G

F

(

2

8

)

. This paper also evaluates the efficacy of the proposed circuit by means of gate counts and logic synthesis with a 65 nm CMOS standard cell library and comparisons with conventional circuits, including those with tower fields

$$GF(((2^2)^2)^2)$$

G

F

(

2

)

2

)

2

)

. Consequently, we show that the proposed circuit achieves approximately 40 % higher efficiency in terms of area-time product than the conventional best

$$GF(((2^2)^2)^2)$$

G

F

(

2

)

2

)

2

)

circuit excluding isomorphic mappings. We also demonstrate that the proposed circuit achieves the best efficiency (i.e., area-time product) for an AES encryption S-Box circuit including isomorphic mappings.

Rei Ueno, Naofumi Homma, Yukihiro Sugawara, Yasuyuki Nogami, Takafumi Aoki

NaCl’s Crypto_box in Hardware

This paper presents a low-resource hardware implementation of the widely used

crypto_box

function of the Networking and Cryptography library (NaCl). It supports the X25519 Diffie-Hellman key exchange using Curve25519, the Salsa20 stream cipher, and the Poly1305 message authenticator. Our targeted application is a secure communication between devices in the Internet of Things (IoT) and Internet servers. Such devices are highly resource-constrained and require carefully optimized hardware implementations. We propose the first solution that enables 128-bit-secure public-key authenticated encryption on passively-powered IoT devices like WISP nodes. From a cryptographic point of view we thus make a first step to turn these devices into fully-fledged participants of Internet communication. Our crypto processor needs a silicon area of 14.6 kGEs and less than 40

$$\mu $$

W of power at 1 MHz for a 130 nm low-leakage CMOS process technology.

Michael Hutter, Jürgen Schilling, Peter Schwabe, Wolfgang Wieser

Lightweight Coprocessor for Koblitz Curves: 283-Bit ECC Including Scalar Conversion with only 4300 Gates

We propose a lightweight coprocessor for 16-bit microcontrollers that implements high security elliptic curve cryptography. It uses a 283-bit Koblitz curve and offers 140-bit security. Koblitz curves offer fast point multiplications if the scalars are given as specific

$$\tau $$

τ

-adic expansions, which results in a need for conversions between integers and

$$\tau $$

τ

-adic expansions. We propose the first lightweight variant of the conversion algorithm and, by using it, introduce the first lightweight implementation of Koblitz curves that includes the scalar conversion. We also include countermeasures against side-channel attacks making the coprocessor the first lightweight coprocessor for Koblitz curves that includes a set of countermeasures against timing attacks, SPA, DPA and safe-error fault attacks. When the coprocessor is synthesized for 130 nm CMOS, it has an area of only 4,323 GE. When clocked at 16 MHz, it computes one 283-bit point multiplication in 98 ms with a power consumption of 97.70

$$\mu $$

μ

W, thus, consuming 9.56

$$\mu $$

μ

J of energy.

Sujoy Sinha Roy, Kimmo Järvinen, Ingrid Verbauwhede

Single Base Modular Multiplication for Efficient Hardware RNS Implementations of ECC

The paper describes a new RNS modular multiplication algorithm for efficient implementations of ECC over

$$\mathbb {F}_P$$

F

P

. Thanks to the proposition of RNS-friendly Mersenne-like primes, the proposed RNS algorithm requires 2 times less moduli than the state-of-art ones, leading to 4 times less precomputations and about 2 times less operations. FPGA implementations of our algorithm are presented, with area reduced up to 46 %, for a time overhead less than 10 %.

Karim Bigou, Arnaud Tisserand

Homomorphic Encryption in Hardware

Frontmatter

Accelerating Homomorphic Evaluation on Reconfigurable Hardware

Homomorphic encryption allows computation on encrypted data and makes it possible to securely outsource computational tasks to untrusted environments. However, all proposed schemes are quite inefficient and homomorphic evaluation of ciphertexts usually takes several seconds on high-end CPUs, even for evaluating simple functions. In this work we investigate the potential of FPGAs for speeding up those evaluation operations. We propose an architecture to accelerate schemes based on the ring learning with errors (RLWE) problem and specifically implemented the somewhat homomorphic encryption scheme YASHE, which was proposed by Bos, Lauter, Loftus, and Naehrig in 2013. Due to the large size of ciphertexts and evaluation keys, on-chip storage of all data is not possible and external memory is required. For efficient utilization of the external memory we propose an efficient double-buffered memory access scheme and a polynomial multiplier based on the number theoretic transform (NTT). For the parameter set (

$$n=16384,\lceil \log _2 q \rceil ={512}$$

n

=

16384

,

⌈

log

2

q

⌉

=

512

) capable of evaluating 9 levels of multiplications, we can perform a homomorphic addition in 0.94 ms and a homomorphic multiplication in 48.67 ms.

Thomas Pöppelmann, Michael Naehrig, Andrew Putnam, Adrian Macias

Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation

We present a hardware architecture for all building blocks required in polynomial ring based fully homomorphic schemes and use it to instantiate the somewhat homomorphic encryption scheme YASHE. Our implementation is the first FPGA implementation that is designed for evaluating functions on homomorphically encrypted data (up to a certain multiplicative depth) and we illustrate this capability by evaluating the SIMON-64/128 block cipher in the encrypted domain. Our implementation provides a fast polynomial operations unit using CRT and NTT for multiplication combined with an optimized memory access scheme; a fast Barrett like polynomial reduction method; an efficient divide and round unit required in the multiplication of ciphertexts and an efficient CRT unit. These building blocks are integrated in an instruction-set coprocessor to execute YASHE, which can be controlled by a computer for evaluating arbitrary functions (up to the multiplicative depth 44 and 128-bit security level). Our architecture was compiled for a single Virtex-7 XC7V1140T FPGA, where it consumes 23 % of registers, 50 % of LUTs, 53 % of DSP slices, and 38 % of BlockRAM memory. The implementation evaluates SIMON-64/128 in approximately 157.7 s (at 143 MHz) and it processes 2048 ciphertexts at once giving a relative time of only 77 ms per block. This is 26.6 times faster than the leading software implementation on a 4-core Intel Core-i7 processor running at 3.4 GHz.

Sujoy Sinha Roy, Kimmo Järvinen, Frederik Vercauteren, Vassil Dimitrov, Ingrid Verbauwhede

Accelerating LTV Based Homomorphic Encryption in Reconfigurable Hardware

After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The accelerator we present is connected via a fast PCIe interface to a CPU platform to provide homomorphic evaluation services to any application that needs to support blinded computations. Specifically we introduce a number theoretical transform based multiplier architecture capable of efficiently handling very large polynomials. When synthesized for the Xilinx Virtex 7 family the presented architecture can compute the product of large polynomials in under 6.25 msec making it the fastest multiplier design of its kind currently available in the literature and is more than 102 times faster than a software implementation. Using this multiplier we can compute a relinearization operation in 526 msec. When used as an accelerator, for instance, to evaluate the AES block cipher, we estimate a per block homomorphic evaluation performance of 442 msec yielding performance gains of 28.5 and 17 times over similar CPU and GPU implementations, respectively.

Yarkın Doröz, Erdinç Öztürk, Erkay Savaş, Berk Sunar

Side-Channel Attacks on Public Key Cryptography

Frontmatter

Stealing Keys from PCs Using a Radio: Cheap Electromagnetic Attacks on Windowed Exponentiation

We present new side-channel attacks on RSA and ElGamal implementations that use sliding-window or fixed-window (

m

-ary) modular exponentiation. The attacks extract decryption keys using a very low measurement bandwidth (a frequency band of less than 100 kHz around a carrier under 2 MHz) even when attacking multi-GHz CPUs.

We demonstrate the attacks’ feasibility by extracting keys from GnuPG (unmodified ElGamal and non-blinded RSA), within seconds, using a nonintrusive measurement of electromagnetic emanations from laptop computers. The measurement equipment is cheap and compact, uses readily-available components (a Software Defined Radio USB dongle or a consumer-grade radio receiver), and can operate untethered while concealed, e.g., inside pita bread.

The attacks use a few non-adaptive chosen ciphertexts, crafted so that whenever the decryption routine encounters particular bit patterns in the secret key, intermediate values occur with a special structure that causes observable fluctuations in the electromagnetic field. Through suitable signal processing and cryptanalysis, the bit patterns and eventually the whole secret key are recovered.

Daniel Genkin, Lev Pachmanov, Itamar Pipman, Eran Tromer

Exclusive Exponent Blinding May Not Suffice to Prevent Timing Attacks on RSA

The references [

1

,

3

,

9

] treat timing attacks on RSA with CRT and Montgomery’s multiplication algorithm in unprotected implementations. It has been widely believed that exponent blinding would prevent any timing attack on RSA. At cost of significantly more timing measurements this paper extends the before-mentioned attacks to RSA with CRT when Montgomery’s multiplication algorithm and exponent blinding are applied. Simulation experiments are conducted, which confirm the theoretical results. Effective countermeasures exist. In particular, the attack efficiency is higher than in the previous version [

12

] while large parts of both papers coincide.

Werner Schindler

Who Watches the Watchmen?: Utilizing Performance Monitors for Compromising Keys of RSA on Intel Platforms

Asymmetric-key cryptographic algorithms when implemented on systems with branch predictors, are subjected to side-channel attacks exploiting the deterministic branch predictor behavior due to their key-dependent input sequences. We show that branch predictors can also leak information through the hardware performance monitors which are accessible by an adversary at the user-privilege level. This paper presents an iterative attack which target the key-bits of 1024 bit RSA, where in offline phase, the system’s underlying branch predictor is approximated by a theoretical predictor in literature. Subsimulations are performed to classify the message-space into distinct partitions based on the event branch misprediction and the target key bit value. In online phase, we ascertain the secret key bit using branch mispredictions obtained from the hardware performance monitors which reflect the behavior of the underlying predictor hardware. We theoretically prove that the probability of success is equivalent to the accurate modelling of the theoretical predictor to the underlying system predictor. Experimentations reveal that the success-rate increases with message-count and reaches such a significant value so as to consider side-channel from the performance counters as a real threat to RSA-like ciphers due to the underlying branch predictors and needs to be considered for developing secured-systems.

Sarani Bhattacharya, Debdeep Mukhopadhyay

Cipher Design and Cryptanalysis

Frontmatter

Improved Cryptanalysis of the DECT Standard Cipher

The DECT Standard Cipher (DSC) is a 64-bit key stream cipher used in the Digital Enhanced Cordless Telecommunications (DECT) standard to protect the confidentiality of the communications. In this paper we present an improved cryptanalysis approach which is more effective than the Nohl-Tews-Weinmann (NTW) attack and requires four times less plaintext material. Under the best conditions, our known plaintext attack requires only 3 min of communication compared to 10 min for the NTW attack. Our approach is able to quickly recover the secret key with a success rate of more than 50 % by analysing

$$2^{13}$$

keystreams and performing an exhaustive search over

$$2^{31}$$

keys. Additionally, the attack was successfully conducted against real intercepted DECT traffic where the plaintext was only 90 % accurate. To the best of our knowledge, the approach we present in this paper is the most effective cryptanalysis published so far against the DSC cipher.

Iwen Coisel, Ignacio Sanchez

Practical Key Recovery for Discrete-Logarithm Based Authentication Schemes from Random Nonce Bits

We propose statistical cryptanalysis of discrete-logarithm based authentication schemes such as Schnorr identification scheme or Girault-Poupard-Stern identification and signature schemes. We consider two scenarios where an adversary is given some information on the nonces used during the signature generation process or during some identification sessions. In the first scenario, we assume that some bits of the nonces are known exactly by the adversary, while no information is provided about the other bits. We show, for instance, that the GPS scheme with 128-bit security can be broken using only 710 signatures assuming that the adversary knows (on average) one bit per nonce. In the second scenario, we assume that all bits of the nonces are obtained from the correct ones by independent bit flipping with some small probability. A detailed heuristic analysis is provided, supported by extensive experiments.

Aurélie Bauer, Damien Vergnaud

The Simeck Family of Lightweight Block Ciphers

Two lightweight block cipher families,

Simon

and

Speck

, have been proposed by researchers from the NSA recently. In this paper, we introduce

Simeck

, a new family of lightweight block ciphers that combines the good design components from both

Simon

and

Speck

, in order to devise even more compact and efficient block ciphers. For

Simeck32/64

, we can achieve 505 GEs (before the Place and Route phase) and 549 GEs (after the Place and Route phase), with the power consumption of 0.417

$$\mu W$$

in CMOS 130 nm ASIC, and 454 GEs (before the Place and Route phase) and 488 GEs (after the Place and Route phase), with the power consumption of 1.292

$$\mu W$$

in CMOS 65 nm ASIC. Furthermore, all of the instances of

Simeck

are smaller than the ones of hardware-optimized cipher

Simon

in terms of area and power consumption in both CMOS 130 nm and CMOS 65 nm techniques. In addition, we also give the security evaluation of

Simeck

with respect to many traditional cryptanalysis methods, including differential attacks, linear attacks, impossible differential attacks, meet-in-the-middle attacks, and slide attacks. Overall, all of the instances of

Simeck

can satisfy the area, power, and throughput requirements in passive RFID tags.

Gangqiang Yang, Bo Zhu, Valentin Suder, Mark D. Aagaard, Guang Gong

TriviA: A Fast and Secure Authenticated Encryption Scheme

In this paper, we propose a new hardware friendly authenticated encryption (AE) scheme

$${\textsf {TriviA}}$$

based on (i) a stream cipher for generating keys for the ciphertext and the tag, and (ii) a pairwise independent hash to compute the tag.

We have adopted one of the ISO-standardized stream ciphers for lightweight cryptography, namely

Trivium

, to obtain our underlying stream cipher

. This new stream cipher has a state that is a little larger than the state of

Trivium

to accommodate a 128-bit secret key and IV.

Our pairwise independent hash is also an adaptation of the

EHC

or “Encode-Hash-Combine” hash, that requires the optimum number of field multiplications and hence requires small hardware footprint. We have implemented the design in synthesizable RTL. Pre-layout synthesis, using 65 nm standard cell technology under typical operating conditions, reveals that

$${\textsf {TriviA}}$$

is able to achieve a high throughput of 91.2 Gbps for an area of 24.4 KGE. We prove that our construction has at least 128-bit security for privacy and 124-bit security of authenticity under the assumption that the underlying stream cipher produces a pseudorandom bit stream.

Avik Chakraborti, Anupam Chattopadhyay, Muhammad Hassan, Mridul Nandi

True Random Number Generators and Entropy Estimations

Frontmatter

A Physical Approach for Stochastic Modeling of TERO-Based TRNG

Security in random number generation for cryptography is closely related to the entropy rate at the generator output. This rate has to be evaluated using an appropriate stochastic model. The stochastic model proposed in this paper is dedicated to the transition effect ring oscillator (TERO) based true random number generator (TRNG) proposed by Varchola and Drutarovsky in 2010. The advantage and originality of this model is that it is derived from a physical model based on a detailed study and on the precise electrical description of the noisy physical phenomena that contribute to the generation of random numbers. We compare the proposed electrical description with data generated in a 28 nm CMOS ASIC implementation. Our experimental results are in very good agreement with those obtained with both the physical model of TERO’s noisy behavior and with the stochastic model of the TERO TRNG, which we also confirmed using the AIS 31 test suites.

Patrick Haddad, Viktor Fischer, Florent Bernard, Jean Nicolai

Predictive Models for Min-entropy Estimation

Random numbers are essential for cryptography. In most real-world systems, these values come from a cryptographic pseudorandom number generator (PRNG), which in turn is seeded by an entropy source. The security of the entire cryptographic system then relies on the accuracy of the claimed amount of entropy provided by the source. If the entropy source provides less unpredictability than is expected, the security of the cryptographic mechanisms is undermined, as in [

5

,

7

,

10

]. For this reason, correctly estimating the amount of entropy available from a source is critical.

In this paper, we develop a set of tools for estimating entropy, based on mechanisms that attempt to predict the next sample in a sequence based on all previous samples. These mechanisms are called

predictors

. We develop a framework for using predictors to estimate entropy, and test them experimentally against both simulated and real noise sources. For comparison, we subject the entropy estimates defined in the August 2012 draft of NIST Special Publication 800-90B [

4

] to the same tests, and compare their performance.

John Kelsey, Kerry A. McKay, Meltem Sönmez Turan

Side-Channel Analysis and Fault Injection Attacks

Frontmatter

Improved Side-Channel Analysis of Finite-Field Multiplication

A side-channel analysis of multiplication in

$$\mathsf {GF}(2^{128})$$

GF

(

2

128

)

has recently been published by Belaïd, Fouque and Gérard at Asiacrypt 2014, with an application to AES-GCM. Using the least significant bit of the Hamming weight of the multiplication result, the authors have shown how to recover the secret multiplier efficiently. However such least significant bit is very sensitive to noise measurement; this implies that, without averaging, their attack can only work for high signal-to-noise ratios (

$$ \mathsf {SNR}> 128$$

SNR

>

128

). In this paper we describe a new side-channel attack against the multiplication in

$$\mathsf {GF}(2^{128})$$

GF

(

2

128

)

that uses the most significant bits of the Hamming weight. We show that much higher values of noise can be then tolerated. For instance with an

$$\mathsf {SNR}$$

SNR

equal to 8, the key can be recovered using

$$2^{20}$$

2

20

consumption traces with time and memory complexities respectively equal to

$$2^{51.68}$$

2

51.68

and

$$2^{36}$$

2

36

. We moreover show that the new method can be extended to attack the fresh re-keying countermeasure proposed by Medwed, Standaert, Großschädl and Regazzoni at Africacrypt 2010.

Sonia Belaïd, Jean-Sébastien Coron, Pierre-Alain Fouque, Benoît Gérard, Jean-Gabriel Kammerer, Emmanuel Prouff

Evaluation and Improvement of Generic-Emulating DPA Attacks

At CT-RSA 2014, Whitnall, Oswald and Standaert gave the impossibility result that no generic DPA strategies (i.e., without any

a priori

knowledge about the leakage characteristics) can recover secret information from a physical device by considering an injective target function (e.g., AES and PRESENT S-boxes), and as a remedy, they proposed a slightly relaxed strategy “generic-emulating DPAs” free from the non-injectivity constraint. However, as we show in this paper, the only generic-emulating DPA proposed in their work, namely the SLR-based DPA, suffers from two drawbacks: unstable outcomes in the high-noise regime (i.e., for a small number of traces) and poor performance especially on real smart cards (compared with traditional DPAs with a specific power model). In order to solve these problems, we introduce two new generic-emulating distinguishers, based on lasso and ridge regression strategies respectively, with more stable and better performances than the SLR-based one. Further, we introduce the cross-validation technique that improves the generic-emulating DPAs in general and might be of independent interest. Finally, we compare the performances of all aforementioned generic-emulating distinguishers (both with and without cross-validation) in simulated leakages functions of different degrees, and on an AES ASIC implementation. Our experimental results show that our generic-emulating distinguishers are stable and some of them behave even better than (resp., almost the same as) the best Difference-of-Means distinguishers in simulated leakages (resp., on a real implementation), and thus make themselves good alternatives to traditional DPAs.

Weijia Wang, Yu Yu, Junrong Liu, Zheng Guo, François-Xavier Standaert, Dawu Gu, Sen Xu, Rong Fu

Transient-Steady Effect Attack on Block Ciphers

A new Transient-Steady Effect attack on block ciphers called TSE attack is presented in this paper. The concept of transient-steady effect denotes the phenomenon that the output of a combinational circuit keeps a temporal value for a while before it finally switches to the correct value. Unlike most existing fault attacks, our attack does not need a large amount of encryptions to build a statistical model. By injecting a clock glitch to capture the temporal value caused by transient-steady effect, attackers can obtain the information of key from faulty outputs directly. This work shows that AES implementations, which have transient-steady property, are vulnerable to our attack. Experiments are successfully conducted on two kinds of unmasked S-boxes and one kind of masked S-box implemented in serial with FPGA board. After a moderate pre-computation, we need only 1 encryption to recover a key byte of the unmasked S-boxes, and 20 encryptions to recover a key byte of the masked S-box. Furthermore, we investigate the key recover method for parallel unmasked implementation, and discuss a possible attack scenario which may deem WDDL-AES insecure.

Yanting Ren, An Wang, Liji Wu

Higher-Order Side-Channel Attacks

Frontmatter

Assessment of Hiding the Higher-Order Leakages in Hardware

What Are the Achievements Versus Overheads?

Higher-order side-channel attacks are becoming amongst the major interests of academia as well as industry sector. It is indeed being motivated by the development of countermeasures which can prevent the leakages up to certain orders. As a concrete example, threshold implementation (TI) as an efficient way to realize Boolean masking in hardware is able to avoid first-order leakages. Trivially, the attacks conducted at second (and higher) orders can exploit the corresponding leakages hence devastating the provided security. Hence, the extension of TI to higher orders was being expected which has been presented at ASIACRYPT 2014. Following its underlying univariate settings it can provide security at higher orders, and its area and time overheads naturally increase with the desired security order.

In this work we look at the feasibility of higher-order attacks on first-order TI from another perspective. Instead of increasing the order of resistance by employing higher-order TIs, we realize the first-order TI designs following the principles of a power-equalization technique dedicated to FPGA platforms, that naturally leads to hardening higher-order attacks. We show that although the first-order TI designs, which are additionally equipped by the power-equalization methodology, have significant area overhead, they can maintain the same throughput and more importantly can avoid the higher-order leakages to be practically exploitable by up to 1 billion traces.

Amir Moradi, Alexander Wild

Multi-variate High-Order Attacks of Shuffled Tables Recomputation

Masking schemes based on tables recomputation are classical countermeasures against high-order side-channel attacks. Still, they are known to be attackable at order

d

in the case the masking involves

d

shares. In this work, we mathematically show that an attack of order strictly greater than

d

can be more successful than an attack at order

d

. To do so, we leverage the idea presented by Tunstall, Whitnall and Oswald at FSE 2013: we exhibit attacks which exploit the multiple leakages linked to one mask during the recomputation of tables. Specifically, regarding first-order table recomputation, improved by a shuffled execution, we show that there is a window of opportunity, in terms of noise variance, where a novel highly multivariate third-order attack is more efficient than a classical bivariate second-order attack. Moreover, we show on the example of the high-order secure table computation presented by Coron at EUROCRYPT 2014 that the window of opportunity enlarges linearly with the security order

d

.

Nicolas Bruneau, Sylvain Guilley, Zakaria Najm, Yannick Teglia

Leakage Assessment Methodology

A Clear Roadmap for Side-Channel Evaluations

Evoked by the increasing need to integrate side-channel countermeasures into security-enabled commercial devices, evaluation labs are seeking a standard approach that enables a fast, reliable and robust evaluation of the side-channel vulnerability of the given products. To this end, standardization bodies such as NIST intend to establish a leakage assessment methodology fulfilling these demands. One of such proposals is the Welch’s

t

-test, which is being put forward by Cryptography Research Inc., and is able to relax the dependency between the evaluations and the device’s underlying architecture. In this work, we deeply study the theoretical background of the test’s different flavors, and present a roadmap which can be followed by the evaluation labs to efficiently and correctly conduct the tests. More precisely, we express a stable, robust and efficient way to perform the tests at higher orders. Further, we extend the test to multivariate settings, and provide details on how to efficiently and rapidly carry out such a multivariate higher-order test. Including a suggested methodology to collect the traces for these tests, we point out practical case studies where different types of

t

-tests can exhibit the leakage of supposedly secure designs.

Tobias Schneider, Amir Moradi

Physically Unclonable Functions and Hardware Trojans

Frontmatter

Secure Key Generation from Biased PUFs

PUF-based key generators have been widely considered as a root-of-trust in digital systems. They typically require an error-correcting mechanism (e.g. based on the code-offset method) for dealing with bit errors between the enrollment and reconstruction of keys. When the used PUF does

not

have full entropy, entropy leakage between the helper data and the device-unique key material can occur. If the entropy level of the PUF becomes too low, the PUF-derived key can be attacked through the publicly available helper data. In this work we provide several solutions for preventing this entropy leakage for PUFs suffering from i.i.d.

biased bits

. The methods proposed in this work pose no limit on the amount of bias that can be tolerated, which solves an important open problem for PUF-based key generation. Additionally, the solutions are all evaluated based on reliability, efficiency, leakage and reusability showing that depending on requirements for the key generator different solutions are preferable.

Roel Maes, Vincent van der Leest, Erik van der Sluis, Frans Willems

The Gap Between Promise and Reality: On the Insecurity of XOR Arbiter PUFs

In this paper we demonstrate the first real-world cloning attack on a commercial PUF-based RFID tag. The examined commercial PUFs can be attacked by measuring only 4 protocol executions, which takes less than 200 ms. Using a RFID smartcard emulator, it is then possible to impersonate, i.e., “clone” the PUF. While attacking the 4-way PUF used by these tags can be done using traditional machine learning attacks, we show that the tags can still be attacked if they are configured as presumably secure XOR PUFs. We achieved this by using a new reliability-based machine learning attack that uses a divide-and-conquer approach for attacking the XOR PUFs. This new divide-and-conquer approach results in only a linear increase in needed number of challenge and responses for increasing numbers of XORs. This is in stark contrast to the state-of-the-art machine learning attacks on XOR PUFs that are shown to have an exponential increase in challenge and responses.

Hence, it is now possible to attack XOR PUF constructs that were previously believed to be secure against machine learning attacks. Since XOR Arbiter PUFs are one of the most popular and promising electrical strong PUF designs, our reliability-based machine learning attack raises doubts that secure and lightweight electrical strong PUFs can be realized in practice.

Georg T. Becker

End-To-End Design of a PUF-Based Privacy Preserving Authentication Protocol

We demonstrate a prototype implementation of a provably secure protocol that supports privacy-preserving mutual authentication between a server and a constrained device. Our proposed protocol is based on a physically unclonable function (PUF) and it is optimized for resource-constrained platforms. The reported results include a full protocol analysis, the design of its building blocks, their integration into a constrained device, and finally its performance evaluation. We show how to obtain efficient implementations for each of the building blocks of the protocol, including a fuzzy extractor with a novel helper-data construction technique, a truly random number generator (TRNG), and a pseudo-random function (PRF). The prototype is implemented on a SASEBO-GII board, using the on-board SRAM as the source of entropy for the PUF and the TRNG. We present three different implementations. The first two execute on a MSP430 soft-core processor and have a security level of 64-bit and 128-bit respectively. The third uses a hardware accelerator and has 128-bit security level. To our best knowledge, this work is the first effort to describe the end-to-end design and evaluation of a privacy-preserving PUF-based authentication protocol.

Aydin Aysu, Ege Gulcan, Daisuke Moriyama, Patrick Schaumont, Moti Yung

Improved Test Pattern Generation for Hardware Trojan Detection Using Genetic Algorithm and Boolean Satisfiability

Test generation for

Hardware Trojan Horses

(HTH) detection is extremely challenging, as Trojans are designed to be triggered by very rare logic conditions at internal nodes of the circuit. In this paper, we propose a

Genetic Algorithm

(GA) based Automatic Test Pattern Generation (ATPG) technique, enhanced by automated solution to an associated

Boolean Satisfiability

problem. The main insight is that given a specific internal trigger condition, it is not possible to attack an arbitrary node (payload) of the circuit, as the effect of the induced logic malfunction by the HTH might not get propagated to the output. Based on this observation, a fault simulation based framework has been proposed, which enumerates the feasible payload nodes for a specific triggering condition. Subsequently, a compact set of test vectors is selected based on their ability to detect the logic malfunction at the feasible payload nodes, thus increasing their effectiveness. Test vectors generated by the proposed scheme were found to achieve higher detection coverage over large population of HTH in ISCAS benchmark circuits, compared to a previously proposed logic testing based Trojan detection technique.

Sayandeep Saha, Rajat Subhra Chakraborty, Srinivasa Shashank Nuthakki, Anshul, Debdeep Mukhopadhyay

Side-Channel Attacks in Practice

Frontmatter

DPA, Bitslicing and Masking at 1 GHz

We present DPA attacks on an ARM Cortex-A8 processor running at 1 GHz. This high-end processor is typically found in portable devices such as phones and tablets. In our case, the processor sits in a single board computer and runs a full-fledged Linux operating system. The targeted AES implementation is bitsliced and runs in constant time and constant flow. We show that, despite the complex hardware and software, high clock frequencies and practical measurement issues, the implementation can be broken with DPA starting from a few thousand measurements of the electromagnetic emanation of a decoupling capacitor near the processor. To harden the bitsliced implementation against DPA attacks, we mask it using principles of hardware gate-level masking. We evaluate the security of our masked implementation against first-order and second-order attacks. Our experiments show that successful attacks require roughly two orders of magnitude more measurements.

Josep Balasch, Benedikt Gierlichs, Oscar Reparaz, Ingrid Verbauwhede

SoC It to EM: ElectroMagnetic Side-Channel Attacks on a Complex System-on-Chip

Increased complexity in modern embedded systems has presented various important challenges with regard to side-channel attacks. In particular, it is common to deploy SoC-based target devices with high clock frequencies in security-critical scenarios; understanding how such features align with techniques more often deployed against simpler devices is vital from both destructive (i.e., attack) and constructive (i.e., evaluation and/or countermeasure) perspectives. In this paper, we investigate electromagnetic-based leakage from three different means of executing cryptographic workloads (including the general purpose ARM core, an on-chip co-processor, and the NEON core) on the AM335x SoC. Our conclusion is that addressing challenges of the type above

is

feasible, and that key recovery attacks can be conducted with modest resources.

J. Longo, E. De Mulder, D. Page, M. Tunstall

Finding the AES Bits in the Haystack: Reverse Engineering and SCA Using Voltage Contrast

In this paper, we demonstrate how the Scanning Electron Microscope (SEM) becomes a powerful tool for Side Channel Analysis (SCA) and Hardware Reverse Engineering. We locate the AES hardware circuit of a XMEGA microprocessor with Capacitive-Coupled Voltage Contrast (CCVC) images and use them in a powerful Voltage Contrast Side Channel Analysis (VCSCA). This enables an attacker to locate AES bit-wires in the top metal-layer and thus, to recover valuable netlist information. An attacker gets a valuable entry-point to look for weaknesses or Intellectual Property (IP) in the AES circuit. Additionally we show the great potential of the VCSCA in a non-invasive Side Channel Analysis for Reverse Engineering (SCARE) approach. Finally, we recover the full key of the AES hardware-engine in a practical template-based VCSCA and a no-plaintext, no-ciphertext and no-key Simple Side Channel Analysis (SSCA). We show that future VCSCA attacks present a big hardware security-risk that IC vendors need to consider.

Christian Kison, Jürgen Frinken, Christof Paar

Lattice-Based Implementations

Frontmatter

Efficient Ring-LWE Encryption on 8-Bit AVR Processors

Public-key cryptography based on the “ring-variant” of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication. More concretely, we describe the Move-and-Add (MA) and the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique to speed up the performance-critical multiplication and modular reduction of coefficients, respectively. We take advantage of incompletely-reduced intermediate results to minimize the total number of reduction operations and use a special coefficient-storage method to decrease the RAM footprint of NTT multiplications. In addition, we propose a byte-wise scanning strategy to improve the performance of a discrete Gaussian sampler based on the Knuth-Yao random walk algorithm. For medium-term security, our ring-LWE implementation needs 590 k, 672 k, and 276 k clock cycles for key-generation, encryption, and decryption, respectively. On the other hand, for long-term security, the execution time of key-generation, encryption, and decryption amount to 2.2 M, 2.6 M, and 686 k cycles, respectively. These results set new speed records for ring-LWE encryption on an 8-bit processor and outperform related RSA and ECC implementations by an order of magnitude.

Zhe Liu, Hwajeong Seo, Sujoy Sinha Roy, Johann Großschädl, Howon Kim, Ingrid Verbauwhede

A Masked Ring-LWE Implementation

Lattice-based cryptography has been proposed as a postquantum public-key cryptosystem. In this paper, we present a masked ring-LWE decryption implementation resistant to first-order side-channel attacks. Our solution has the peculiarity that the entire computation is performed in the masked domain. This is achieved thanks to a new, bespoke masked decoder implementation. The output of the ring-LWE decryption are Boolean shares suitable for derivation of a symmetric key. We have implemented a hardware architecture of the masked ring-LWE processor on a Virtex-II FPGA, and have performed side channel analysis to confirm the soundness of our approach. The area of the protected architecture is around 2000 LUTs, a

$$20\,\%$$

increase with respect to the unprotected architecture. The protected implementation takes 7478 cycles to compute, which is only a factor

$$\times 2.6$$

larger than the unprotected implementation.

Oscar Reparaz, Sujoy Sinha Roy, Frederik Vercauteren, Ingrid Verbauwhede

Springer Professional

Über dieses Buch

Inhaltsverzeichnis

Frontmatter

Processing Techniques in Side-Channel Analysis

Frontmatter

Robust Profiling for DPA-Style Attacks

Less is More

Blind Source Separation from Single Measurements Using Singular Spectrum Analysis

Cryptographic Hardware Implementations

Frontmatter

Highly Efficient $$GF(2^8)$$ G F ( 2 8 ) Inversion Circuit Based on Redundant GF Arithmetic and Its Application to AES Design

NaCl’s Crypto_box in Hardware

Lightweight Coprocessor for Koblitz Curves: 283-Bit ECC Including Scalar Conversion with only 4300 Gates

Single Base Modular Multiplication for Efficient Hardware RNS Implementations of ECC

Homomorphic Encryption in Hardware

Frontmatter

Accelerating Homomorphic Evaluation on Reconfigurable Hardware

Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation

Accelerating LTV Based Homomorphic Encryption in Reconfigurable Hardware

Side-Channel Attacks on Public Key Cryptography

Frontmatter

Stealing Keys from PCs Using a Radio: Cheap Electromagnetic Attacks on Windowed Exponentiation

Exclusive Exponent Blinding May Not Suffice to Prevent Timing Attacks on RSA

Who Watches the Watchmen?: Utilizing Performance Monitors for Compromising Keys of RSA on Intel Platforms

Cipher Design and Cryptanalysis

Frontmatter

Improved Cryptanalysis of the DECT Standard Cipher

Practical Key Recovery for Discrete-Logarithm Based Authentication Schemes from Random Nonce Bits

The Simeck Family of Lightweight Block Ciphers

TriviA: A Fast and Secure Authenticated Encryption Scheme

True Random Number Generators and Entropy Estimations

Frontmatter

A Physical Approach for Stochastic Modeling of TERO-Based TRNG

Predictive Models for Min-entropy Estimation

Side-Channel Analysis and Fault Injection Attacks

Frontmatter

Improved Side-Channel Analysis of Finite-Field Multiplication

Evaluation and Improvement of Generic-Emulating DPA Attacks

Transient-Steady Effect Attack on Block Ciphers

Higher-Order Side-Channel Attacks

Frontmatter

Assessment of Hiding the Higher-Order Leakages in Hardware

Multi-variate High-Order Attacks of Shuffled Tables Recomputation

Leakage Assessment Methodology

Physically Unclonable Functions and Hardware Trojans

Frontmatter

Secure Key Generation from Biased PUFs

The Gap Between Promise and Reality: On the Insecurity of XOR Arbiter PUFs

End-To-End Design of a PUF-Based Privacy Preserving Authentication Protocol

Improved Test Pattern Generation for Hardware Trojan Detection Using Genetic Algorithm and Boolean Satisfiability

Side-Channel Attacks in Practice

Frontmatter

DPA, Bitslicing and Masking at 1 GHz

SoC It to EM: ElectroMagnetic Side-Channel Attacks on a Complex System-on-Chip

Finding the AES Bits in the Haystack: Reverse Engineering and SCA Using Voltage Contrast

Lattice-Based Implementations

Frontmatter

Efficient Ring-LWE Encryption on 8-Bit AVR Processors

A Masked Ring-LWE Implementation

Backmatter

Premium Partner