Intrusive Attacks and Countermeasures

3D Hardware Canaries

3D integration is a promising advanced manufacturing process offering a variety of new hardware security protection opportunities. This paper presents a way of securing 3D ICs using Hamiltonian paths as hardware integrity verification sensors. As 3D integration consists in the stacking of many metal layers, one can consider surrounding a security-sensitive circuit part by a wire cage.

After exploring and comparing different cage construction strategies (and reporting preliminary implementation results on silicon), we introduce a ”hardware canary”. The canary is a spatially distributed chain of functions

F

i

positioned at the vertices of a 3D cage surrounding a protected circuit. A correct answer (

F

n

∘ … ∘

F

1

)(

m

) to a challenge

m

attests the canary’s integrity.

Sébastien Briais, Stéphane Caron, Jean-Michel Cioranesco, Jean-Luc Danger, Sylvain Guilley, Jacques-Henri Jourdan, Arthur Milchior, David Naccache, Thibault Porteboeuf

Breakthrough Silicon Scanning Discovers Backdoor in Military Chip

This paper is a short summary of the first real world detection of a backdoor in a military grade FPGA. Using an innovative patented technique we were able to detect and analyse in the first documented case of its kind, a backdoor inserted into the Actel/Microsemi ProASIC3 chips for accessing FPGA configuration. The backdoor was found amongst additional JTAG functionality and exists on the silicon itself, it was not present in any firmware loaded onto the chip. Using Pipeline Emission Analysis (PEA), our pioneered technique, we were able to extract the secret key to activate the backdoor, as well as other security keys such as the AES and the Passkey. This way an attacker can extract all the configuration data from the chip, reprogram crypto and access keys, modify low-level silicon features, access unencrypted configuration bitstream or permanently damage the device. Clearly this means the device is wide open to intellectual property (IP) theft, fraud, re-programming as well as reverse engineering of the design which allows the introduction of a new backdoor or Trojan. Most concerning, it is not possible to patch the backdoor in chips already deployed, meaning those using this family of chips have to accept the fact they can be easily compromised or will have to be physically replaced after a redesign of the silicon itself.

Sergei Skorobogatov, Christopher Woods

Simple Photonic Emission Analysis of AES

Photonic Side Channel Analysis for the Rest of Us

This work presents a novel low-cost optoelectronic setup for time- and spatially resolved analysis of photonic emissions and a corresponding methodology, Simple Photonic Emission Analysis (SPEA). Observing the backside of ICs, the system captures extremly weak photoemissions from switching transistors and relates them to program running in the chip. SPEA utilizes both spatial and temporal information about these emissions to perform side channel analysis of ICs. We successfully performed SPEA of a proof-of-concept AES implementation and were able to recover the full AES secret key by monitoring accesses to the S-Box. This attack directly exploits the side channel leakage of a single transistor and requires no additional data processing. The system costs and the necessary time for an attack are comparable to power analysis techniques. The presented approach significantly reduces the amount of effort required to perform attacks based on photonic emission analysis and allows AES key recovery in a relevant amount of time.

Alexander Schlösser, Dmitry Nedospasov, Juliane Krämer, Susanna Orlic, Jean-Pierre Seifert

Masking

Compiler Assisted Masking

Differential Power Analysis (DPA) attacks find a statistical correlation between the power consumption of a cryptographic device and intermediate values within the computation. Randomization via (Boolean) masking of intermediate values breaks this statistical dependence and thus prevents such attacks (at least up to a certain order). Especially for software implementations, (first-order) masking schemes are popular in academia and industry, albeit typically not as the sole countermeasure. The current practice then is to manually ‘insert’ Boolean masks: essentially software developers need to manipulate low-level assembly language to implement masking. In this paper we make a first step to automate this process, at least for first-order Boolean masking, allowing the development of compilers capable of protecting programs against DPA.

Andrew Moss, Elisabeth Oswald, Dan Page, Michael Tunstall

Threshold Implementations of All 3 ×3 and 4 ×4 S-Boxes

Side-channel attacks have proven many hardware implementations of cryptographic algorithms to be vulnerable. A recently proposed masking method, based on secret sharing and multi-party computation methods, introduces a set of sufficient requirements for implementations to be provably resistant against first-order DPA with minimal assumptions on the hardware. The original paper doesn’t describe how to construct the Boolean functions that are to be used in the implementation. In this paper, we derive the functions for all invertible 3 ×3, 4 ×4 S-boxes and the 6 ×4 DES S-boxes. Our methods and observations can also be used to accelerate the search for sharings of larger (e.g. 8 ×8) S-boxes. Finally, we investigate the cost of such protection.

Begül Bilgin, Svetla Nikova, Ventzislav Nikov, Vincent Rijmen, Georg Stütz

How Far Should Theory Be from Practice?

Evaluation of a Countermeasure

New countermeasures aiming at protecting against power analysis attacks are often proposed proving the security of the scheme given a specific leakage assumption. Besides the classical power models like Hamming weight or Hamming distance, newer schemes also focus on other dynamic power consumption like the one caused by glitches in the combinational circuits. The question arises if with the increasing downscale in process technology and the larger role of static leakage or other harder to model leakages, the pure theoretical proof of a countermeasure’s security is still good practice. As a case study we take a new large ROM-based masking countermeasure recently presented at CT-RSA 2012. We evaluate the security of the scheme both under the leakage assumptions given in the original article and using a more real-world approach utilizing collision attacks. We can demonstrate that while the new construction methods of the schemes provide a higher security given the assumed leakage model, the security gain in practice is only marginal compared to the conventional large ROM scheme. This highlights the needs for a closer collaboration of the different disciplines when proposing new countermeasures to provide better security statements covering both the theoretical reasoning and the practical evaluations.

Amir Moradi, Oliver Mischke

Efficient and Provably Secure Methods for Switching from Arithmetic to Boolean Masking

A large number of secret key cryptographic algorithms combine Boolean and arithmetic instructions. To protect such algorithms against first order side channel analysis, it is necessary to perform conversions between Boolean masking and arithmetic masking. Louis Goubin proposed in [5] an efficient method to convert from Boolean to arithmetic masking. However the conversion method he also proposed in [5] to switch from arithmetic to Boolean is less efficient and could be a bottleneck in some implementations. Two faster methods were proposed in [2] and [9], both using precomputed tables. We show in this paper that the algorithm in [2] is bugged, and propose an efficient correction. Then, we propose an alternative to the algorithm in [9] with a valuable timing/ memory tradeoff. This new method offers better security in practice and is well adapted for 8-bit architectures in terms of time performance (3.3 times faster than Goubin’s algorithm for one single conversion).

Blandine Debraize

Improved Fault Attacks and Side Channel Analysis

A Differential Fault Attack on the Grain Family of Stream Ciphers

In this paper we study a differential fault attack against the Grain family of stream ciphers. The attack works due to certain properties of the Boolean functions and corresponding choices of the taps from the LFSR. The existing works, by Berzati et al. (2009) and Karmakar et al. (2011), are applicable only on Grain-128 exploiting certain properties of the combining Boolean function

h

. That idea could not easily be extended to the corresponding Boolean function used in Grain v1. Here we show that the differential fault attack can indeed be efficiently mounted for the Boolean function used in Grain v1. In this case we exploit the idea that there exists certain suitable

α

such that

$h(\mathbf{x}) + h({\mathbf x} + \mathbf{\alpha})$

is linear. In our technique, we present methods to identify the fault locations and then construct set of linear equations to obtain the contents of the LFSR and the NFSR. As a countermeasure to such fault attack, we provide exact design criteria for Boolean functions to be used in Grain like structure.

Subhadeep Banik, Subhamoy Maitra, Santanu Sarkar

Algebraic Side-Channel Attacks Beyond the Hamming Weight Leakage Model

Algebraic side-channel attacks (ASCA) are a method of cryptanalysis which allow performing key recoveries with very low data complexity. In an ASCA, the side-channel leaks of a device under test (DUT) are represented as a system of equations, and a machine solver is used to find a key which satisfies these equations. A primary limitation of the ASCA method is the way it tolerates errors. If the correct key is excluded from the system of equations due to noise in the measurements, the attack will fail. On the other hand, if the DUT is described in a more robust manner to better tolerate errors, the loss of information may make computation time intractable. In this paper, we first show how this robustness-information tradeoff can be simplified by using an optimizer, which exploits the probability data output by a side-channel decoder, instead of a standard SAT solver. For this purpose, we describe a way of representing the leak equations as vectors of aposteriori probabilities, enabling a natural integration of template attacks and ASCA. Next, we put forward the applicability of ASCA against devices which do not conform to simple leakage models (e.g. based on the Hamming weight of the manipulated data). We finally report on various experiments that illustrate the strengths and weaknesses of standard and optimizing solvers in various settings, hence demonstrating the versatility of ASCA.

Yossef Oren, Mathieu Renauld, François-Xavier Standaert, Avishai Wool

Selecting Time Samples for Multivariate DPA Attacks

Masking on the algorithm level, i.e. concealing all sensitive intermediate values with random data, is a popular countermeasure against DPA attacks. A properly implemented masking scheme forces an attacker to apply a higher-order DPA attack. Such attacks are known to require a number of traces growing exponentially in the attack order, and computational power growing combinatorially in the number of time samples that have to be exploited jointly. We present a novel technique to identify such tuples of time samples before key recovery, in black-box conditions and using only known inputs (or outputs). Attempting key recovery only once the tuples have been identified can reduce the computational complexity of the overall attack substantially, e.g. from months to days. Experimental results based on power traces of a masked software implementation of the AES confirm the effectiveness of our method and show exemplary speed-ups.

Oscar Reparaz, Benedikt Gierlichs, Ingrid Verbauwhede

Unified and Optimized Linear Collision Attacks and Their Application in a Non-profiled Setting

Side-channel collision attacks are one of the most investigated techniques allowing the combination of mathematical and physical cryptanalysis. In this paper, we discuss their relevance in the security evaluation of leaking devices with two main contributions. On the one hand, we suggest that the exploitation of linear collisions in block ciphers can be naturally re-written as a Low Density Parity Check Code decoding problem. By combining this re-writing with a Bayesian extension of the collision detection techniques, we succeed in improving the efficiency and error tolerance of previously introduced attacks. On the other hand, we provide various experiments in order to discuss the practicality of such attacks compared to standard DPA. Our results exhibit that collision attacks are less efficient in classical implementation contexts, e.g. 8-bit microcontrollers leaking according to a linear power consumption model. We also observe that the detection of collisions in software devices may be difficult in the case of optimized implementations, because of less regular assembly codes. Interestingly, the soft decoding approach is particularly useful in these more challenging scenarios. Finally, we show that there exist (theoretical) contexts in which collision attacks succeed in exploiting leakages whereas all other non-profiled side-channel attacks fail.

Benoît Gérard, François-Xavier Standaert

Leakage Resiliency and Security Analysis

Towards Super-Exponential Side-Channel Security with Efficient Leakage-Resilient PRFs

Leakage-resilient constructions have attracted significant attention over the last couple of years. In practice, pseudorandom functions are among the most important such primitives, because they are stateless and do not require a secure initialization as, e.g. stream ciphers. However, their deployment in actual applications is still limited by security and efficiency concerns. This paper contributes to solve these issues in two directions. On the one hand, we highlight that the condition of bounded data complexity, that is guaranteed by previous leakage-resilient constructions, may not be enough to obtain practical security. We show experimentally that, if implemented in an 8-bit microcontroller, such constructions can actually be broken. On the other hand, we present tweaks for tree-based leakage-resilient PRFs that improve their efficiency and their security, by taking advantage of parallel implementations. Our security analyses are based on worst-case attacks in a noise-free setting and suggest that under reasonable assumptions, the side-channel resistance of our construction grows super-exponentially with a security parameter that corresponds to the degree of parallelism of the implementation. In addition, it exhibits that standard DPA attacks are not the most relevant tool for evaluating such leakage-resilient constructions and may lead to overestimated security. As a consequence, we investigate more sophisticated tools based on lattice reduction, which turn out to be powerful in the physical cryptanalysis of these primitives. Eventually, we put forward that the AES is not perfectly suited for integration in a leakage-resilient design. This observation raises interesting challenges for developing block ciphers with better properties regarding leakage-resilience.

Marcel Medwed, François-Xavier Standaert, Antoine Joux

Practical Leakage-Resilient Symmetric Cryptography

Leakage resilient cryptography attempts to incorporate side-channel leakage into the black-box security model and designs cryptographic schemes that are provably secure within it. Informally, a scheme is

leakage-resilient

if it remains secure even if an adversary learns a bounded amount of arbitrary information about the schemes internal state. Unfortunately, most leakage resilient schemes are unnecessarily complicated in order to achieve strong provable security guarantees. As advocated by Yu et al. [CCS’10], this mostly is an artefact of the security proof and in practice much simpler construction may already suffice to protect against

realistic

side-channel attacks. In this paper, we show that indeed for simpler constructions leakage-resilience can be obtained when we aim for relaxed security notions where the leakage-functions and/or the inputs to the primitive are chosen

non-adaptively

. For example, we show that a three round Feistel network instantiated with a leakage resilient PRF yields a leakage resilient PRP if the inputs are chosen non-adaptively (This complements the result of Dodis and Pietrzak [CRYPTO’10] who show that if a adaptive queries are allowed, a superlogarithmic number of rounds is necessary.) We also show that a minor variation of the classical GGM construction gives a leakage resilient PRF if both, the leakage-function and the inputs, are chosen non-adaptively.

Sebastian Faust, Krzysztof Pietrzak, Joachim Schipper

A Statistical Model for DPA with Novel Algorithmic Confusion Analysis

Side-channel attacks (SCAs) exploit weakness in the physical implementation of cryptographic algorithms, and have emerged as a realistic threat to many critical embedded systems. However, no theoretical model for the widely used differential power analysis (DPA) has revealed exactly what the success rate of DPA depends on and how. This paper proposes a statistical model for DPA that takes characteristics of both the physical implementation and cryptographic algorithm into consideration. Our model establishes a quantitative relation between the success rate of DPA and a cryptographic system. The side-channel characteristic of the physical implementation is modeled as the ratio between the difference-of-means power and the standard deviation of power distribution. The side-channel property of the cryptographic algorithm is extracted by a novel algorithmic confusion analysis. Experimental results on DES and AES verify this model and demonstrate the effectiveness of algorithmic confusion analysis. We expect the model to be extendable to other SCAs, and provide valuable guidelines for truly SCA-resilient system design and implementation.

Yunsi Fei, Qiasi Luo, A. Adam Ding

Physically Unclonable Functions

Practical Security Analysis of PUF-Based Two-Player Protocols

In recent years, PUF-based schemes have not only been suggested for the basic tasks of tamper sensitive key storage or the identification of hardware systems, but also for more complex protocols like oblivious transfer (OT) or bit commitment (BC), both of which possess broad and diverse applications. In this paper, we continue this line of research. We first present an attack on two recent OT- and BC-protocols which have been introduced at CRYPTO 2011 by Brzuska et al. [1,2]. The attack quadratically reduces the number of CRPs which malicious players must read out in order to cheat, and fully operates within the original communication model of [1,2]. In practice, this leads to insecure protocols when electrical PUFs with a medium challenge-length are used (e.g., 64 bits), or whenever optical PUFs are employed. These two PUF types are currently among the most popular designs. Secondly, we discuss countermeasures against the attack, and show that interactive hashing is suited to enhance the security of PUF-based OT and BC, albeit at the price of an increased round complexity.

Ulrich Rührmair, Marten van Dijk

Soft Decision Error Correction for Compact Memory-Based PUFs Using a Single Enrollment

Secure storage of cryptographic keys in hardware is an essential building block for high security applications. It has been demonstrated that Physically Unclonable Functions (PUFs) based on uninitialized SRAM are an effective way to securely store a key based on the unique physical characteristics of an Integrated Circuit (IC). The start-up state of an SRAM memory is unpredictable but not truly random as well as noisy, hence privacy amplification techniques and a Helper Data Algorithm (HDA) are required in order to recover the correct value of a full entropy secret key. At the core of an HDA are error correcting techniques. The best known method to recover a full entropy 128-bit key requires 4700 SRAM cells. Earlier work by Maes et al. has reduced the number of SRAM cells to 1536 by using soft decision decoding; however, this method requires multiple measurements (and thus also power resets) during the storage of a key, which will be shown to be an unacceptable overhead for many applications. This article demonstrates how soft decision decoding with only a single measurement during storage can reduce the required number of SRAM cells to 3900 (a 17% reduction) without increasing the size of en-/decoder. The number of SRAM cells can even be reduced to 2900 (a 38% reduction). This does increase cost of the decoder, but depending on design requirements it can be shown to be worthwhile. Therefore, it is possible to securely store a 128-bit key at a very low overhead in an IC or FPGA.

Vincent van der Leest, Bart Preneel, Erik van der Sluis

PUFs: Myth, Fact or Busted? A Security Evaluation of Physically Unclonable Functions (PUFs) Cast in Silicon

Physically Unclonable Functions (PUFs) are an emerging technology and have been proposed as central building blocks in a variety of cryptographic protocols and security architectures. However, the security features of PUFs are still under investigation: Evaluation results in the literature are difficult to compare due to varying test conditions, different analysis methods and the fact that representative data sets are publicly unavailable.

In this paper, we present the first large-scale security analysis of ASIC implementations of the five most popular intrinsic electronic PUF types, including arbiter, ring oscillator, SRAM, flip-flop and latch PUFs. Our analysis is based on PUF data obtained at different operating conditions from 96 ASICs housing multiple PUF instances, which have been manufactured in TSMC 65 nm CMOS technology. In this context, we present an evaluation methodology and quantify the robustness and unpredictability properties of PUFs. Since all PUFs have been implemented in the same ASIC and analyzed with the same evaluation methodology, our results allow for the first time a fair comparison of their properties.

Stefan Katzenbeisser, Ünal Kocabaş, Vladimir Rožić, Ahmad-Reza Sadeghi, Ingrid Verbauwhede, Christian Wachsmann

PUFKY: A Fully Functional PUF-Based Cryptographic Key Generator

We present PUFKY: a practical and modular design for a cryptographic key generator based on a Physically Unclonable Function (PUF). A fully functional reference implementation is developed and successfully evaluated on a substantial set of FPGA devices. It uses a highly optimized ring oscillator PUF (ROPUF) design, producing responses with up to 99% entropy. A very high key reliability is guaranteed by a syndrome construction secure sketch using an efficient and extremely low-overhead BCH decoder. This first complete implementation of a PUF-based key generator, including a PUF, a BCH decoder and a cryptographic entropy accumulator, utilizes merely 17% (1162slices) of the available resources on a low-end FPGA, of which 82% are occupied by the ROPUF and only 18% by the key generation logic. PUFKY is able to produce a cryptographically secure 128-bit key with a failure rate < 10

− 9

in 5.62ms. The design’s modularity allows for rapid and scalable adaptations for other PUF implementations or for alternative key requirements. The presented PUFKY core is immediately deployable in an embedded system, e.g. by connecting it to an embedded microcontroller through a convenient bus interface.

Roel Maes, Anthony Van Herrewege, Ingrid Verbauwhede

Efficient Implementations

NEON Crypto

NEON is a vector instruction set included in a large fraction of new ARM-based tablets and smartphones. This paper shows that NEON supports high-security cryptography at surprisingly high speeds; normally data arrives at lower speeds, giving the CPU time to handle tasks other than cryptography. In particular, this paper explains how to use a single 800MHz Cortex A8 core to compute the existing NaCl suite of high-security cryptographic primitives at the following speeds: 5.60 cycles per byte (1.14 Gbps) to encrypt using a shared secret key, 2.30 cycles per byte (2.78 Gbps) to authenticate using a shared secret key, 527102 cycles (1517/second) to compute a shared secret key for a new public key, 624846 cycles (1280/second) to verify a signature, and 244655 cycles (3269/second) to sign a message. These speeds make no use of secret branches and no use of secret memory addresses.

Daniel J. Bernstein, Peter Schwabe

Towards One Cycle per Bit Asymmetric Encryption: Code-Based Cryptography on Reconfigurable Hardware

Most advanced security systems rely on public-key schemes based either on the factorization or the discrete logarithm problem. Since both problems are known to be closely related, a major breakthrough in cryptanalysis tackling one of those problems could render a large set of cryptosystems completely useless.

Code-based public-key schemes are based on the alternative security assumption that decoding generic linear binary codes is NP-complete. In the past, most researchers focused on the McEliece cryptosystem, neglecting the fact that the scheme by Niederreiter has some important advantages. Smaller keys, more practical plain and ciphertext sizes and less computations. In this work we describe a novel FPGA implementation of the Niederreiter scheme, showing that its advantages can result a very efficient design for an asymmetric cryptosystem that can encrypt more than 1.5 million plaintexts per seconds on a Xilinx Virtex-6 FPGA, outperforming all other popular public key cryptosystems by far.

Stefan Heyse, Tim Güneysu

Solving Quadratic Equations with XL on Parallel Architectures

Solving a system of multivariate quadratic equations (MQ) is an NP-complete problem whose complexity estimates are relevant to many cryptographic scenarios. In some cases it is required in the best known attack; sometimes it is a generic attack (such as for the multivariate PKCs), and sometimes it determines a provable level of security (such as for the QUAD stream ciphers).

Under reasonable assumptions, the best way to solve generic MQ systems is the XL algorithm implemented with a sparse matrix solver such as Wiedemann’s algorithm. Knowing how much time an implementation of this attack requires gives us a good idea of how future cryptosystems related to MQ can be broken, similar to how implementations of the General Number Field Sieve that factors smaller RSA numbers give us more insight into the security of actual RSA-based cryptosystems.

This paper describes such an implementation of XL using the block Wiedemann algorithm. In 5 days we are able to solve a system with 32 variables and 64 equations over

$\mathbb{F}_{16}$

(a computation of about 2

60.3

bit operations) on a small cluster of 8 nodes, with 8 CPU cores and 36 GB of RAM in each node. We do not expect system solvers of the F

4

/F

5

family to accomplish this due to their much higher memory demand. Our software also offers implementations for

$\mathbb{F}_{2}$

and

$\mathbb{F}_{31}$

and can be easily adapted to other small fields. More importantly, it scales nicely for small clusters, NUMA machines, and a combination of both.

Chen-Mou Cheng, Tung Chou, Ruben Niederhagen, Bo-Yin Yang

Efficient Implementations of MQPKS on Constrained Devices

Multivariate Quadratic Public Key Schemes (MQPKS) attracted the attention of researchers in the last decades for two reasons. First they are thought to resist attacks by quantum computers and second, most of the schemes were broken. The latter may be the reason why implementations are rare. This work investigates one of the most promising member of MQPKS and its variants, namely UOV, Rainbow and enTTS. UOV resisted all kinds of attacks for 13 years and can be considered one of the best examined MQPKS. We describe implementations of UOV, Rainbow and enTTS on an 8-bit microcontroller. To address the problem of large keys, we used several optimizations and also implemented the 0/1-UOV scheme introduced at CHES 2011. To achieve a practically usable security level on the selected device, all recent attacks are summarized and parameters for standard security levels are given. To allow judgement of scaling, the schemes are implemented for the most common security levels in embedded systems 2

64

, 2

80

and 2

128

bits symmetric security. This allows for the first time a direct comparison of the four schemes because they are implemented for exactly the same security levels on the same platform and also by the same developer.

Peter Czypek, Stefan Heyse, Enrico Thomae

Lightweight Cryptography

Towards Green Cryptography: A Comparison of Lightweight Ciphers from the Energy Viewpoint

We provide a comprehensive evaluation of several lightweight block ciphers with respect to various hardware performance metrics, with a particular focus on the energy cost. This case study serves as a background for discussing general issues related to the relative nature of hardware implementations comparisons. We also use it to extract intuitive observations for new algorithm designs. Implementation results show that the most significant differences between lightweight ciphers are observed when considering both encryption and decryption architectures, and the impact of key scheduling algorithms. Yet, these differences are moderated when looking at their amplitude, and comparing them with the impact of physical parameters tuning, e.g. frequency / voltage scaling.

Stéphanie Kerckhof, François Durvaux, Cédric Hocquet, David Bol, François-Xavier Standaert

Lightweight Cryptography for the Cloud: Exploit the Power of Bitslice Implementation

This paper shows the great potential of lightweight cryptography in fast and timing-attack resistant software implementations in cloud computing by exploiting bitslice implementation. This is demonstrated by bitslice implementations of the PRESENT and Piccolo light-weight block ciphers. In particular, bitsliced PRESENT-80/128 achieves 4.73 cycles/byte and Piccolo-80 achieves 4.57 cycles/byte including data conversion on an Intel Xeon E3-1280 processor (Sandy Bridge microarchitecture). It is also expected that bitslice implementation offers resistance to side channel attacks such as cache timing attacks and cross-VM attacks in a multi-tenant cloud environment. Lightweight cryptography is not limited to constrained devices, and this work opens the way to its application in cloud computing.

Seiichi Matsuda, Shiho Moriai

Low-Latency Encryption – Is “Lightweight = Light + Wait”?

The processing time required by a cryptographic primitive implemented in hardware is an important metric for its performance but it has not received much attention in recent publications on lightweight cryptography. Nevertheless, there are important applications for cost effective low-latency encryption. As the first step in the field, this paper explores the low-latency behavior of hardware implementations of a set of block ciphers. The latency of the implementations is investigated as well as the trade-offs with other metrics such as circuit area, time-area product, power, and energy consumption. The obtained results are related back to the properties of the underlying cipher algorithm and, as it turns out, the number of rounds, their complexity, and the similarity of encryption and decryption procedures have a strong impact on the results. We provide a qualitative description and conclude with a set of recommendations for aspiring low-latency block cipher designers.

Miroslav Knežević, Ventzislav Nikov, Peter Rombouts

We Still Love RSA

Attacking RSA–CRT Signatures with Faults on Montgomery Multiplication

In this paper, we present several efficient fault attacks against implementations of RSA–CRT signatures that use modular exponentiation algorithms based on Montgomery multiplication. They apply to any padding function, including randomized paddings, and as such are the first fault attacks effective against RSA–PSS.

The new attacks work provided that a small register can be forced to either zero, or a constant value, or a value with zero high-order bits. We show that these models are quite realistic, as such faults can be achieved against many proposed hardware designs for RSA signatures.

Pierre-Alain Fouque, Nicolas Guillermin, Delphine Leresteux, Mehdi Tibouchi, Jean-Christophe Zapalowicz

Reduce-by-Feedback: Timing Resistant and DPA-Aware Modular Multiplication Plus: How to Break RSA by DPA

We (re-) introduce the Reduce-By-Feedback scheme given by Vielhaber (1987), Benaloh and Dai (1995), and Jeong and Burleson (1997).

We show, how to break RSA, when implemented with the standard version of Reduce-by-Feedback or Montgomery multiplication, by Differential Power Analysis. We then modify Reduce-by-Feedback to avoid this attack. The modification is not possible for Montgomery multiplication.

We show that both the original and the modified Reduce-by-Feedback algorithm resist timing attacks.

Furthermore, some VLSI-specific implementation details (delayed carry adder, re-use of MUX tree and logic) are provided.

Michael Vielhaber

Side Channel Attack to Actual Cryptanalysis: Breaking CRT-RSA with Low Weight Decryption Exponents

Towards the cold boot attack (a kind of side channel attack), the problems of reconstructing RSA parameters when (i) certain bits are unknown (Heninger and Shacham, Crypto 2009) and (ii) the bits are available but with some error probability (Henecka, May and Meurer, Crypto 2010) have been considered very recently. In this paper we exploit the error correction heuristic proposed by Henecka et al to show that CRT-RSA schemes having low Hamming weight decryption exponents are insecure given small encryption exponents (e.g.,

e

= 2

16

+ 1). In particular, we show that the CRT-RSA schemes presented by Lim and Lee (SAC 1996) and Galbraith, Heneghan and McKee (ACISP 2005) with low weight decryption exponents can be broken in a few minutes in certain cases. Further, the scheme of Maitra and Sarkar (CT-RSA 2010), where the decryption exponents are not of low weight but they have large low weight factors, can also be cryptanalysed. We also identify a few modifications of the error correction strategy that provides significantly improved experimental outcome towards the cold boot attack.

Santanu Sarkar, Subhamoy Maitra

Hardware Implementations

Pushing the Limits of High-Speed GF(2 m ) Elliptic Curve Scalar Multiplication on FPGAs

In this paper we present an FPGA implementation of a high-speed elliptic curve scalar multiplier for binary finite fields. High speeds are achieved by boosting the operating clock frequency while at the same time reducing the number of clock cycles required to do a scalar multiplication. To increase clock frequency, the design uses optimized implementations of the underlying field primitives and a mathematically analyzed pipeline design. To reduce clock cycles, a new scheduling scheme is presented that allows overlapped processing of scalar bits. The resulting scalar multiplier is the fastest reported implementation for generic curves over binary finite fields. Additionally, the optimized primitives leads to area requirements that is significantly lesser compared to other high-speed implementations. Detailed implementation results are furnished in order to support the claims.

Chester Rebeiro, Sujoy Sinha Roy, Debdeep Mukhopadhyay

On the Design of Hardware Building Blocks for Modern Lattice-Based Encryption Schemes

We present both a hardware and a software implementation variant of the

learning with errors

(LWE) based cryptosystem presented by Lindner and Peikert. This work helps in assessing the practicality of lattice-based encryption. For the software implementation, we give a comparison between a matrix and polynomial based variant of the LWE scheme. This module includes multiplication in polynomial rings using Fast Fourier Transform (FFT). In order to implement lattice-based cryptography in an efficient way, it is crucial to apply the systems over polynomial rings. FFT speeds up multiplication in polynomial rings, which is the most critical operation in lattice-based cryptography, from quadratic to quasi-linear runtime. For the hardware variant, we show how this fundamental building block of lattice-based cryptography can be implemented and evaluated in terms of performance. A second important component for lattice-based cryptosystems is the sampling from discrete Gaussian distributions. We examine three different variants for sampling Gaussian distributed integers, namely rejection sampling, a rounding based approach, and a look-up table based approach in hardware.

Norman Göttert, Thomas Feller, Michael Schneider, Johannes Buchmann, Sorin Huss

Practical Lattice-Based Cryptography: A Signature Scheme for Embedded Systems

Nearly all of the currently used and well-tested signature schemes (e.g. RSA or DSA) are based either on the factoring assumption or the presumed intractability of the discrete logarithm problem. Further algorithmic advances on these problems may lead to the unpleasant situation that a large number of schemes have to be replaced with alternatives. In this work we present such an alternative – a signature scheme whose security is derived from the hardness of lattice problems. It is based on recent theoretical advances in lattice-based cryptography and is highly optimized for practicability and use in embedded systems. The public and secret keys are roughly 12000 and 2000 bits long, while the signature size is approximately 9000 bits for a security level of around 100 bits. The implementation results on reconfigurable hardware (Spartan/Virtex 6) are very promising and show that the scheme is scalable, has low area consumption, and even outperforms some classical schemes.

Tim Güneysu, Vadim Lyubashevsky, Thomas Pöppelmann

An Efficient Countermeasure against Correlation Power-Analysis Attacks with Randomized Montgomery Operations for DF-ECC Processor

Correlation power-analysis (CPA) attacks are a serious threat for cryptographic device because the key can be disclosed from data-dependent power consumption.

Hiding

power consumption of encryption circuit can increase the security against CPA attacks, but it results in a large overhead for cost, speed, and energy dissipation.

Masking

processed data such as randomized scalar or primary base point on elliptic curve is another approach to prevent CPA attacks. However, these methods requiring pre-computed data are not suitable for hardware implementation of real-time applications. In this paper, a new CPA countermeasure performing all field operations in a randomized Montgomery domain is proposed to eliminate the correlation between target and reference power traces. After implemented in 90-nm CMOS process, our protected 521-bit dual-field elliptic curve cryptographic (DF-ECC) processor can perform one elliptic curve scalar multiplication (ECSM) in 4.57ms over

GF

(

p

521

) and 2.77ms over

GF

(2

409

) with 3.6% area and 3.8% power overhead. Experiments from an FPGA evaluation board demonstrate that the private key of unprotected device will be revealed within 10

3

power traces, whereas the same attacks on our proposal cannot successfully extract the key value even after 10

6

measurements.

Jen-Wei Lee, Szu-Chi Chung, Hsie-Chia Chang, Chen-Yi Lee

Springer Professional

Über dieses Buch

Inhaltsverzeichnis

Frontmatter

Intrusive Attacks and Countermeasures

3D Hardware Canaries

Breakthrough Silicon Scanning Discovers Backdoor in Military Chip

Simple Photonic Emission Analysis of AES

Masking

Compiler Assisted Masking

Threshold Implementations of All 3 ×3 and 4 ×4 S-Boxes

How Far Should Theory Be from Practice?

Efficient and Provably Secure Methods for Switching from Arithmetic to Boolean Masking

Improved Fault Attacks and Side Channel Analysis

A Differential Fault Attack on the Grain Family of Stream Ciphers

Algebraic Side-Channel Attacks Beyond the Hamming Weight Leakage Model

Selecting Time Samples for Multivariate DPA Attacks

Unified and Optimized Linear Collision Attacks and Their Application in a Non-profiled Setting

Leakage Resiliency and Security Analysis

Towards Super-Exponential Side-Channel Security with Efficient Leakage-Resilient PRFs

Practical Leakage-Resilient Symmetric Cryptography

A Statistical Model for DPA with Novel Algorithmic Confusion Analysis

Physically Unclonable Functions

Practical Security Analysis of PUF-Based Two-Player Protocols

Soft Decision Error Correction for Compact Memory-Based PUFs Using a Single Enrollment

PUFs: Myth, Fact or Busted? A Security Evaluation of Physically Unclonable Functions (PUFs) Cast in Silicon

PUFKY: A Fully Functional PUF-Based Cryptographic Key Generator

Efficient Implementations

NEON Crypto

Towards One Cycle per Bit Asymmetric Encryption: Code-Based Cryptography on Reconfigurable Hardware

Solving Quadratic Equations with XL on Parallel Architectures

Efficient Implementations of MQPKS on Constrained Devices

Lightweight Cryptography

Towards Green Cryptography: A Comparison of Lightweight Ciphers from the Energy Viewpoint

Lightweight Cryptography for the Cloud: Exploit the Power of Bitslice Implementation

Low-Latency Encryption – Is “Lightweight = Light + Wait”?

We Still Love RSA

Attacking RSA–CRT Signatures with Faults on Montgomery Multiplication

Reduce-by-Feedback: Timing Resistant and DPA-Aware Modular Multiplication Plus: How to Break RSA by DPA

Side Channel Attack to Actual Cryptanalysis: Breaking CRT-RSA with Low Weight Decryption Exponents

Hardware Implementations

Pushing the Limits of High-Speed GF(2 m ) Elliptic Curve Scalar Multiplication on FPGAs

On the Design of Hardware Building Blocks for Modern Lattice-Based Encryption Schemes

Practical Lattice-Based Cryptography: A Signature Scheme for Embedded Systems

An Efficient Countermeasure against Correlation Power-Analysis Attacks with Randomized Montgomery Operations for DF-ECC Processor

Backmatter