Skip to main content

2016 | Buch

Cryptographic Hardware and Embedded Systems – CHES 2016

18th International Conference, Santa Barbara, CA, USA, August 17-19, 2016, Proceedings

herausgegeben von: Benedikt Gierlichs, Axel Y. Poschmann

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

insite
SUCHEN

Über dieses Buch

This book constitutes the proceedings of the 18th International Conference on Cryptographic Hardware and Embedded Systems, CHES 2016, held in Santa Barbara, CA, USA, in August 2016.
The 30 full papers presented in this volume were carefully reviewed and selected from 148 submissions. They were organized in topical sections named: side channel analysis; automotive security; invasive attacks; side channel countermeasures; new directions; software implementations; cache attacks; physical unclonable functions; hardware implementations; and fault attacks.

Inhaltsverzeichnis

Frontmatter

Side Channel Analysis

Frontmatter
Correlated Extra-Reductions Defeat Blinded Regular Exponentiation
Abstract
Walter and Thomson (CT-RSA ’01) and Schindler (PKC ’02) have shown that extra-reductions allow to break RSA-CRT even with message blinding. Indeed, the extra-reduction probability depends on the type of operation (square, multiply, or multiply with a constant). Regular exponentiation schemes can be regarded as protections since the operation sequence does not depend on the secret.
In this article, we show that there exists a strong negative correlation between extra-reductions of two consecutive operations, provided that the first feeds the second. This allows to mount successful attacks even against blinded asymmetrical computations with a regular exponentiation algorithm, such as Square-and-Multiply Always or Montgomery Ladder. We investigate various attack strategies depending on the context—known or unknown modulus, known or unknown extra-reduction detection probability, etc.—and implement them on two devices: a single core ARM Cortex-M4 and a dual core ARM Cortex M0-M4.
Margaux Dugardin, Sylvain Guilley, Jean-Luc Danger, Zakaria Najm, Olivier Rioul
Horizontal Side-Channel Attacks and Countermeasures on the ISW Masking Scheme
Abstract
A common countermeasure against side-channel attacks consists in using the masking scheme originally introduced by Ishai, Sahai and Wagner (ISW) at Crypto 2003, and further generalized by Rivain and Prouff at CHES 2010. The countermeasure is provably secure in the probing model, and it was showed by Duc, Dziembowski and Faust at Eurocrypt 2014 that the proof can be extended to the more realistic noisy leakage model. However the extension only applies if the leakage noise \(\sigma \) increases at least linearly with the masking order \(n\), which is not necessarily possible in practice.
In this paper we investigate the security of an implementation when the previous condition is not satisfied, for example when the masking order \(n\) increases for a constant noise \(\sigma \). We exhibit two (template) horizontal side-channel attacks against the Rivain-Prouff’s secure multiplication scheme and we analyze their efficiency thanks to several simulations and experiments.
Eventually, we describe a variant of Rivain-Prouff’s multiplication that is still provably secure in the original ISW model, and also heuristically secure against our new attacks.
Alberto Battistello, Jean-Sébastien Coron, Emmanuel Prouff, Rina Zeitoun
Towards Easy Leakage Certification
Abstract
Side-channel attacks generally rely on the availability of good leakage models to extract sensitive information from cryptographic implementations. The recently introduced leakage certification tests aim to guarantee that this condition is fulfilled based on sound statistical arguments. They are important ingredients in the evaluation of leaking devices since they allow a good separation between engineering challenges (how to produce clean measurements) and cryptographic ones (how to exploit these measurements). In this paper, we propose an alternative leakage certification test that is significantly simpler to implement than the previous proposal from Eurocrypt 2014. This gain admittedly comes at the cost of a couple of heuristic (yet reasonable) assumptions on the leakage distribution. To confirm its relevance, we first show that it allows confirming previous results of leakage certification. We then put forward that it leads to additional and useful intuitions regarding the information losses caused by incorrect assumptions in leakage modeling.
François Durvaux, François-Xavier Standaert, Santos Merino Del Pozo
Simple Key Enumeration (and Rank Estimation) Using Histograms: An Integrated Approach
Abstract
The main contribution of this paper, is a new key enumeration algorithm that combines the conceptual simplicity of the rank estimation algorithm of Glowacz et al. (from FSE 2015) and the parallelizability of the enumeration algorithm of Bogdanov et al. (SAC 2015) and Martin et al. (from ASIACRYPT 2015). Our new algorithm is based on histograms. It allows obtaining simple bounds on the (small) rounding errors that it introduces and leads to straightforward parallelization. We further show that it can minimize the bandwidth of distributed key testing by selecting parameters that maximize the factorization of the lists of key candidates produced by the enumeration, which can be highly beneficial, e.g. if these tests are performed by a hardware coprocessor. We also put forward that the conceptual simplicity of our algorithm translates into efficient implementations (that slightly improve the state-of-the-art). As an additional consolidating effort, we finally describe an open source implementation of this new enumeration algorithm, combined with the FSE 2015 rank estimation one, that we make available with the paper.
Romain Poussier, François-Xavier Standaert, Vincent Grosso

Automotive Security

Frontmatter
Physical Layer Group Key Agreement for Automotive Controller Area Networks
Abstract
Distribution of cryptographic keys between devices communicating over a publicly accessible medium is an important component of secure design for networked systems. In this paper, we consider the problem of group key exchange between Electronic Control Units (ECUs) connected to the Controller Area Network (CAN) within an automobile. Typically, existing solutions map schemes defined for traditional network systems to the CAN. Our contribution is to utilize physical properties of the CAN bus to generate group keys. We demonstrate that pairwise interaction between ECUs over the CAN bus can be used to efficiently derive group keys in both authenticated and non-authenticated scenarios. We illustrate the efficiency and security properties of the proposed protocols. The scalability and security properties of our scheme are similar to multi-party extensions of Diffie-Hellman protocol, without the computational overhead of group operations.
Shalabh Jain, Jorge Guajardo
– vatiCAN – Vetted, Authenticated CAN Bus
Abstract
In recent years, several attacks have impressively demonstrated that the software running on embedded controllers in cars can be successfully exploited – often even remotely. The fact that components that were hitherto purely mechanical, such as connections to the brakes, throttle, and steering wheel, have been computerized makes digital exploits life-threatening. Because of the interconnectedness of sensors, controllers and actuators, any compromised controller can impersonate any other controller by mimicking its control messages, thus effectively depriving the driver of his control.
The fact that carmakers develop vehicles in evolutionary steps rather than as revolution, has led us to propose a backward-compatible authentication mechanism for the widely used CAN vehicle communication bus. vatiCAN allows recipients of a message to verify its authenticity via HMACs, while not changing CAN messages for legacy, non-critical components. In addition, vatiCAN detects and prevents attempts to spoof identifiers of critical components. We implemented a vatiCAN prototype and show that it incurs a CAN message latency of less than 4 ms, while giving strong guarantees against non-authentic messages.
Stefan Nürnberger, Christian Rossow

Invasive Attacks

Frontmatter
Mitigating SAT Attack on Logic Locking
Abstract
Logic locking is a technique that has been proposed to protect outsourced IC designs from piracy and counterfeiting by untrusted foundries. A locked IC preserves the correct functionality only when a correct key is provided. Recently, the security of logic locking is threatened by a new attack called SAT attack, which can decipher the correct key of most logic locking techniques within a few hours [12] even for a reasonably large number of keys. This attack iteratively solves SAT formulas which progressively eliminate the incorrect keys till the circuit unlocked. In this paper, we present a circuit block (referred to as Anti-SAT block) to thwart the SAT attack. We show that the number of SAT attack iterations to reveal the correct key in a circuit comprising an Anti-SAT block is an exponential function of the key-size thereby making the SAT attack computationally infeasible. Through our experiments, we illustrate the effectiveness of our approach to securing modern chips fabricated in untrusted foundries.
Yang Xie, Ankur Srivastava
No Place to Hide: Contactless Probing of Secret Data on FPGAs
Abstract
Field Programmable Gate Arrays (FPGAs) have been the target of different physical attacks in recent years. Many different countermeasures have already been integrated into these devices to mitigate the existing vulnerabilities. However, there has not been enough attention paid to semi-invasive attacks from the IC backside due to the following reasons. First, the conventional semi-invasive attacks from the IC backside — such as laser fault injection and photonic emission analysis — cannot be scaled down without further effort to the very latest nanoscale technologies of modern FPGAs and programmable SoCs. Second, the more advanced solutions for secure storage, such as controlled Physically Unclonable Functions (PUFs), make the conventional memory-readout techniques almost impossible. In this paper, however, novel approaches have been explored: Attacks based on Laser Voltage Probing (LVP) and its derivatives, as commonly used in Integrated Circuit (IC) debug for nanoscale low voltage technologies, are successfully launched against a 60 nanometer technology FPGA. We discuss how these attacks can be used to break modern bitstream encryption implementations. Our attacks were carried out on a Proof-of-Concept PUF-based key generation implementation. To the best of our knowledge this is the first time that LVP is used to perform an attack on secure ICs.
Heiko Lohrke, Shahin Tajik, Christian Boit, Jean-Pierre Seifert

Side Channel Countermeasures I

Frontmatter
Strong 8-bit Sboxes with Efficient Masking in Hardware
Abstract
Block ciphers are arguably the most important cryptographic primitive in practice. While their security against mathematical attacks is rather well understood, physical threats such as side-channel analysis (SCA) still pose a major challenge for their security. An effective countermeasure to thwart SCA is using a cipher representation that applies the threshold implementation (TI) concept. However, there are hardly any results available on how this concept can be adopted for block ciphers with large (i.e., 8-bit) Sboxes. In this work we provide a systematic analysis on and search for 8-bit Sbox constructions that can intrinsically feature the TI concept, while still providing high resistance against cryptanalysis. Our study includes investigations on Sboxes constructed from smaller ones using Feistel, SPN, or MISTY network structures. As a result, we present a set of new Sboxes that not only provide strong cryptographic criteria, but are also optimized for TI. We believe that our results will found an inspiring basis for further research on high-security block ciphers that intrinsically feature protection against physical attacks.
Erik Boss, Vincent Grosso, Tim Güneysu, Gregor Leander, Amir Moradi, Tobias Schneider
Masking AES with Shares in Hardware
Abstract
Masking requires splitting sensitive variables into at least \(d+1\) shares to provide security against DPA attacks at order d. To this date, this minimal number has only been deployed in software implementations of cryptographic algorithms and in the linear parts of their hardware counterparts. So far there is no hardware construction that achieves this lower bound if the function is nonlinear and the underlying logic gates can glitch. In this paper, we give practical implementations of the AES using \(d+1\) shares aiming at first- and second-order security even in the presence of glitches. To achieve this, we follow the conditions presented by Reparaz et al. at CRYPTO 2015 to allow hardware masking schemes, like Threshold Implementations, to provide theoretical higher-order security with \(d+1\) shares. The decrease in number of shares has a direct impact in the area requirements: our second-order DPA resistant core is the smallest in area so far, and its S-box is \(50\,\%\) smaller than the current smallest Threshold Implementation of the AES S-box with similar security and attacker model. We assess the security of our masked cores by practical side-channel evaluations. The security guarantees are met with 100 million traces.
Thomas De Cnudde, Oscar Reparaz, Begül Bilgin, Svetla Nikova, Ventzislav Nikov, Vincent Rijmen

New Directions

Frontmatter
Differential Computation Analysis: Hiding Your White-Box Designs is Not Enough
Abstract
Although all current scientific white-box approaches of standardized cryptographic primitives are broken, there is still a large number of companies which sell “secure” white-box products. In this paper, we present a new approach to assess the security of white-box implementations which requires neither knowledge about the look-up tables used nor any reverse engineering effort. This differential computation analysis (DCA) attack is the software counterpart of the differential power analysis attack as applied by the cryptographic hardware community.
We developed plugins to widely available dynamic binary instrumentation frameworks to produce software execution traces which contain information about the memory addresses being accessed. To illustrate its effectiveness, we show how DCA can extract the secret key from numerous publicly (non-commercial) available white-box programs implementing standardized cryptography by analyzing these traces to identify secret-key dependent correlations. This approach allows one to extract the secret key material from white-box implementations significantly faster and without specific knowledge of the white-box design in an automated manner.
Joppe W. Bos, Charles Hubain, Wil Michiels, Philippe Teuwen
Antikernel: A Decentralized Secure Hardware-Software Operating System Architecture
Abstract
The “kernel” model has been part of operating system architecture for decades, but upon closer inspection it clearly violates the principle of least required privilege. The kernel is a single entity which provides many services (memory management, interfacing to drivers, context switching, IPC) having no real relation to each other, and has the ability to observe or tamper with all state of the system. This work presents Antikernel, a novel operating system architecture consisting of both hardware and software components and designed to be fundamentally more secure than the state of the art. To make formal verification easier, and improve parallelism, the Antikernel system is highly modular and consists of many independent hardware state machines (one or more of which may be a general-purpose CPU running application or systems software) connected by a packet-switched network-on-chip (NoC). We create and verify an FPGA-based prototype of the system.
Andrew Zonenberg, Bülent Yener

Software Implementations

Frontmatter
Software Implementation of Koblitz Curves over Quadratic Fields
Abstract
In this work, we retake an old idea presented by Koblitz in his landmark paper [21], where he suggested the possibility of defining anomalous elliptic curves over the base field \(\mathbb {F}_4\). We present a careful implementation of the base and quadratic field arithmetic required for computing the scalar multiplication operation in such curves. In order to achieve a fast reduction procedure, we adopted a redundant trinomial strategy that embeds elements of the field \(\mathbb {F}_{4^{m}},\) with m a prime number, into a ring of higher order defined by an almost irreducible trinomial. We also report a number of techniques that allow us to take full advantage of the native vector instructions of high-end microprocessors. Our software library achieves the fastest timings reported for the computation of the timing-protected scalar multiplication on Koblitz curves, and competitive timings with respect to the speed records established recently in the computation of the scalar multiplication over prime fields.
Thomaz Oliveira, Julio López, Francisco Rodríguez-Henríquez
QcBits: Constant-Time Small-Key Code-Based Cryptography
Abstract
This paper introduces a constant-time implementation for a quasi-cyclic moderate-density-parity-check (QC-MDPC) code based encryption scheme. At a \(2^{80}\) security level, the software takes \(14\,679\,937\) Cortex-M4 and \(1\,560\,072\) Haswell cycles to decrypt a short message, while the previous records were \(18\,416\,012\) and \(3\,104\,624\) (non-constant-time) cycles. Such speed is achieved by combining two techniques: 1) performing each polynomial multiplication in \(\mathbb {F}_2[x]/(x^r-1)\) and \(\mathbb {Z}[x]/(x^r-1)\) using a sequence of “constant-time rotations” and 2) bitslicing.
Tung Chou
Kummer: Efficient Hyperelliptic Signatures and Key Exchange on Microcontrollers
Abstract
We describe the design and implementation of efficient signature and key-exchange schemes for the AVR ATmega and ARM Cortex M0 microcontrollers, targeting the 128-bit security level. Our algorithms are based on an efficient Montgomery ladder scalar multiplication on the Kummer surface of Gaudry and Schost’s genus-2 hyperelliptic curve, combined with the Jacobian point recovery technique of Chung, Costello, and Smith. Our results are the first to show the feasibility of software-only hyperelliptic cryptography on constrained platforms, and represent a significant improvement on the elliptic-curve state-of-the-art for both key exchange and signatures on these architectures. Notably, our key-exchange scalar-multiplication software runs in under 9520k cycles on the ATmega and under 2640k cycles on the Cortex M0, improving on the current speed records by 32 % and 75 % respectively.
Joost Renes, Peter Schwabe, Benjamin Smith, Lejla Batina

Cache Attacks

Frontmatter
Flush, Gauss, and Reload – A Cache Attack on the BLISS Lattice-Based Signature Scheme
Abstract
We present the first side-channel attack on a lattice-based signature scheme, using the Flush+Reload cache-attack. The attack is targeted at the discrete Gaussian sampler, an important step in the Bimodal Lattice Signature Schemes (BLISS). After observing only 450 signatures with a perfect side-channel, an attacker is able to extract the secret BLISS-key in less than 2 minutes, with a success probability of 0.96. Similar results are achieved in a proof-of-concept implementation using the Flush+Reload technique with less than 3500 signatures.
We show how to attack sampling from a discrete Gaussian using CDT or Bernoulli sampling by showing potential information leakage via cache memory. For both sampling methods, a strategy is given to use this additional information, finalize the attack and extract the secret key. We provide experimental evidence for the idealized perfect side-channel attacks and the Flush+Reload attack on two recent CPUs.
Leon Groot Bruinderink, Andreas Hülsing, Tanja Lange, Yuval Yarom
CacheBleed: A Timing Attack on OpenSSL Constant Time RSA
Abstract
The scatter-gather technique is a commonly implemented approach to prevent cache-based timing attacks. In this paper we show that scatter-gather is not constant time. We implement a cache timing attack against the scatter-gather implementation used in the modular exponentiation routine in OpenSSL version 1.0.2f. Our attack exploits cache-bank conflicts on the Sandy Bridge microarchitecture. We have tested the attack on an Intel Xeon E5-2430 processor. For 4096-bit RSA our attack can fully recover the private key after observing 16,000 decryptions.
Yuval Yarom, Daniel Genkin, Nadia Heninger
Cache Attacks Enable Bulk Key Recovery on the Cloud
Abstract
Cloud services keep gaining popularity despite the security concerns. While non-sensitive data is easily trusted to cloud, security critical data and applications are not. The main concern with the cloud is the shared resources like the CPU, memory and even the network adapter that provide subtle side-channels to malicious parties. We argue that these side-channels indeed leak fine grained, sensitive information and enable key recovery attacks on the cloud. Even further, as a quick scan in one of the Amazon EC2 regions shows, high percentage – 55 % – of users run outdated, leakage prone libraries leaving them vulnerable to mass surveillance.
The most commonly exploited leakage in the shared resource systems stem from the cache and the memory. High resolution and the stability of these channels allow the attacker to extract fine grained information. In this work, we employ the Prime and Probe attack to retrieve an RSA secret key from a co-located instance. To speed up the attack, we reverse engineer the cache slice selection algorithm for the Intel Xeon E5-2670 v2 that is used in our cloud instances. Finally we employ noise reduction to deduce the RSA private key from the monitored traces. By processing the noisy data we obtain the complete 2048-bit RSA key used during the decryption.
Mehmet Sinan İnci, Berk Gulmezoglu, Gorka Irazoqui, Thomas Eisenbarth, Berk Sunar

Physical Unclonable Functions

Frontmatter
Strong Machine Learning Attack Against PUFs with No Mathematical Model
Abstract
Although numerous attacks revealed the vulnerability of different PUF families to non-invasive Machine Learning (ML) attacks, the question is still open whether all PUFs might be learnable. Until now, virtually all ML attacks rely on the assumption that a mathematical model of the PUF functionality is known a priori. However, this is not always the case, and attention should be paid to this important aspect of ML attacks. This paper aims to address this issue by providing a provable framework for ML attacks against a PUF family, whose underlying mathematical model is unknown. We prove that this PUF family is inherently vulnerable to our novel PAC (Probably Approximately Correct) learning framework. We apply our ML algorithm on the Bistable Ring PUF (BR-PUF) family, which is one of the most interesting and prime examples of a PUF with an unknown mathematical model. We practically evaluate our ML algorithm through extensive experiments on BR-PUFs implemented on Field-Programmable Gate Arrays (FPGA). In line with our theoretical findings, our experimental results strongly confirm the effectiveness and applicability of our attack. This is also interesting since our complex proof heavily relies on the spectral properties of Boolean functions, which are known to hold only asymptotically. Along with this proof, we further provide the theorem that all PUFs must have some challenge bit positions, which have larger influences on the responses than other challenge bits.
Fatemeh Ganji, Shahin Tajik, Fabian Fäßler, Jean-Pierre Seifert
Efficient Fuzzy Extraction of PUF-Induced Secrets: Theory and Applications
Abstract
The device-unique response of a physically unclonable function (PUF) can serve as the root of trust in an embedded cryptographic system. Fuzzy extractors transform this noisy non-uniformly distributed secret into a stable high-entropy key. The overall efficiency thereof, typically depending on error-correction with a binary [nkd] block code, is determined by the universal and well-known \((n-k)\) bound on the min-entropy loss. We derive new considerably tighter bounds for PUF-induced distributions that suffer from, e.g., bias or spatial correlations. The bounds are easy-to-evaluate and apply to large non-trivial codes, e.g., BCH, Hamming and Reed-Muller codes. Apart from an inherent reduction in implementation footprint, the newly developed theory also facilitates the analysis of state-of-the-art error-correction methods for PUFs. As such, we debunk the reusability claim of the reverse fuzzy extractor. Moreover, we provide proper quantitative motivation for debiasing schemes, as this was missing in the original proposals.
Jeroen Delvaux, Dawu Gu, Ingrid Verbauwhede, Matthias Hiller, Meng-Day (Mandel) Yu
Run-Time Accessible DRAM PUFs in Commodity Devices
Abstract
A Physically Unclonable Function (PUF) is a unique and stable physical characteristic of a piece of hardware, which emerges due to variations in the fabrication processes. Prior works have demonstrated that PUFs are a promising cryptographic primitive to enable secure key storage, hardware-based device authentication and identification. So far, most PUF constructions require addition of new hardware or FPGA implementations for their operation. Recently, intrinsic PUFs, which can be found in commodity devices, have been investigated. Unfortunately, most of them suffer from the drawback that they can only be accessed at boot time. This paper is the first to enable the run-time access of decay-based intrinsic DRAM PUFs in commercial off-the-shelf systems, which requires no additional hardware or FPGAs. A key advantage of our PUF construction is that it can be queried during run-time of a Linux system. Furthermore, by exploiting different decay times of individual DRAM cells, the challenge-response space is increased. Finally, we introduce lightweight protocols for device authentication and secure channel establishment, that leverage the DRAM PUFs at run-time.
Wenjie Xiong, André Schaller, Nikolaos A. Anagnostopoulos, Muhammad Umair Saleem, Sebastian Gabmeyer, Stefan Katzenbeisser, Jakub Szefer

Side Channel Countermeasures II

Frontmatter
On the Multiplicative Complexity of Boolean Functions and Bitsliced Higher-Order Masking
Abstract
Higher-order masking is a widely used countermeasure to make software implementations of blockciphers achieve high security levels against side-channel attacks. Unfortunately, it often comes with a strong impact in terms of performances which may be prohibitive in some contexts. This situation has motivated the research for efficient schemes that apply higher-order masking with minimal performance overheads. The most widely used approach is based on a polynomial representation of the cipher s-box(es) allowing the application of standard higher-order masking building blocks such as the ISW scheme (Ishai-Sahai-Wagner, Crypto 2003). Recently, an alternative approach has been considered which is based on a bitslicing of the s-boxes. This approach has been shown to enjoy important efficiency benefits, but it has only been applied to specific blockciphers such as AES, PRESENT, or custom designs. In this paper, we present a generic method to find a Boolean representation of an s-box with efficient bitsliced higher-order masking. Specifically, we propose a method to construct a circuit with low multiplicative complexity. Compared to previous work on this subject, our method can be applied to any s-box of common size and not necessarily to small s-boxes. We use it to derive higher-order masked s-box implementations that achieve important performance gain compared to optimized state-of-the-art implementations.
Dahmun Goudarzi, Matthieu Rivain
Reducing the Number of Non-linear Multiplications in Masking Schemes
Abstract
In recent years, methods to securely mask S-boxes against side-channel attacks by representing them as polynomials over finite binary fields have become quite efficient. A good cost model for this is to count how many non-linear multiplications are needed. In this work we improve on the current state-of-the-art generic method published by Coron–Roy–Vivek at CHES 2014 by working over slightly larger fields than strictly needed. This leads us, for example, to evaluate DES S-boxes with only 3 non-linear multiplications and, as a result, obtain \(25\,\%\) improvement in the running time for secure software implementations of DES when using three or more shares.
On the theoretical side, we prove a logarithmic upper bound on the number of non-linear multiplications required to evaluate any d-bit S-box, when ignoring the cost of working in unreasonably large fields. This upper bound is lower than the previous lower bounds proved under the assumption of working over the field \({\mathbb F}_{2^d}\), and we show this bound to be sharp. We also achieve a way to evaluate the AES S-box using only 3 non-linear multiplications over \(\mathbb {F}_{2^{16}}\).
Jürgen Pulkus, Srinivas Vivek
Faster Evaluation of SBoxes via Common Shares
Abstract
We describe a new technique for improving the efficiency of the masking countermeasure against side-channel attacks. Our technique is based on using common shares between secret variables, in order to reduce the number of finite field multiplications. Our algorithms are proven secure in the ISW probing model with \(n \geqslant t+1\) shares against t probes. For AES, we get an equivalent of 2.8 non-linear multiplications for every SBox evaluation, instead of 4 in the Rivain-Prouff countermeasure. We obtain similar improvements for other block-ciphers. Our technique is easy to implement and performs relatively well in practice, with roughly a 20 % speed-up compared to existing algorithms.
Jean-Sébastien Coron, Aurélien Greuet, Emmanuel Prouff, Rina Zeitoun

Hardware Implementations

Frontmatter
Four on FPGA: New Hardware Speed Records for Elliptic Curve Cryptography over Large Prime Characteristic Fields
Abstract
We present fast and compact implementations of Four\(\mathbb {Q}\) (ASIACRYPT 2015) on field-programmable gate arrays (FPGAs), and demonstrate, for the first time, the high efficiency of this new elliptic curve on reconfigurable hardware. By adapting Four\(\mathbb {Q}\)’s algorithms to hardware, we design FPGA-tailored architectures that are significantly faster than any other ECC alternative over large prime characteristic fields. For example, we show that our single-core and multi-core implementations can compute at a rate of 6389 and 64730 scalar multiplications per second, respectively, on a Xilinx Zynq-7020 FPGA, which represent factor-2.5 and 2 speedups in comparison with the corresponding variants of the fastest Curve25519 implementation on the same device. These results show the potential of deploying Four\(\mathbb {Q}\) on hardware for high-performance and embedded security applications. All the presented implementations exhibit regular, constant-time execution, protecting against timing and simple side-channel attacks.
Kimmo Järvinen, Andrea Miele, Reza Azarderakhsh, Patrick Longa
A High Throughput/Gate AES Hardware Architecture by Compressing Encryption and Decryption Datapaths
— Toward Efficient CBC-Mode Implementation
Abstract
This paper proposes a highly efficient AES hardware architecture that supports both encryption and decryption for the CBC mode. Some conventional AES architectures employ pipelining techniques to enhance the throughput and efficiency. However, such pipelined architectures are frequently unfit because many practical cryptographic applications work in the CBC mode, where block-wise parallelism is not available for encryption. In this paper, we present an efficient AES encryption/decryption hardware design suitable for such block-chaining modes. In particular, new operation-reordering and register-retiming techniques allow us to unify the inversion circuits for encryption and decryption (i.e., SubBytes and InvSubBytes) without any delay overhead. A new unification technique for linear mappings further reduces both the area and critical delay in total. Our design employs a common loop architecture and can therefore efficiently perform even in the CBC mode. We also present a shared key scheduling datapath that can work on-the-fly in the proposed architecture. To the best of our knowledge, the proposed architecture has the shortest critical path delay and is the most efficient in terms of throughput per area among conventional AES encryption/decryption architectures with tower-field S-boxes. We evaluate the performance of the proposed and some conventional datapaths by logic synthesis results with the TSMC 65-nm standard-cell library and NanGate 45- and 15-nm open-cell libraries. As a result, we confirm that our proposed architecture achieves approximately 53–72 % higher efficiency (i.e., a higher bps/GE) than any other conventional counterpart.
Rei Ueno, Sumio Morioka, Naofumi Homma, Takafumi Aoki
Efficient High-Speed WPA2 Brute Force Attacks Using Scalable Low-Cost FPGA Clustering
Abstract
WPA2-Personal is widely used to protect Wi-Fi networks against illicit access. While attackers typically use GPUs to speed up the discovery of weak network passwords, attacking random passwords is considered to quickly become infeasible with increasing password length. Professional attackers may thus turn to commercial high-end FPGA-based cluster solutions to significantly increase the speed of those attacks. Well known manufacturers such as Elcomsoft have succeeded in creating world’s fastest commercial FPGA-based WPA2 password recovery system, but since they rely on high-performance FPGAs the costs of these systems are well beyond the reach of amateurs. In this paper, we present a highly optimized low-cost FPGA cluster-based WPA-2 Personal password recovery system that can not only achieve similar performance at a cost affordable by amateurs, but in comparison our implementation would also be more than 5 times as fast on the original hardware. Since the currently fastest system is not only significantly slower but proprietary as well, we believe that we are the first to present the internals of a highly optimized and fully pipelined FPGA WPA2 password recovery system. In addition, we evaluated our approach with respect to performance and power usage and compare it to GPU-based systems.
Markus Kammerstetter, Markus Muellner, Daniel Burian, Christian Kudera, Wolfgang Kastner

Fault Attacks

Frontmatter
EnCounter: On Breaking the Nonce Barrier in Differential Fault Analysis with a Case-Study on PAEQ
Abstract
This work exploits internal differentials within a cipher in the context of Differential Fault Analysis (DFA). This in turn overcomes the nonce barrier which acts as a natural counter-measure against DFA. We introduce the concept of internal differential fault analysis which requires only one faulty ciphertext. In particular, the analysis is applicable to parallelizable ciphers that use the counter-mode. As a proof of concept we develop an internal differential fault attack called https://static-content.springer.com/image/chp%3A10.1007%2F978-3-662-53140-2_28/429806_1_En_28_IEq2_HTML.gif on PAEQ which is an AES based parallelizable authenticated cipher presently in the second round of on-going CAESAR competition. The attack is able to uniquely retrieve the key of three versions of full-round PAEQ of key-sizes 64, 80 and 128 bits with complexities of about \(2^{16}\), \(2^{16}\) and \(2^{50}\) respectively. Finally, this work addresses in detail the instance of fault analysis with varying amounts of partial state information and also presents the first analysis of PAEQ.
Dhiman Saha, Dipanwita Roy Chowdhury
Curious Case of Rowhammer: Flipping Secret Exponent Bits Using Timing Analysis
Abstract
Rowhammer attacks have exposed a serious vulnerability in modern DRAM chips to induce bit flips in data which is stored in memory. In this paper, we develop a methodology to combine timing analysis to perform the hammering in a controlled manner to create bit flips in cryptographic keys which are stored in memory. The attack would require only user level privilege for Linux kernel versions before 4.0 and is unaware of the memory location of the key. An intelligent combination of timing Prime + Probe attack and row-buffer collision is shown to induce bit flip faults in a 1024 bit RSA key on modern processors using realistic number of hammering attempts. This demonstrates the feasibility of fault analysis of ciphers using purely software means on commercial x86 architectures, which to the best of our knowledge has not been reported earlier. The attack is also relevant for the newest Linux kernel in a Cross-VM environment where the VMs having root privilege are not denied to access the pagemap.
Sarani Bhattacharya, Debdeep Mukhopadhyay
A Design Methodology for Stealthy Parametric Trojans and Its Application to Bug Attacks
Abstract
Over the last decade, hardware Trojans have gained increasing attention in academia, industry and by government agencies. In order to design reliable countermeasures, it is crucial to understand how hardware Trojans can be built in practice. This is an area that has received relatively scant treatment in the literature. In this contribution, we examine how particularly stealthy Trojans can be introduced to a given target circuit. The Trojans are triggered by violating the delays of very rare combinational logic paths. These are parametric Trojans, i.e., they do not require any additional logic and are purely based on subtle manipulations on the sub-transistor level to modify the parameters of the transistors. The Trojan insertion is based on a two-phase approach. In the first phase, a SAT-based algorithm identifies rarely sensitized paths in a combinational circuit. In the second phase, a genetic algorithm smartly distributes delays for each gate to minimize the number of faults caused by random vectors.
As a case study, we apply our method to a 32-bit multiplier circuit resulting in a stealthy Trojan multiplier. This Trojan multiplier only computes faulty outputs if specific combinations of input pairs are applied to the circuit. The multiplier can be used to realize bug attacks, introduced by Biham et al. In addition to the bug attacks proposed previously, we extend this concept for the specific fault model of the path delay Trojan multiplier and show how it can be used to attack ECDH key agreement protocols.
Our method is a general approach to path delay faults. It is a versatile tool for designing stealthy Trojans for a given circuit and is not restricted to multipliers and the bug attack.
Samaneh Ghandali, Georg T. Becker, Daniel Holcomb, Christof Paar
Backmatter
Metadaten
Titel
Cryptographic Hardware and Embedded Systems – CHES 2016
herausgegeben von
Benedikt Gierlichs
Axel Y. Poschmann
Copyright-Jahr
2016
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-662-53140-2
Print ISBN
978-3-662-53139-6
DOI
https://doi.org/10.1007/978-3-662-53140-2