Skip to main content
Top

2021 | Book

Smart Card Research and Advanced Applications

19th International Conference, CARDIS 2020, Virtual Event, November 18–19, 2020, Revised Selected Papers

insite
SEARCH

About this book

This book constitutes the proceedings of the 19th International Conference on Smart Card Research and Advanced Applications, CARDIS 2020, which took place during November 18-20, 2020. The conference was originally planned to take place in Lübeck, Germany, and changed to an online format due to the COVID-19 pandemic. The 12 full papers presented in this volume were carefully reviewed and selected from 26 submissions. They were organized in topical sections named: post-quantum cryptography; efficient implementations; and physical attacks.

Table of Contents

Frontmatter

Post-Quantum Cryptography

Frontmatter
A Constant Time Full Hardware Implementation of Streamlined NTRU Prime
Abstract
This paper presents a constant time hardware implementation of the NIST round 2 post-quantum cryptographic algorithm Streamlined NTRU Prime. We implement the entire KEM algorithm, including all steps for key generation, encapsulation and decapsulation, and all en- and decoding. We focus on optimizing the resources used, as well as applying optimization and parallelism available due to the hardware design. We show the core en- and decapsulation requires only a fraction of the total FPGA fabric resource cost, which is dominated by that of the hash function, and the en- and decoding algorithm. For the NIST Security Level 3, our implementation uses a total of 1841 slices on a Xilinx Zynq Ultrascale+ FPGA, together with 14 BRAMs and 19 DSPs. The maximum achieved frequency is 271 MHz, at which the key generation, encapsulation and decapsulation take 4808 \(\upmu \)s, 524 \(\upmu \)s and 958 \(\upmu \)s respectively. To our knowledge, this work is the first full hardware implementation where the entire algorithm is implemented.
Adrian Marotzke
Lightweight Post-quantum Key Encapsulation for 8-bit AVR Microcontrollers
Abstract
Recent progress in quantum computing has increased interest in the question of how well the existing proposals for post-quantum cryptosystems are suited to replace RSA and ECC. While some aspects of this question have already been researched in detail (e.g. the relative computational cost of pre- and post-quantum algorithms), very little is known about the RAM footprint of the proposals and what execution time they can reach when low memory consumption rather than speed is the main optimization goal. This question is particularly important in the context of the Internet of Things (IoT) since many IoT devices are extremely constrained and possess only a few kB of RAM. We aim to contribute to answering this question by exploring the software design space of the lattice-based key-encapsulation scheme ThreeBears on an 8-bit AVR microcontroller. More concretely, we provide new techniques for the optimization of the ring arithmetic of ThreeBears (which is, in essence, a 3120-bit modular multiplication) to achieve either high speed or low RAM footprint, and we analyze in detail the trade-offs between these two metrics. A low-memory implementation of BabyBear that is secure against Chosen Plaintext Attacks (CPA) needs just about 1.7 kB RAM, which is significantly below the RAM footprint of other lattice-based cryptosystems reported in the literature. Yet, the encapsulation time of this RAM-optimized BabyBear version is below 12.5 million cycles, which is less than the execution time of scalar multiplication on Curve25519. The decapsulation is more than four times faster and takes roughly 3.4 million cycles on an ATmega1284 microcontroller.
Hao Cheng, Johann Großschädl, Peter B. Rønne, Peter Y. A. Ryan
Classic McEliece Implementation with Low Memory Footprint
Abstract
The Classic McEliece cryptosystem is one of the most trusted quantum-resistant cryptographic schemes. Deploying it in practical applications, however, is challenging due to the size of its public key. In this work, we bridge this gap. We present an implementation of Classic McEliece on an ARM Cortex-M4 processor, optimized to overcome memory constraints. To this end, we present an algorithm to retrieve the public key ad-hoc. This reduces memory and storage requirements and enables the generation of larger key pairs on the device. To further improve the implementation, we perform the public key operation by streaming the key to avoid storing it as a whole. This additionally reduces the risk of denial of service attacks. Finally, we use these results to implement and run TLS on the embedded device.
Johannes Roth, Evangelos Karatsiolis, Juliane Krämer

Efficient Implementations

Frontmatter
A Fast and Compact RISC-V Accelerator for Ascon and Friends
Abstract
Ascon-\(p\) is the core building block of Ascon, the winner in the lightweight category of the CAESAR competition. With Isap, another Ascon-\(p\)-based AEAD scheme is currently competing in the 2nd round of the NIST lightweight cryptography standardization project. In contrast to Ascon, Isap focuses on providing hardening/protection against a large class of implementation attacks, such as DPA, DFA, SFA, and SIFA, entirely on mode-level. Consequently, Ascon-\(p\) can be used to realize a wide range of cryptographic computations such as authenticated encryption, hashing, pseudorandom number generation, with or without the need for implementation security, which makes it the perfect choice for lightweight cryptography on embedded devices.
In this paper, we implement Ascon-\(p\) as an instruction extension for RISC-V that is tightly coupled to the processors register file and thus does not require any dedicated registers. This single instruction allows us to realize all cryptographic computations that typically occur on embedded devices with high performance. More concretely, with Isap and Ascon’s family of modes for AEAD and hashing, we can perform cryptographic computations with a performance of about 2 cycles/byte, or about 4 cycles/byte if protection against fault attacks and power analysis is desired.
As we show, our instruction extension requires only 4.7 kGE, or about half the area of dedicated Ascon co-processor designs, and is easy to integrate into low-end embedded devices like 32-bit ARM Cortex-M or RISC-V microprocessors. Finally, we analyze the provided implementation security of Isap, when implemented using our instruction extension.
Stefan Steinegger, Robert Primas
Optimized Software Implementations for the Lightweight Encryption Scheme ForkAE
Abstract
In this work we develop optimized software implementations for ForkAE, a second round candidate in the ongoing NIST lightweight cryptography standardization process. Moreover, we analyze the performance and efficiency of different ForkAE implementations on two embedded platforms: ARM Cortex-A9 and ARM Cortex-M0.
First, we study portable ForkAE implementations. We apply a decryption optimization technique which allows us to accelerate decryption by up to 35%. Second, we go on to explore platform-specific software optimizations. In platforms where cache-timing attacks are not a risk, we present a novel table-based approach to compute the SKINNY round function. Compared to the existing portable implementations, this technique speeds up encryption and decryption by 20% and 25%, respectively. Additionally, we propose a set of platform-specific optimizations for processors with parallel hardware extensions such as ARM NEON. Without relying on parallelism provided by long messages (cf. bit-sliced implementations), we focus on the primitive-level ForkSkinny parallelism provided by ForkAE to reduce encryption and decryption latency by up to 30%. We benchmark the performance of our implementations on the ARM Cortex-M0 and ARM Cortex-A9 processors and give a comparison with the other SKINNY-based schemes in the NIST lightweight competition – SKINNY-AEAD and Romulus.
Arne Deprez, Elena Andreeva, Jose Maria Bermudo Mera, Angshuman Karmakar, Antoon Purnal
Secure and Efficient Delegation of Pairings with Online Inputs
Abstract
Delegation of pairings from a computationally weaker client to a computationally stronger server has been advocated to expand the applicability of pairing-based cryptographic protocols to computation paradigms with resource-constrained devices. Important requirements for such delegation protocols include privacy of the client’s inputs and security of the client’s output, in the sense of detecting, with high probability, any malicious server’s attempt to convince the client of an incorrect pairing result. In this paper we show that pairings with inputs only available in the online phase can be efficiently, privately and securely delegated to a single, possibly malicious, server. We present new protocols in 2 different scenarios: (1) the two pairing inputs are publicly known; (2) privacy of both pairing inputs needs to be maintained (left open in previous papers; e.g., [27]). In both cases, we improve the online-phase client’s runtime with respect to previous work. In the latter case, we show the first protocol where the client’s online-phase runtime is faster than non-delegated computation for all of the most practical known curves. In previous work, the client’s runtime was worse, especially for one of the most practical elliptic curves underlying the pairing function (i.e., BN-12).
Giovanni Di Crescenzo, Matluba Khodjaeva, Delaram Kahrobaei, Vladimir Shpilrain

Physical Attacks

Frontmatter
On the Security of Off-the-Shelf Microcontrollers: Hardware Is Not Enough
Abstract
We complete the state-of-the-art on the side-channel security of real-world devices by analysing two 32-bit microcontrollers equipped with an unprotected co-processor. Our results show that (i) the lack of understanding of their hardware architecture can be circumvented with standard detection tools – for this purpose, we combine a simple variation of the Test Vector Leakage Assessment methodology with Signal-to-Noise Ratio estimations, which enables the efficient identification of attack vectors; (ii) standard distinguishers then lead to powerful key recoveries with less than 5,000 traces; and (iii) preprocessing like the continuous wavelet transform can be useful in such a black box evaluation context.
Balazs Udvarhelyi, Antoine van Wassenhove, Olivier Bronchain, François-Xavier Standaert
A Power Side-Channel Attack on the CCA2-Secure HQC KEM
Abstract
The Hamming Quasi-Cyclic (HQC) proposal is a promising candidate in the second round of the NIST Post-Quantum Cryptography Standardization project. It features small public key sizes, precise estimation of its decryption failure rates and contrary to most of the code-based systems, its security does not rely on hiding the structure of an error-correcting code. In this paper, we propose the first power side-channel attack on the Key Encapsulation Mechanism (KEM) version of HQC. Our attack utilizes a power side-channel to build an oracle that outputs whether the BCH decoder in HQC’s decryption algorithm corrects an error for a chosen ciphertext. Based on the decoding algorithm applied in HQC, it is shown how to design queries such that the output of the oracle allows to retrieve a large part of the secret key. The remaining part of the key can then be determined by an algorithm based on linear algebra. It is shown in experiments that less than 10000 measurements are sufficient to successfully mount the attack on the HQC reference implementation running on an ARM Cortex-M4 microcontroller.
Thomas Schamberger, Julian Renner, Georg Sigl, Antonia Wachter-Zeh
How Deep Learning Helps Compromising USIM
Abstract
It is known that secret keys can be extracted from some USIM cards using Correlation Power Analysis (CPA). In this paper, we demonstrate a more advanced attack on USIMs, based on deep learning. We show that a Convolutional Neural Network (CNN) trained on one USIM can recover the key from another USIM using at most 20 traces (four traces on average). Previous CPA attacks on USIM cards required high-quality oscilloscopes for power trace acquisition, an order of magnitude more traces from the victim card, and expert-level skills from the attacker. Now the attack can be mounted with a $1000 budget and basic skills in side-channel analysis.
Martin Brisfors, Sebastian Forsmark, Elena Dubrova
Differential Analysis and Fingerprinting of ZombieLoads on Block Ciphers
Abstract
Microarchitectural Data Sampling (MDS) [16, 18] enables to observe in-flight data that has recently been loaded or stored in shared short-time buffers on a physical CPU core. In-flight data sampled from line-fill buffers (LFBs) are also known as “ZombieLoads” [16]. We present a new method that links the analysis of ZombieLoads to Differential Power Analysis (DPA) techniques and provides an alternative way to derive the secret key of block ciphers. This method compares observed ZombieLoads with predicted intermediate values that occur during cryptographic computations depending on a key hypothesis and known data. We validate this approach using an Advanced Encryption Standard (AES) software implementation. Further, we provide a novel technique of cache line fingerprinting that reduces the superposition of ZombieLoads from different cache lines in the data sets resulting from an MDS attack. Thereby, this technique is helpful to reveal static secret data such as AES round keys which is shown in practice on an AES implementation.
Till Schlüter, Kerstin Lemke-Rust
Low-Cost Body Biasing Injection (BBI) Attacks on WLCSP Devices
Abstract
Body Biasing Injection (BBI) uses a voltage applied with a physical probe onto the backside of the integrated circuit die Compared to other techniques such as electromagnetic fault injection (EMFI) or Laser Fault Injection (LFI), this technique appears less popular in academic literature based on published results. It is hypothesized being due to (1) moderate cost of equipment, and (2) effort required in device preperation.
This work demonstrates that BBI (and indeed many other backside attacks) can be trivially performed on Wafer-Level Chip-Scale Packaging (WLCSP), which inherently expose the die backside. A low-cost ($15) design for the BBI tool is introduced, and validated with faults introduced on a STM32F415OG against code flow, RSA, and some initial results on various hardware block attacks are discussed.
Colin O’Flynn
Let’s Tessellate: Tiling for Security Against Advanced Probe and Fault Adversaries
Abstract
The wire probe-and-fault models are currently the most used models to provide arguments for side-channel and fault security. However, several practical attacks are not yet covered by these models. This work extends the wire fault model to include more advanced faults such as area faults and permanent faults. Moreover, we show the tile probe-and-fault adversary model from CRYPTO 2018’s CAPA envelops the extended wire fault model along with known extensions to the probing model such as glitches, transitions, and couplings. In other words, tiled (tessellated ) designs offer security guarantees even against advanced probe and fault adversaries.
As tiled models use multi-party computation techniques, countermeasures are typically expensive for software/hardware. This work investigates a tiled countermeasure based on the ISW methodology which is shown to perform significantly better than CAPA for practical parameters.
Siemen Dhooghe, Svetla Nikova
Backmatter
Metadata
Title
Smart Card Research and Advanced Applications
Editors
Dr. Pierre-Yvan Liardet
Nele Mentens
Copyright Year
2021
Electronic ISBN
978-3-030-68487-7
Print ISBN
978-3-030-68486-0
DOI
https://doi.org/10.1007/978-3-030-68487-7

Premium Partner