Skip to main content
main-content

Über dieses Buch

This book constitutes the thoroughly refereed post-conference proceedings of the 18th International Conference on Smart Card Research and Advanced Applications, CARDIS 2019, held in Prague, Czech Republic, in November 2019.

The 15 revised full papers presented in this book were carefully reviewed and selected from 31 submissions. The papers are organized in the following topical sections: system-on-a-chip security; post-quantum cryptography; side-channel analysis; microarchitectural attacks; cryptographic primitives; advances in side-channel analysis.

CARDIS has provided a space for security experts from industry and academia to exchange on security of smart cards and related applications.

Inhaltsverzeichnis

Frontmatter

System-on-a-Chip Security

Frontmatter

In-situ Extraction of Randomness from Computer Architecture Through Hardware Performance Counters

Abstract
True Random Number Generators (TRNGs) are one of the most crucial components in the design and use of cryptographic protocols and communication. Predictability of such random numbers are catastrophic and can lead to the complete collapse of security, as all the mathematical proofs are based on the entropy of the source which generates these bit patterns. The randomness in the TRNGs is hugely attributed to the inherent noise of the system, which is often derived from hardware subsystems operating in an ambiguous manner. However, most of these solutions need an add-on device to provide these randomness sources, which can lead to not only latency issues but also can be a potential target of adversaries by probing such an interface. In this paper, we address to alleviate these issues by proposing an in-situ TRNG construction, which depends on the functioning of the underlying hardware architecture. These functions are observed via the Hardware Performance Counters (HPCs) and are shown to exhibit high-quality randomness in the least significant bit positions. We provide extensive experiments to research on the choice of the HPCs, and their ability to pass the standard NIST and AIS 20/31 Tests. We also analyze a possible scenario where an adversary tries to interfere with the HPC values and show its effect on the TRNG output with respect to the NIST and AIS 20/31 Tests. Additionally, to alleviate the delay caused for accessing the HPC events and increase the throughput of the random-source, we also propose a methodology to cascade the random numbers from the HPC values with a secured hash function.
Manaar Alam, Astikey Singh, Sarani Bhattacharya, Kuheli Pratihar, Debdeep Mukhopadhyay

Optimized Threshold Implementations: Minimizing the Latency of Secure Cryptographic Accelerators

Abstract
Threshold implementations have emerged as one of the most popular masking countermeasures for hardware implementations of cryptographic primitives. In this work, we first provide a generic construction for \(d+1\) TI sharing which achieves the minimal number of output shares for any n-input Boolean function of degree \(t=n-1\) and for any d. Secondly, we demonstrate the applicability of our results on a first-order and second-order \(d+1\) low-latency PRINCE implementation.
Dušan Božilov, Miroslav Knežević, Ventzislav Nikov

Breaking the Lightweight Secure PUF: Understanding the Relation of Input Transformations and Machine Learning Resistance

Abstract
Physical Unclonable Functions (PUFs) and, in particular, strong PUFs such as the XOR Arbiter PUF have gained much research interest as an authentication mechanism for embedded systems. One of the biggest problems of strong PUFs is their vulnerability to so called machine learning attacks. In this paper, we take a closer look at one aspect of machine learning attacks that has not yet gained the needed attention: the generation of the sub-challenges in XOR Arbiter PUFs fed to the individual Arbiter PUFs. Specifically, we look at one of the most popular ways to generate sub-challenges based on a combination of permutations and XORs as it has been described for the “Lightweight Secure PUF”. Previous research suggested that using such a sub-challenge generation increases the machine learning resistance significantly.
Our contribution in the field of sub-challenge generation is three-fold: First, drastically improving attack results by Rührmair et al., we describe a novel attack that can break the Lightweight Secure PUF in time roughly equivalent to an XOR Arbiter PUF without transformation of the challenge input. Second, we give a mathematical model that gives insight into the weakness of the Lightweight Secure PUF and provides a way to study generation of sub-challenges in general. Third, we propose a new, efficient, and cost-effective way for sub-challenge generation that mitigates the attack strategy we used and outperforms the Lightweight Secure PUF in both machine learning resistance and resource overhead.
Nils Wisiol, Georg T. Becker, Marian Margraf, Tudor A. A. Soroceanu, Johannes Tobisch, Benjamin Zengin

Post-Quantum Cryptography

Frontmatter

Improving Speed of Dilithium’s Signing Procedure

Abstract
Dilithium is a round 2 candidate for digital signature schemes in NIST initiative for post-quantum cryptographic schemes. Since Dilithium is built upon the “Fiat Shamir with Aborts” framework, its signing procedure performs rejection sampling of its signatures to ensure they do not leak information about the secret key. Thus, the signing procedure is iterative in nature with a number of rejected iterations, which serve as unnecessary overheads hampering its overall performance. As a first contribution, we propose an optimization that reduces the computations in the rejected iterations through early-evaluation of the conditional checks. This allows to perform an early detection of the rejection condition and reject a given iteration as early as possible. We also incorporate a number of standard optimizations such as unrolling and inlining to further improve the speed of the signing procedure. We incorporate and evaluate our optimizations over the software implementation of Dilithium on both the Intel Core i5-4460 and ARM Cortex-M4 CPUs. As a second contribution, we identify opportunities to present a more refined evaluation of Dilithium’s signing procedure in several scenarios where pre-computations can be carried out. We also evaluate the performance of our optimizations and the memory requirements for the pre-computed intermediates in the considered scenarios. We could yield speed-ups in the range of 6% upto 35%, considering all the aforementioned scenarios, thus presenting the fastest software implementation of Dilithium till date.
Prasanna Ravi, Sourav Sen Gupta, Anupam Chattopadhyay, Shivam Bhasin

An Efficient and Provable Masked Implementation of qTESLA

Abstract
Now that the NIST’s post-quantum cryptography competition has entered in its second phase, the time has come to focus more closely on practical aspects of the candidates. While efficient implementations of the proposed schemes are somewhat included in the submission packages, certain issues like the threat of side-channel attacks are often lightly touched upon by the authors. Hence, the community is encouraged by the NIST to join the war effort to treat those peripheral, but nonetheless crucial, topics. In this paper, we study the lattice-based signature scheme qTESLA in the context of the masking countermeasure. Continuing a line of research opened by Barthe et al. at Eurocrypt 2018 with the masking of the GLP signature scheme, we extend and modify their work to mask qTESLA. Based on the work of Migliore et al. in ACNS 2019, we slightly modify the parameters to improve the masked performance while keeping the same security. The masking can be done at any order and specialized gadgets are used to get maximal efficiency at order 1. We implemented our countermeasure in the original code of the submission and performed tests at different orders to assess the feasibility of our technique.
François Gérard, Mélissa Rossi

Side-Channel Analysis

Frontmatter

Side-Channel Attacks on Blinded Scalar Multiplications Revisited

Abstract
In a series of recent articles (from 2011 to 2017), Schindler et al. show that exponent/scalar blinding is not as effective a countermeasure as expected against side-channel attacks targeting RSA modular exponentiation and ECC scalar multiplication. Precisely, these works demonstrate that if an attacker is able to retrieve many randomizations of the same secret, this secret can be fully recovered even when a significative proportion of the blinded secret bits are erroneous. With a focus on ECC, this paper improves the best results of Schindler et al. in the specific case of structured-order elliptic curves. Our results show that larger blinding material and higher error rates can be successfully handled by an attacker in practice. This study also opens new directions in this line of work by the proposal of a three-steps attack process that isolates the attack critical path (in terms of complexity and success rate) and hence eases the development of future solutions.
Thomas Roche, Laurent Imbert, Victor Lomné

Remote Side-Channel Attacks on Heterogeneous SoC

Abstract
Thanks to their performance and flexibility, FPGAs are increasingly adopted for hardware acceleration on various platforms such as system on chip and cloud datacenters. Their use for commercial and industrial purposes raises concern about potential hardware security threats. By getting access to the FPGA fabric, an attacker could implement malicious logic to perform remote hardware attacks. Recently, several papers demonstrated that FPGA can be used to eavesdrop or disturb the activity of resources located within and outside the chip. In a complex SoC that contains a processor and a FPGA within the same die, we experimentally demonstrate that FPGA-based voltage sensors can eavesdrop computations running on the CPU and that advanced side-channel attacks can be conducted remotely to retrieve the secret key of a symmetric crypto-algorithm.
Joseph Gravellier, Jean-Max Dutertre, Yannick Teglia, Philippe Loubet Moundi, Francis Olivier

Optimal Collision Side-Channel Attacks

Abstract
Collision side-channel attacks are effective attacks against cryptographic implementations, however, optimality and efficiency of collision side-channel attacks is an open question. In this paper, we show that collision side-channel attacks can be derived using maximum likelihood principle when the distribution of the values of the leakage function is known. This allows us to exhibit the optimal collision side-channel attack and its efficient computation. Finally, we can compute an upper bound for the success rate of the optimal post-processing strategy, and we show that our method and the optimal strategy have success rates close to each other. Attackers can benefit from our method as we present an efficient collision side-channel attack. Evaluators can benefit from our method as we present a tight upper bound for the success rate of the optimal strategy.
Cezary Glowacz, Vincent Grosso

Microarchitectural Attacks

Frontmatter

A Bit-Level Approach to Side Channel Based Disassembling

Abstract
Side-Channel Based Disassembling (SCBD) is a powerful application of side-channel analysis that allows recovering instructions executed by a processor from its physical leakages, such as the electromagnetic field (EM) emitted by the chip. These attacks directly compromise code confidentiality, but they can also reveal to an adversary many critical information on the system’s internals. In this work, we propose a new approach for SCBD that directly focuses the bit encoding of an instruction using local EM leakage. We exploit a very precise bit-level leakage model and derive from it new algorithms that aim at recovering the actual bit values. We also propose strategies to automate the complex tasks of finding the best EM probe positions and combining them to improve results. On a PIC16 target, our method succeed in recovering the bits of an instruction with an average rate of 99,41% per bit. Compared to the state of the art, our disassembler is easier to train, recovers more information about instructions than just opcode and requires almost no modifications to target other processor architectures. Thus, this kind of disassemblers might become a threat to more complex processors, where side-channel disassembling has not been proved to be feasible yet.
Valence Cristiani, Maxime Lecomte, Thomas Hiscock

CCCiCC: A Cross-Core Cache-Independent Covert Channel on AMD Family 15h CPUs

Abstract
Spectre and similar microarchitectural attacks have recently caused a major paradigm shift in hardware and software development to restrict attacker-controlled speculative execution and microarchitectural sampling. So far, research has focused on cache interaction, instruction scheduling, microarchitectural sampling and speculative side effects, whereas instruction decoding research has been notably absent. We disclose two cross-core covert channels on multiple AMD processor generations (Family 15h) spanning from Bulldozer to Excavator with partial applicability to Zen.
In this work, cross-core instruction decoding and synchronization interactions are explored as a source of information leakage on these processors to yield multiple cache-independent covert channels in a non-SMT environment. In contrast to other attacks, we do not rely on memory interaction nor on speculative execution. None of the existing mitigations in the Linux kernel and processor microcode against transient execution attacks have any measurable effect on the CCCiCC covert channels. To the best of our knowledge, this is not fixable with a microcode update since any updated instruction would also become usable for signaling.
Carl-Daniel Hailfinger, Kerstin Lemke-Rust, Christof Paar

Design Considerations for EM Pulse Fault Injection

Abstract
Electromagnetic-fault injection (EM-FI) setups are appealing since they can be made at a low cost, achieve relatively high spatial resolutions, and avoid the need of tampering with the PCB or packaging of the target. In this paper we first sketch the importance of understanding the pulse characteristics of a pulse injection setup in order to successfully mount an attack. We then look into the different components that make up an EM-pulse setup and demonstrate their impact on the pulse shape. The different components are then assembled to form an EM-pulse injection setup. The effectiveness of the setup and how different design decisions impact the outcome of a fault injection campaign are demonstrated on a 32-bit ARM microcontroller.
Arthur Beckers, Masahiro Kinugawa, Yuichi Hayashi, Daisuke Fujimoto, Josep Balasch, Benedikt Gierlichs, Ingrid Verbauwhede

Cryptographic Primitives

Frontmatter

Lightweight MACs from Universal Hash Functions

Abstract
Lightweight cryptography is a topic of growing importance, with the goal to secure the communication of low-end devices that are not powerful enough to use conventional cryptography. There have been many recent proposals of lightweight block ciphers, but comparatively few results on lightweight Message Authentication Codes (MACs).
Therefore, this paper focuses on lightweight MACs. We review some existing constructions, and revisit the choices made in mainstream MACs with a focus on lightweight cryptography. We consider MACs based on universal hash functions, because they offer information theoretic security, can be implemented efficiently and are widely used in conventional cryptography. However, many constructions used in practice (such as GMAC or Poly1305-AES) follow the Wegman-Carter-Shoup construction, which is only secure up to \(2^{64}\) queries with a 128-bit state.
We point out that there are simple solutions to reach security beyond the birthday bound, and we propose a concrete instantiation, \(\mathsf {MAC611}\), reaching 61-bit security with a 61-bit universal hash function. We wrote an optimized implementation on two ARM micro-controllers, and we obtain very good performances on the Cortex-M4, at only 3.7 c/B for long messages, and less than one thousand cycles for short messages.
Sébastien Duval, Gaëtan Leurent

FELICS-AEAD: Benchmarking of Lightweight Authenticated Encryption Algorithms

Abstract
Cryptographic algorithms that can simultaneously provide both encryption and authentication play an increasingly important role in modern security architectures and protocols (e.g. TLS v1.3). Dozens of authenticated encryption systems have been designed in the past five years, which has initiated a large body of research in cryptanalysis. The interest in authenticated encryption has further risen after the National Institute of Standards and Technology (NIST) announced an initiative to standardize “lightweight” authenticated ciphers and hash functions that are suitable for resource-constrained devices. However, while there already exist some cryptanalytic results on these recent designs, little is known about their performance, especially when they are executed on small 8, 16, and 32-bit microcontrollers. In this paper, we introduce an open-source benchmarking tool suite for a fair and consistent evaluation of Authenticated Encryption with Associated Data (AEAD) algorithms written in C or assembly language for 8-bit AVR, 16-bit MSP430, and 32-bit ARM Cortex-M3 platforms. The tool suite is an extension of the FELICS benchmarking framework and provides a new AEAD-specific low-level API that allows users to collect very fine-grained and detailed results for execution time, RAM consumption, and binary code size in a highly automated fashion. FELICS-AEAD comes with two pre-defined evaluation scenarios, which were developed to resemble security-critical operations commonly carried out by real IoT applications to ensure the benchmarks are meaningful in practice. We tested the AEAD tool suite using five authenticated encryption algorithms, namely AES-GCM and the CAESAR candidates ACORN, ASCON, Ketje-Jr, and NORX, and present some preliminary results.
Luan Cardoso dos Santos, Johann Großschädl, Alex Biryukov

Advances in Side-Channel Analysis

Frontmatter

A Comparison of -Test and Mutual Information as Distinguisher for Side-Channel Analysis

Abstract
Masking is known as the most widely studied countermeasure against side-channel analysis attacks. Since a masked implementation is based on a certain number of shares (referred to as the order of masking), it still exhibits leakages at higher orders. In order to exploit such leakages, higher-order statistical moments individually at each order need to be estimated reflecting the higher-order attacks. Instead, Mutual Information Analysis (MIA) known for more than 10 years avoids such a moment-based analysis by considering the entire distribution for the key recovery. Recently the \(\chi ^2\)-test has been proposed for leakage detection and as a distinguisher where also the whole distribution of the leakages is analyzed.
In this work, we compare these two schemes to examine their dependency. Indeed, one of the goals of this research is to conclude whether one can outperform the other. In addition to a theoretical comparison, we present two case studies and their corresponding practical evaluations. Both case studies are masked hardware implementations; one is an FPGA-based realization of a threshold implementation of PRESENT, and the other is an AES implementation as a coprocessor on a commercial smart card.
Bastian Richter, David Knichel, Amir Moradi

Key Enumeration from the Adversarial Viewpoint

When to Stop Measuring and Start Enumerating?
Abstract
In this work, we formulate and investigate a pragmatic question related to practical side-channel attacks complemented with key enumeration. In a real attack scenario, after an attacker has extracted side-channel information, it is possible that despite the entropy of the key has been significantly reduced, she cannot yet achieve a direct key recovery. If the correct key lies within a sufficiently small set of most probable keys, it can then be recovered with a plaintext and the corresponding ciphertext, by performing enumeration. Our proposal relates to the following question: how does an attacker know when to stop acquiring side-channel observations and when to start enumerating with a given computational effort? Since key enumeration is an expensive (i.e. time-consuming) task, this is an important question from an adversarial viewpoint. To answer this question, we present an efficient (heuristic) way to perform key-less rank estimation, based on simple entropy estimations using histograms.
Melissa Azouaoui, Romain Poussier, François-Xavier Standaert, Vincent Verneuil

Backmatter

Weitere Informationen

Premium Partner

    Bildnachweise