Skip to main content
Top

2003 | Book

Cryptographic Hardware and Embedded Systems - CHES 2002

4th International Workshop Redwood Shores, CA, USA, August 13–15, 2002 Revised Papers

Editors: Burton S. Kaliski, çetin K. Koç, Christof Paar

Publisher: Springer Berlin Heidelberg

Book Series : Lecture Notes in Computer Science

insite
SEARCH

About this book

ThesearetheproceedingsofCHES2002,theFourthWorkshoponCryptographic Hardware and Embedded Systems. After the ?rst two CHES Workshops held in Massachusetts, and the third held in Europe, this is the ?rst Workshop on the West Coast of the United States. There was a record number of submissions this year and in response the technical program was extended to 3 days. As is evident by the papers in these proceedings, there have been again many excellent submissions. Selecting the papers for this year’s CHES was not an easy task, and we regret that we could not accept many contributions due to the limited availability of time. There were 101 submissions this year, of which 39 were selected for presentation. We continue to observe a steady increase over previous years: 42 submissions at CHES ’99, 51 at CHES 2000, and 66 at CHES 2001. We interpret this as a continuing need for a workshop series that c- bines theory and practice for integrating strong security features into modern communicationsandcomputerapplications. Inadditiontothesubmittedcont- butions, Jean-Jacques Quisquater (UCL, Belgium), Sanjay Sarma (MIT, USA) and a panel of experts on hardware random number generation gave invited talks. As in the previous years, the focus of the Workshop is on all aspects of cr- tographic hardware and embedded system security. Of special interest were c- tributionsthatdescribenewmethodsfore?cienthardwareimplementationsand high-speed software for embedded systems, e. g. , smart cards, microprocessors, DSPs, etc. CHES also continues to be an important forum for new theoretical and practical ?ndings in the important and growing ?eld of side-channel attacks.

Table of Contents

Frontmatter

Invited Talk

CHES: Past, Present, and Future

CHES is (coming to be) a very interesting conference thanks to the excellent submitted papers, the new results about embedded systems, the coprocessors, and the new use of FPGAs.

Jean-Jacques Quisquater

Attack Strategies

Optical Fault Induction Attacks

We describe a new class of attacks on secure microcontrollers and smartcards. Illumination of a target transistor causes it to conduct, thereby inducing a transient fault. Such attacks are practical; they do not even require expensive laser equipment. We have carried them out using a flashgun bought second-hand from a camera store for $30 and with an $8 laser pointer. As an illustration of the power of this attack, we developed techniques to set or reset any individual bit of SRAM in a microcontroller. Unless suitable countermeasures are taken, optical probing may also be used to induce errors in cryptographic computations or protocols, and to disrupt the processor’s control flow. It thus provides a powerful extension of existing glitching and fault analysis techniques. This vulnerability may pose a big problem for the industry, similar to those resulting from probing attacks in the mid-1990s and power analysis attacks in the late 1990s.We have therefore developed a technology to block these attacks. We use self-timed dual-rail circuit design techniques whereby a logical 1 or 0 is not encoded by a high or low voltage on a single line, but by (HL) or (LH) on a pair of lines. The combination (HH) signals an alarm, which will typically reset the processor. Circuits can be designed so that single-transistor failures do not lead to security failure. This technology may also make power analysis attacks very much harder too.

Sergei P. Skorobogatov, Ross J. Anderson
Template Attacks

We present template attacks, the strongest form of side channel attack possible in an information theoretic sense. These attacks can break implementations and countermeasures whose security is dependent on the assumption that an adversary cannot obtain more than one or a limited number of side channel samples. They require that an adversary has access to an identical experimental device that he can program to his choosing. The success of these attacks in such constraining situations is due manner in which noise within each sample is handled. In contrast to previous approaches which viewed noise as a hindrance that had to be reduced or eliminated, our approach focuses on precisely modeling noise, and using this to fully extract information present in a single sample. We describe in detail how an implementation of RC4, not amenable to techniques such as SPA and DPA, can easily be broken using template attacks with a single sample. Other applications include attacks on certain DES implementations which use DPA-resistant hardware and certain SSL accelerators which can be attacked by monitoring electromagnetic emanations from an RSA operation even from distances of fifteen feet.

Suresh Chari, Josyula R. Rao, Pankaj Rohatgi
The EM Side—Channel(s)

We present results of a systematic investigation of leakage of compromising information via electromagnetic (EM) emanations from CMOS devices. These emanations are shown to consist of a multiplicity of signals, each leaking somewhat different information about the underlying computation. We show that not only can EM emanations be used to attack cryptographic devices where the power side-channel is unavailable, they can even be used to break power analysis countermeasures.

Dakshi Agrawal, Bruce Archambeault, Josyula R. Rao, Pankaj Rohatgi

Finite Field and Modular Arithmetic I

Enhanced Montgomery Multiplication

We define the Non Reduced Montgomery Multiplication of order s, of A and B, modulo N (odd integer) by NRMMs(A,B,N) = (AB + N (-ABN-1 (mod 2s))) 2-s. Given an upper bound on A and B, with respect to N, we show how to choose the variable s in a way that guarantees that NRMMsA,B,N) <2N.A few applications are demonstrated, showing the advantage of using NRMMs with an appropriately chosen s, over the classical Montgomery Multiplication.

Shay Gueron
New Algorithm for Classical Modular Inverse

The Montgomery inverse is used in cryptography for the computation of modular inverse of b modulo a, where a is a prime. We analyse existing algorithms from the point of view of their hardware implementation. We propose a new, hardware-optimal algorithm for the calculation of the classical modular inverse. The left-shift binary algorithm is shown to naturally calculate the classical modular inverse in fewer operations than the algorithm derived from the Montgomery inverse.

Róbert Lórencz
Increasing the Bitlength of a Crypto-Coprocessor

We present a novel technique which allows a virtual increase of the bitlength of a crypto-coprocessor in an efficient and elegant way. The proposed algorithms assume that the coprocessor is equipped with a special modular multiplication instruction. This instruction, called MultModDiv(A,B,N) computes A * B mod N and [(A*B)/N]. In addition to the doubling algorithm, we also present two conceivable economic implementations of the MultModDiv instruction: one hardware and one software realization. The hardware realization of the MultModDiv instruction has the same performance as the modular multiplication presented in the paper. The software realization requires two calls of the modular multiplication instruction. Our most efficient algorithm needs only six calls to an n-bit MultModDiv instruction to compute a modular 2n-bit multiplication. Obviously, special variants of our algorithm, e.g., squaring, require fewer calls.

Wieland Fischer, Jean-Pierre Seifert

Elliptic Curve Cryptography I

Enhancing Simple Power-Analysis Attacks on Elliptic Curve Cryptosystems

Recent applications of lattice attacks against elliptic curve cryptosystems have shown that the protection of ephemeral keys in the ECDSA is of greatest importance. This paper shows how to enhance simple power-analysis attacks on elliptic-curve point-multiplication algorithms by using Markov models. We demonstrate the attack on an addition-subtraction algorithm (fixing the sequence of elliptic-curve operations) which is similar to the one described by Morain et al. in [MO90] and apply the method to the general addition-subtraction method described in ANSI X9.62 [ANS99].

Elisabeth Oswald
Implementation of Elliptic Curve Cryptography with Built-In Counter Measures against Side Channel Attacks

Many software implementations of public key cryptosystems have been concerned with efficiency. The advent of side channel attacks, such as timing and power analysis attacks, force us to reconsider the strategy of implementation of group arithmetic. This paper presents a study of software counter measures against side channel attacks for elliptic curve cryptosystems.We introduce two new counter measures. The first is a new implementation technique, namely, homogeneous group operations, which has the property that addition and doubling on elliptic curves cannot be distinguished from side channel analysis. Being inexpensive time-wise, this technique is an alternative to a well-known Montgomery ladder. The second is a non-deterministic method of point exponentiation with precomputations. Although requiring rather large ROM, it provides an effective resistance against high-order power analysis attacks for the price of index re-computations and ROM accesses.An experimental implementation of NIST-recommended elliptic curves over binary fields with a balanced suite of counter measures built-in in group arithmetic is presented, and the penalty paid is analyzed. The results of the implementation in C on an AMD Duron 600 MHz running Linux are included in the paper.

Elena Trichina, Antonio Bellezza
Secure Elliptic Curve Implementations: An Analysis of Resistance to Power-Attacks in a DSP Processor

With the popularity of wireless communication devices a growing new important dimension of embedded systems design is that of security. This paper presents exploration of power attack resistance, using a statistical approach for identifying regions of the power trace which pose a possible security threat. Unlike previous power analysis research, a new metric supporting small timing shifts and complex processor architectures is presented. This research helps to identify how to create secure implementations of software. Elliptic curve point multiplications using the Weierstrass curve and Jacobi form over 192-bit prime fields were implemented and analyzed. Over 60 real measured power traces of elliptic curve point multiplications running at 100MHz on a DSP VLIW processor core were analyzed. Modification of power traces through software design was performed to maximize resistance to power attacks in addition to improving energy dissipation and performance by 44% with a 31% increase in code size. This research is important for industry since efficient yet secure cryptography is crucial for wireless communication embedded system devices and future IP enabled smart cards.

Catherine H. Gebotys, Robert J. Gebotys
Address-Bit Differential Power Analysis of Cryptographic Schemes OK-ECDH and OK-ECDSA

The differential power analysis (DPA) is a powerful attack against the implementation of cryptographic schemes on mobile devices. This paper proposes an alternative DPA using the addresses of registers of elliptic curve based cryptosystems (ECC) implemented on smart cards. We call the analysis the address-bit DPA in this paper. The analysis was originally investigated by Messerges, Dabbish and Sloan, however it was thought to be of no effect if the intermediate data are randomized. We extend the analysis and show how the extended analysis works against scalar exponentiations even if the implementation is resistant against the data-based DPA. We show experimental results of our analysis of cryptographic schemes OK-ECDH and OK-ECDSA, which are candidates of the CRYPTREC project in Japan, and evidence of their weakness.

Kouichi Itoh, Tetsuya Izu, Masahiko Takenaka

AES and AES Candidates

2Gbit/s Hardware Realizations of RIJNDAEL and SERPENT: A Comparative Analysis

We present and evaluate efficient VLSI implementations of both Rijndael and Serpent. The two cipher algorithms have been implemented by two comparable design teams within the same timeframe using the same fabrication process and EDA tools. We are thus in a position to compare to what degree the Rijndael and Serpent ciphers are suitable for dedicated hardware architectures. Both ASICs support encryption as well as decryption in ECB mode and include on-chip subkey generation. The two designs have been fabricated in a 0.6μm 3LM CMOS technology. Measurement results verified an encryption and decryption throughput of 2.26Gbit/s and 1.96Gbit/s for Rijndael and Serpent respectively. Circuit complexity is in the order of 300k transistors in either case.

A.K. Lutz, J. Treichler, F.K. Gürkaynak, H. Kaeslin, G. Basler, A. Erni, S. Reichmuth, P. Rommens, S. Oetiker, W. Fichtner
Efficient Software Implementation of AES on 32-Bit Platforms

Rijndael is the winner algorithm of the AES contest; therefore it should become the most used symmetric-key cryptographic algorithm. One important application of this new standard is cryptography on smart cards. In this paper we present an optimisation of the Rijndael algorithm to speed up execution on 32-bits processors with memory constraints, such as those used in smart cards. First a theoretical analysis of the Rijndael algorithm and of the proposed optimisation is discussed, and then simulation results of the optimised algorithm on different processors are presented and compared with other reference implementations, as known from the technical literature.

Guido Bertoni, Luca Breveglieri, Pasqualina Fragneto, Marco Macchetti, Stefano Marchesin
An Optimized S-Box Circuit Architecture for Low Power AES Design

Reducing the power consumption of AES circuits is a critical problem when the circuits are used in low power embedded systems. We found the S-Boxes consume much of the total AES circuit power and the power for an S-Box is mostly determined by the number of dynamic hazards. In this paper, we propose a low-power S-Box circuit architecture: a multi-stage PPRM architecture over composite fields. In this S-Box, (i) the signal arrival times of gates are as close as possible if the depths of the gates from the primary inputs are the same, and (ii) the hazard-transparent XOR gates are located after the other gates that may block the hazards. A low power consumption of 29 μW at 10 MHz using 0.13 μm 1.5V CMOS technology was achieved, while the consumptions of the BDD, SOP, and composite field S-Boxes are 275, 95, and 136 μW, respectively.

Sumio Morioka, Akashi Satoh
Simplified Adaptive Multiplicative Masking for AES

Software counter measures against side channel attacks considerably hinder performance of cryptographic algorithms in terms of memory or execution time or both. The challenge is to achieve secure implementation with as little extra cost as possible. In this paper we optimize a counter measure for the AES block cipher consisting in transforming a boolean mask to a multiplicative mask prior to a non-linear Byte Substitution operation (thus, avoiding S-box re-computations for every run or storing multiple S-box tables in RAM), while preserving a boolean mask everywhere else. We demonstrate that it is possible to achieve such transformation for a cost of two additional multiplications in the field.However, due to an inherent vulnerability of multiplicative masking to so-called zero attack, an additional care must be taken to securize its implementation. We describe one possible, although not perfect, approach to such an implementation which combines algebraic techniques and partial re-computation of S-boxes. This adds one more multiplication operation, and either occasional S-box re-computations or extra 528 bytes of memory to the total price of the counter measure.

Elena Trichina, Domenico De Seta, Lucia Germani
Multiplicative Masking and Power Analysis of AES

The recently proposed multiplicative masking countermeasure against power analysis attacks on AES is interesting as it does not require the costly recomputation and RAM storage of S-boxes for every run of AES. This is important for applications where the available space is very limited such as the smart card applications. Unfortunately, it is here shown that this method is in fact inherently vulnerable to differential power analysis. However, it is also shown that the multiplicative masking method can be modified so as to provide resistance to differential power analysis of nonideal but controllable security level, at the expense of increased computational complexity. Other possible random masking methods are also discussed.

Jovan D. Golić, Christophe Tymen

Tamper Resistance

Keeping Secrets in Hardware: The Microsoft XboxTM Case Study

This paper discusses the hardware foundations of the cryptosystem employed by the XboxTM video game console from Microsoft. A secret boot block overlay is buried within a system ASIC. This secret boot block decrypts and verifies portions of an external FLASH-type ROM. The presence of the secret boot block is camouflaged by a decoy boot block in the external ROM. The code contained within the secret boot block is transferred to the CPU in the clear over a set of high-speed busses where it can be extracted using simple custom hardware. The paper concludes with recommendations for improving the Xbox security system. One lesson of this study is that the use of a high-performance bus alone is not a sufficient security measure, given the advent of inexpensive rapid prototyping services and affordable high-performance FPGAs.

Andrew Huang

RSA Implementation

A DPA Attack against the Modular Reduction within a CRT Implementation of RSA

Published DPA attack scenarios against the RSA implementation exploit the possibility of predicting intermediate data during a straight-forward square-multiply exponentiation algorithm. An implementation of RSA using CRT (Chinese Remainder Theorem) prevents the pre-calculation of intermediate results during the exponentiation algorithm by an attacker. In this paper, we present a DPA attack that uses byte-wise hypotheses on the remainder after the modular reduction with one of the primes. Instead of using random input data this attack uses k series of input data with an equidistant step distance of 1, 256, (256)2, ..., (256)k. The basic assumption of this DPA attack named MRED (“Modular Reduction on Equidistant Data”) is that the distance of the input data equals the distance of the intermediate data after the modular reduction at least for a subgroup of single measurements. A function F k that is composed of the k DPA results is used for the approximation of a multiple of the prime. Finally the gcd gives the prime. The number of DPA calculations increases linear to the number of bytes of the prime to be attacked. MRED is demonstrated using simulated measurement data. The practical efficiency is assessed. If the applicability of this attack is limited due to padding formats in RSA signature applications, the least significant bytes of the remainder after the modular reduction step can still be revealed. Multiplicative message blinding can protect the reduction modulo a secret prime against MRED.

Bert den Boer, Kerstin Lemke, Guntram Wicke
Further Results and Considerations on Side Channel Attacks on RSA

This paper contains three parts. In the first part we present a new side channel attack on a plaintext encrypted by EME-OAEP PKCS#1 v.2.1. In contrast with Manger’s attack, we attack that part of the plaintext, which is shielded by the OAEP method. In the second part we show that Bleichenbacher’s and Manger’s attack on the RSA encryption scheme PKCS#1 v.1.5 and EME-OAEP PKCS#1 v.2.1 can be converted to an attack on the RSA signature scheme with any message encoding (not only PKCS). In the third part we deploy a general idea of fault-based attacks on the RSA-KEM scheme and present two particular attacks as the examples. The result is the private key instead of the plaintext as with attacks on PKCS#1 v.1.5 and v.2.1. These attacks should highlight the fact that the RSA-KEM scheme is not an entirely universal solution to problems of RSAES-OAEP implementation and that even here the manner of implementation is significant.

Vlastímil Klíma, Tomáš Rosa
Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures

This article describes concrete results and practically validated countermeasures concerning differential fault attacks on RSA using the CRT. We investigate smartcards with an RSA coprocessor where any hardware countermeasures to defeat fault attacks have been switched off. This scenario was chosen in order to analyze the reliability of software countermeasures. We start by describing our laboratory setting for the attacks. Hereafter, we describe the experiments and results of a straightforward implementation of a well-known countermeasure. This implementation turned out to be not sufficient. With the data obtained by these experiments we developed a practical error model. This enabled us to specify enhanced software countermeasures for which we were not able to produce any successful attacks on the investigated chips. Nevertheless, we are convinced that only sophisticated hardware countermeasures (sensors, filters, etc.) in combination with software countermeasures will be able to provide security.

C. Aumüller, P. Bier, W. Fischer, P. Hofreiter, J.-P. Seifert

Finite Field and Modular Arithmetic II

Some Security Aspects of the MIST Randomized Exponentiation Algorithm

The MIST exponentiation algorithm is intended for use in embedded crypto-systems to provide protection against power analysis and other side channel attacks. It generates randomly different addition chains for performing a particular exponentiation. This means that side channel attacks on RSA decryption or signing which require averaging over a number of exponentiation power traces become impossible. However, averaging over digit-by-digit multiplication traces may allow the detection of operand re-use. Although this provides a handle for an attacker by which the exponent search space might be considerably reduced, the number of possible exponents is shown to be still well outside the range of feasible computation in the foreseeable future.

Colin D. Walter
The Montgomery Powering Ladder

This paper gives a comprehensive analysis of Montgomery powering ladder. Initially developed for fast scalar multiplication on elliptic curves, we extend the scope of Montgomery ladder to any exponentiation in an abelian group. Computationally, the Montgomery ladder has the triple advantage of presenting a Lucas chain structure, of being parallelized, and of sharing a common operand. Furthermore, contrary to the classical binary algorithms, it behaves very regularly, which makes it naturally protected against a large variety of implementation attacks.

Marc Joye, Sung-Ming Yen
DPA Countermeasures by Improving the Window Method

We propose three differential power analysis (DPA) countermeasures for securing the public key cryptosystems. All countermeasures are based on the window method, and can be used in both RSA and elliptic curve cryptosystems (ECC). By using the optimal countermeasure, performance penalty is small. In comparison with k-ary method, computation time of our countermeasure is only 105% in 1024-bit RSA and 119% in 160-bit ECC.

Kouichi Itoh, Jun Yajima, Masahiko Takenaka, Naoya Torii
Efficient Subgroup Exponentiation in Quadratic and Sixth Degree Extensions

This paper describes several speedups for computation in the order p + 1 subgroup of Fp*2 and the order p2 - p + 1 subgroup of Fp*6 These results are in a way complementary to LUC and XTR, where computations in these groups are sped up using trace maps. As a side result, we present an efficient method for XTR with p = 3 mod 4.

Martijn Stam, Arjen K. Lenstra

Elliptic Curve Cryptography II

On the Efficient Generation of Elliptic Curves over Prime Fields

We present a variant of the complex multiplication method that generates elliptic curves of cryptographically strong order. Our variant is based on the computation ofWeber polynomials that require significantly less time and space resources than their Hilbert counterparts. We investigate the time efficiency and precision requirements for generating off-line Weber polynomials and its comparison to another variant based on the off-line generation of Hilbert polynomials. We also investigate the efficiency of our variant when the computation of Weber polynomials should be made on-line due to limitations in resources (e.g., hardware devices of limited space).We present trade-offs that could be useful to potential implementors of elliptic curve cryptosystems on resource-limited hardware devices.

Elisavet Konstantinou, Yiannis C. Stamatiou, Christos Zaroliagis
An End-to-End Systems Approach to Elliptic Curve Cryptography

Since its proposal by Victor Miller [17] and Neal Koblitz [15] in the mid 1980s, Elliptic Curve Cryptography (ECC) has evolved into a mature public-key cryptosystem. Offering the smallest key size and the highest strength per bit, its computational efficiency can benefit both client devices and server machines. We have designed a programmable hardware accelerator to speed up point multiplication for elliptic curves over binary polynomial fields GF(2m). The accelerator is based on a scalable architecture capable of handling curves of arbitrary field degrees up to m = 255. In addition, it delivers optimized performance for a set of commonly used curves through hard-wired reduction logic. A prototype implementation running in a Xilinx XCV2000E FPGA at 66.4 MHz shows a performance of 6987 point multiplications per second for GF(2163). We have integrated ECC into OpenSSL, today's dominant implementation of the secure Internet protocol SSL, and tested it with the Apache web server and open-source web browsers.

Nils Gura, Sheueling Chang Shantz, Hans Eberle, Sumit Gupta, Vipul Gupta, Daniel Finchelstein, Edouard Goupy, Douglas Stebila
A Low-Power Design for an Elliptic Curve Digital Signature Chip

We present a VHDL design that incorporates optimizations intended to provide digital signature generation with as little power, space, and time as possible. These three primary objectives of power, size, and speed must be balanced along with other important goals, including flexibility of the hardware and ease of use. The highest-level function offered by our hardware design is Elliptic Curve Optimal El Gamal digital signature generation. Our parameters are defined over the finite field GF(2178), which gives security that is roughly equivalent to that provided by 1500-bit RSA signatures. Our optimizations include using the point-halving algorithm for elliptic curves, field towers to speed up the finite field arithmetic in general, and further enhancements of basic finite field arithmetic operations. The result is a synthesized VHDL digital signature design (using a CMOS 0.5μm, 5V , 25°C library) of 191,000 gates that generates a signature in 4.4 ms at 20 MHz.

Richard Schroeppel, Cheryl Beaver, Rita Gonzales, Russell Miller, Timothy Draelos
A Reconfigurable System on Chip Implementation for Elliptic Curve Cryptography over

The performance of elliptic curve based public key cryptosystems is mainly appointed by the efficiency of the underlying finite field arithmetic. This work describes two generic and scalable architectures of finite field coprocessors, which are implemented within the latest family of Field Programmable System Level Integrated Circuits FPSLIC from Atmel, Inc. The HW architectures are adapted from Karatsuba’s divide and conquer algorithm and allow for a reasonable speedup of the top-level elliptic curve algorithms. The VHDL hardware models are automatically generated based on an eligible operand size, which permits the optimal utilization of a particular FPSLIC device.

M. Ernst, M. Jung, F. Madlener, S. Huss, R. Blümel
Genus Two Hyperelliptic Curve Coprocessor

Hyperelliptic curve cryptography with genus larger than one has not been seriously considered for cryptographic purposes because many existing implementations are significantly slower than elliptic curve versions with the same level of security. In this paper, the first ever complete hardware implementation of a hyperelliptic curve coprocessor is described. This coprocessor is designed for genus two curves over $$ \mathbb{F}_{2^{113} } $$ . Additionally, a modification to the Extended Euclidean Algorithm is presented for the GCD calculation required by Cantor’s algorithm. On average, this new method computes the GCD in one-fourth the time required by the Extended Euclidean Algorithm.

N. Boston, T. Clancy, Y. Liow, J. Webster

Random Number Generation

True Random Number Generator Embedded in Reconfigurable Hardware

This paper presents a new True Random Number Generator (TRNG) based on an analog Phase-Locked Loop (PLL) implemented in a digital Altera Field Programmable Logic Device (FPLD). Starting with an analysis of the one available on chip source of randomness - the PLL synthesized low jitter clock signal, a new simple and reliable method of true randomness extraction is proposed. Basic assumptions about statistical properties of jitter signal are confirmed by testing of mean value of the TRNG output signal. The quality of generated true random numbers is confirmed by passing standard NIST statistical tests. The described TRNG is tailored for embedded System-On-a-Programmable- Chip (SOPC) cryptographic applications and can provide a good quality true random bit-stream with throughput of several tens of kilobits per second. The possibility of including the proposed TRNG into a SOPC design significantly increases the system security of embedded cryptographic hardware.

Viktor Fischer, Miloš Drutarovský
Evaluation Criteria for True (Physical) Random Number Generators Used in Cryptographic Applications

Random number generators are essential components of many cryptographic systems. Inappropriate random number generators may weaken the security properties of the system considerably. This paper considers evaluation criteria for true (physical) random number generators. General objectives are formulated and possible criteria and measures are discussed which shall ensure these goals. Central parts of the mathematical-technical reference of the German evaluation guidance document AIS 31 ([19],[2]) are cited and rationale is given.

Werner Schindler, Wolfgang Killmann
A Hardware Random Number Generator

Some of the desirable properties a cryptographic random number generator should have are lack of bias, bit independence, unpredictiability and nonrepeatability. In this paper, we discuss how a hardware random number generator formed from simple components can provide these properties. The components include two state machines with different structures, and free-running oscillators. The generated numbers pass the DIEHARD battery of tests.

Thomas E. Tkacik

Invited Talk

RFID Systems and Security and Privacy Implications

The Auto-ID Center is developing low-cost radio frequency identification (RFID) based systems with the initial application as next generation bar-codes. We describe RFID technology, summarize our approach and our research, and most importantly, describe the research opportunities in RFID for experts in cryptography and information security. The common theme in low-cost RFID systems is that computation resources are very limited, and all aspects of the RFID system are connected to each other. Understanding these connections and the resulting design trade-offs is an important prerequisite to effectively answering the challenges of security and privacy in low-cost RFID systems.

Sanjay E. Sarma, Stephen A. Weis, Daniel W. Engels

New Primitives

A New Class of Invertible Mappings

Invertible transformations over n-bit words are essential ingredients in many cryptographic constructions. When n is small (e.g., n = 8) we can compactly represent any such transformation as a lookup table, but when n is large (e.g., n = 64) we usually have to represent it as a composition of simpler operations such as linear mappings, S-P networks, Feistel structures, etc. Since these cryptographic constructions are often implemented in software on standard microprocessors, we are particularly interested in invertible univariate or multivariate transformations which can be implemented as small compositions of basic machine instructions on 32 or 64 bit words. In this paper we introduce a new class of provably invertible mappings which can mix arithmetic operations (negation, addition, subtraction, multiplication) and boolean operations (not, xor, and, or), are highly efficient, and have desirable cryptographic properties. In particular, we show that for any n the mapping x → x + (x2V C) (mod 2n) is a permutation with a single cycle of length 2n iff both the least significant bit and the third least significant bit in the constant C are 1.

Alexander Klimov, Adi Shamir

Finite Field and Modular Arithmetic II

Scalable and Unified Hardware to Compute Montgomery Inverse in GF(p) and GF(2n)

Computing the inverse of a number in finite fields GF(p) or GF(2n) is equally important for cryptographic applications. This paper proposes a novel scalable and unified architecture for a Montgomery inverse hardware that operates in both GF(p) and GF(2n) fields. We adjust and modify a GF(2n) Montgomery inverse algorithm to accommodate multi-bit shifting hardware, making it very similar to a previously proposed GF(p) algorithm. The architecture is intended to be scalable, which allows the hardware to compute the inverse of long precision numbers in a repetitive way. After implementing this unified design it was compared with other designs. The unified hardware was found to be eight times smaller than another reconfigurable design, with comparable performance. Even though the unified design consumes slightly more area and it is slightly slower than the scalable inverter implementations for GF(p) only, it is a practical solution whenever arithmetic in the two finite fields is needed.

Adnan Abdul-Aziz Gutub, Alexandre F. Tenca, Erkay Savaş, RCCetin K. Koç
Dual-Field Arithmetic Unit for GF(p) and GF(2m)

In this article we present a hardware solution for finite field arithmetic with application in asymmetric cryptography. It supports calculation in GF(p) as well as in GF(2m). Addition and multiplication with interleaved modular reduction are the main functionality of the unit. Additional functions—like shift operations and integer incrementation—allow the calculation of the multiplicative inverse and covering all operations required to implement Elliptic Curve Cryptography. Redundant number representation and efficient modular reduction make it ready for future cryptographic bitlengths and allow operation at high clock frequency on moderate hardware resources.

Johannes Wolkerstorfer
Error Detection in Polynomial Basis Multipliers over Binary Extension Fields

In many of cryptographic schemes, the most time consuming basic arithmetic operation is the finite field multiplication and its hardware implementation may require millions of logic gates. It is a complex and costly task to develop such large finite field multipliers which will always yield error free outputs. In this effect, this paper considers fault tolerant multiplication in finite fields. It deals with detection of errors of bit-parallel and bit-serial polynomial basis multipliers over finite fields of characteristic two. Our approach is to partition the multiplier structure into a number of smaller computational units and use the parity prediction technique to detect errors.

Arash Reyhani-Masoleh, M.A. Hasan
Hardware Implementation of Finite Fields of Characteristic Three

In this paper we examine a number of ways of implementing characteristic three arithmetic in hardware. While this type of arithmetic is not traditionally used in cryptographic systems, recent advances in Tate and Weil pairing based cryptosystems show that it is potentially valuable. We examine a hardware oriented representation of the field elements, comparing the resulting algorithms for field addition and multiplication operations, and show that characteristic three arithmetic need not significantly under-perform comparable characteristic two alternatives.

D. Page, N.P. Smart

Elliptic Curve Cryptography III

Preventing Differential Analysis in GLV Elliptic Curve Scalar Multiplication

In [2], Gallant, Lambert and Vanstone proposed a very efficient algorithmto compute Q = kP on elliptic curves having non-trivial efficiently computable endomorphisms. Cryptographic protocols are sensitive to implementations, indeed as shown in [6],[7] information about the secret can be revealed analysing external leakage of the support, typically a smart card. Several software countermeasures have been proposed to protect the secret. However, speed computation is needed for practical use. In this paper, we propose a method to protect scalar multiplication on elliptic curves against Differential Analysis, that benefits from the speed of the Gallant, Lambert and Vanstone method. It can be viewed as a two-dimensional analogue of Coron’s method [1] of randomising the exponent k. We propose two variants of this method (one linear and one affine), the second one slightly more effective, whereas the first one offers “two in one”, combining point-blinding and exponent randomisation, which have hitherto been dealt separately. For instance, for at most a mere 37.5% (resp. 25%) computation speed loss on elliptic curves over fields with 160 (resp. 240) bits the computation of kP can take on 240 different consumption patterns.

Mathieu Ciet, Jean-Jacques Quisquater, Francesco Sica
Randomized Signed-Scalar Multiplication of ECC to Resist Power Attacks

Recently it has been shown that smart cards as cryptographic devices are vulnerable to power attacks if they have no defence against them. Randomization on ECC scalar multiplication is one of the fundamental concepts in methods of defence against side-channel attacks. In this paper by using the randomization concept together with the NAF recoding algorithm, we propose an efficient countermeasure for ECCs against power attacks. The countermeasure provides a randomized signed-scalar representation at every scalar multiplication to resist DPA. To protect against SPA it additionally employs a simple SPA-immune addition-subtraction multiplication algorithm. Our analysis shows that it needs no additional computation load compared to the ordinary binary scalar multiplication, where the average number of doublings plus additions for a bit length n is 1.5n+O(1).

Jae Cheol Ha, Sang Jae Moon
Fast Multi-scalar Multiplication Methods on Elliptic Curves with Precomputation Strategy Using Montgomery Trick

Our development of efficient methods for the precomputation of multi-scalar multiplication for elliptic curve cryptosystems (ECCs) is presented. Multi-scalar multiplication is required in many forms of ECC, including schemes for the verification of ECDSA signatures. The simultaneous method is one known method for fast multi-scalar multiplication. The method has two stages: a precomputation stage and an evaluation stage. Points for use in the evaluation stage are computed in the precomputation stage. The actual multi-scalar multiplication is carried out on the basis of the precomputed points in the evaluation stage. In the evaluation stage of the simultaneous method, we are able to quickly compute the points of the multi-scalar multiple because few additions are required. On the other hand, if we use a large window width, we have to compute an enormous number of points in the precomputation stage. Hence, we have to compute an abundance of inversions, which carries a high computational cost. The result is that a large amount of time is required by the precomputation stage. This is the well-known draw-back of the simultaneous method. In our proposed method, we apply the Montgomery trick to reduce the number of inversions required with a width window w from O(22w) to O(w). In addition, our proposed method computes uP and vQ for any u, v, then compute uP + vQ, where P,Qare elliptic points. This procedure enables us to remove points that will not be used later from the process of precomputation. Without our proposed method, an algorithm to compute precomputation table would have to be changed dependently on unused points. Compared with the method without Montgomery trick, our proposed method is 3.6 times faster than the conventional simultaneous method, i.e., than in the absence of the Montgomery trick. Moreover, the optimal window width for our proposed method is 3, whereas the corresponding width for conventional simultaneous methods is 2.

Katsuyuki Okeya, Kouichi Sakurai

Hardware for Cryptanalysis

Experience Using a Low-Cost FPGA Design to Crack DES Keys

This paper describes the authors’ experiences attacking the IBM 4758 CCA, used in retail banking to protect the ATM infrastructure. One of the authors had previously proposed a theoretical attack to extract DES keys from the system, but it failed to take account of realworld banking security practice. We developed a practical scheme that collected the necessary data in a single 10-minute session. Risk of discovery by intrusion detection systems made it necessary to complete the key “cracking” part of the attack within a few days, so a hardware DES cracker was implemented on a US$995 off-the-shelf FPGA development board. This gave a 20-fold increase in key testing speed over the use of a standard 800 MHz PC. The attack was not only successful in its aims, but also shed new light on the protocol vulnerabilities being exploited. In addition, the FPGA development led to a fresh way of demonstrating the non-randomness of some of the DES S-boxes and indicated when pipelining can be a more effective technique than replication of processing blocks. The wide range of insights we obtained demonstrates that there can be significant value in implementing attacks “for real”.

Richard Clayton, Mike Bond
A Time-Memory Tradeo. Using Distinguished Points: New Analysis & FPGA Results

In 1980, Martin Hellman [1] introduced the concept of cryptanalytic time-memory tradeoffs, which allows the cryptanalysis of any N key symmetric cryptosystem in O(N2/3) operations with O(N2/3) storage, provided a precomputation of O(N) is performed beforehand. This procedure is well known but did not lead to realistic implementations. This paper considers a cryptanalytic time-memory tradeoff using distinguished points, a method referenced to Rivest [2]. The algorithm proposed decreases the expected number of memory accesses with sensible modifications of the other parameters and allows much more realistic implementations of fast key search machines.We present a detailed analysis of the algorithm and solve theoretical open problems of previous models. We also propose efficient mask functions in terms of hardware cost and probability of success. These results were experimentally confirmed and we used a purpose-built FPGA design to perform realistic tradeoffs against DES. The resulting online attack is feasible on a single PC and we recover a 40-bit key in about 10 seconds.

Francois-Xavier Standaert, Gael Rouvroy, Jean-Jacques Quisquater, Jean-Didier Legat
Backmatter
Metadata
Title
Cryptographic Hardware and Embedded Systems - CHES 2002
Editors
Burton S. Kaliski
çetin K. Koç
Christof Paar
Copyright Year
2003
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-36400-9
Print ISBN
978-3-540-00409-7
DOI
https://doi.org/10.1007/3-540-36400-5