1 Introduction
The main motivation for this work was to generalise the cube attack of Dinur and Shamir [5], and the AIDA attack of Vielhaber [21], from the binary field to arbitrary finite fields. However, some of the tools and results for differentiation over finite fields could have a broader applicability in cryptography.
Higher order derivatives were introduced in a cryptographic context by Lai [16]. This notion had already been used extensively, under the name of finite difference, or discrete derivative, or difference operator, in other areas of mathematics (e.g. for the numerical approximation of the derivative or for solving difference equations).
The (discrete) derivative of a function f is defined as the function \((\varDelta _{\mathbf {a}} f)(\mathbf {x}) = f(\mathbf {x}+\mathbf {a})-f(\mathbf {x})\), for a fixed difference \(\mathbf {a}\) (the domain and codomain of f are commutative groups in additive notation). Usually f is a function of n variables over a ring, so \(\mathbf {x} = (x_1, \ldots , x_n)\) and \(\mathbf {a} = (a_1, \ldots , a_n)\). The operator \(\varDelta _{\mathbf {a}}\) which associates to each function f its derivative will be called differentiation. An important particular case is differentiation with respect to one variable, namely \(\mathbf {a}=h\mathbf {e}_i\), where the difference step h is a scalar constant (equal to 1 by default) and \(\mathbf {e}_i\) are the canonical basis vectors having a 1 in position i and zeroes elsewhere. Higher order differentiation means repeated application of the differentiation operator.
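As a concrete illustration (with an arbitrarily chosen prime and polynomial, not taken from the paper), the discrete derivative can be sketched in Python for functions over \(\mathrm {GF}(p)\):

```python
# A minimal sketch of (Delta_a f)(x) = f(x + a) - f(x) over GF(p);
# the prime P and the function f below are illustrative choices.
P = 5

def derivative(f, a):
    """Return the function Delta_a f for f: GF(P)^n -> GF(P)."""
    return lambda x: (f(tuple((xi + ai) % P for xi, ai in zip(x, a))) - f(x)) % P

f = lambda x: (x[0] ** 2 * x[1]) % P          # f(x1, x2) = x1^2 * x2

df = derivative(f, (1, 0))                    # differentiate w.r.t. x1, step h = 1
# (Delta f)(x1, x2) = ((x1+1)^2 - x1^2) x2 = (2*x1 + 1) x2  (mod 5)
print(df((3, 2)))                             # (2*3 + 1)*2 = 14 = 4 (mod 5)
```

Note that the degree in \(x_1\) drops from 2 to 1, as expected.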
The functions we use here are functions in several variables over a finite field. Any such function can be represented as a polynomial function. The differentiation operator decreases the degree by at least one, so after sufficiently many applications of the operator the result will be the identically zero function. However, for certain choices of differences \(\mathbf {a}\), this can happen prematurely: for example, over the binary field \(\mathrm {GF}(2)\), differentiating twice using the same difference \(\mathbf {a}\) will always result in the zero function, regardless of the original function f. For our applications, we need to ensure we choose the differences so that this does not happen prematurely, i.e. there is at least one function f which does not become identically zero after computing the higher order derivative w.r.t. those differences.
A number of cryptographic attacks can be reformulated using higher order differentiation. Differential cryptanalysis (introduced by Biham and Shamir [3]) has been thus reformulated by Lai [16]; the cube attack of Dinur and Shamir [5] and the related AIDA attack of Vielhaber [21] have been reformulated in Knellwolf and Meier [14], Duan and Lai [7].
In both the cube attack and the AIDA attack we have a “black box” function f in several public and secret variables and we select a set of indices of public variables \(I=\{i_1, \ldots , i_k\}\). Then f is evaluated at each point of a “cube” consisting of the vectors that have all the possible combinations of 0/1 values for the variables with index in I, whereas the remaining variables are left indeterminate; the resulting values are summed and the sum will be denoted \(f_I\). The attacks hope that for suitable choices of subsets I of public variables, the resulting \(f_I\) is linear in the secret variables, for the cube attack (or equal to one secret variable or the sum of two secret variables, for the AIDA attack). This situation is particularly likely when the cardinality k of I is just marginally lower than the total degree of the function, since the degree decreases by k in general, and occasionally by slightly more than k. Such subsets I are found in a preprocessing phase, where the values of the secret variables can be chosen freely. In the online phase the secret variables are unknown, and by computing the \(f_I\) for the sets I identified in the preprocessing phase, one obtains a system of linear equations in the secret variables.
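A toy illustration of the cube sum \(f_I\) over \(\mathrm {GF}(2)\); the “black box” polynomial below is made up for the example and is not taken from any cipher:

```python
# Toy cube sum f_I over GF(2): sum f over all 0/1 settings of the public
# variables indexed by I, with the remaining public variables x and the
# secret variables k held fixed. The polynomial f is an illustrative example.
from itertools import product

def cube_sum(f, I, x, k):
    total = 0
    for bits in product((0, 1), repeat=len(I)):
        xx = list(x)
        for idx, b in zip(I, bits):
            xx[idx] = b
        total ^= f(xx, k)        # summation over GF(2) is XOR
    return total

# f = x0*x1*k0 + x0*x1*x2 + k1*x2 + k0  over GF(2)
f = lambda x, k: (x[0]*x[1]*k[0] ^ x[0]*x[1]*x[2] ^ k[1]*x[2] ^ k[0]) & 1

# Summing over the cube I = {0, 1} leaves f_I = k0 + x2, linear in the key.
print(cube_sum(f, [0, 1], [0, 0, 1], [1, 0]))   # k0 + x2 = 1 + 1 = 0 over GF(2)
```

With the chosen f, only the terms divisible by \(x_0x_1\) survive the summation, so \(f_I = k_0 + x_2\).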
It was shown (see Knellwolf and Meier [14], Duan and Lai [7]) that choosing the variable indices \(I=\{i_1, \ldots , i_k\}\) and computing \(f_I\) (as described above) is equivalent to computing the k-order derivative of f with respect to the canonical basis vectors \(\mathbf {e}_{i_1}, \ldots , \mathbf {e}_{i_k}\), i.e. by differentiating once with respect to \(x_{i_1}\), then w.r.t. \(x_{i_2}\) and so on, finally differentiating w.r.t. \(x_{i_k}.\)
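This equivalence is easy to check numerically; the sketch below (with an arbitrary toy polynomial over \(\mathrm {GF}(2)\), our own example) verifies that the cube sum over \(I=\{1,2\}\) agrees with the second order derivative w.r.t. \(\mathbf {e}_1, \mathbf {e}_2\) evaluated with the cube variables set to zero:

```python
# Numeric check that the cube sum over I equals the k-th order derivative
# w.r.t. e_{i_1}, ..., e_{i_k} over GF(2); f is an illustrative toy function.
from itertools import product

f = lambda x: (x[0]*x[1]*x[2] ^ x[1]*x[2] ^ x[0]) & 1

def deriv(g, i):
    # Delta_{e_i} g over GF(2): on 0/1 points, adding e_i flips bit i
    return lambda x: g([b ^ (1 if j == i else 0) for j, b in enumerate(x)]) ^ g(x)

def cube_sum(g, I, x):
    t = 0
    for bits in product((0, 1), repeat=len(I)):
        xx = list(x)
        for idx, b in zip(I, bits):
            xx[idx] = b
        t ^= g(xx)
    return t

d = deriv(deriv(f, 0), 1)      # differentiate w.r.t. x1, then w.r.t. x2
for x2 in (0, 1):
    assert d([0, 0, x2]) == cube_sum(f, [0, 1], [0, 0, x2])
print("cube sum equals 2nd-order derivative")
```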
It is known that the binary cube/AIDA attack can also be viewed in terms of the Moebius/Reed–Muller transform (which associates to a polynomial function f a new function \(f^M\) where \(f^M(y_1, \ldots , y_n)\) equals the coefficient of \(x_1^{y_1}\cdots x_n^{y_n}\) in f). This transform has been used in several statistical attacks before the cube attack (see for example Filiol [10] and O’Neal [20]). It has also been used for improving the efficiency of the AIDA attack (see Vielhaber [22]) and the Cube attack (see Fouque and Vannet [11]).
All the attacks above, as well as the higher order differentiation used in cryptography, are over the binary field. While all cryptographic functions can be viewed as binary functions, there are a number of ciphers which make significant use of operations modulo a prime \(p>2\) in their internal processing, for example ZUC [9], IDEA [17, 18], MMB [4]. It may therefore be advantageous for such ciphers to also be viewed and analysed as functions over \(\mathrm {GF}(p)\), the field of integers modulo p. Unlike the binary case, a polynomial function can now have degree more than one in each variable; in fact it can have degree up to \(p-1\) in each variable. There are yet other ciphers which use operations over Galois fields of the form \(\mathrm {GF}(p^s)\), for example SNOW [8], and in such fields the degree of the polynomial functions can be up to \(p^s-1\) in each variable.
A first generalisation of the cube attack to \(\mathrm {GF}(p^s)\) was sketched by Dinur and Shamir in [5, p. 284] and also developed more explicitly by Agnese and Pedicini [1]. We show that, like in the binary case, their approach can be viewed as k-order differentiation, where we differentiate once with respect to each of the variables \(x_{i_1}, \ldots , x_{i_k}\). However, we argue that their generalisation, while correct, has very low chances of leading to a successful attack because we cannot differentiate sufficiently many times. Namely, on one hand, like in the binary case, the best chances of success are when the function is differentiated a number of times just marginally lower than its total degree; on the other hand, in their proposed scheme the number of times that the function is differentiated is upper bounded by the number of variables, which (unlike the binary case) can be significantly lower than the degree of the function (see Remark 3).
Our proposed generalisation of the cube attack to \(\mathrm {GF}(p)\) improves the chances of success by differentiating several times with respect to each of the chosen variables. Thus there is no intrinsic limit on the number of differentiations and therefore this number can be as close as we want to the degree of the polynomial in the public variables (only limited by the computing power available).
We first examine higher order differentiation in \(\mathrm {GF}(p)\) (Sect. 4.1). For repeated differentiation with respect to the same variable, we can use any non-zero difference steps and there will be functions whose degree in that variable will decrease by exactly one for each differentiation. Choosing all the steps equal to one gives a compact and efficient formula for evaluating the higher order derivative for a “black box” function. While evaluating a higher order derivative we can obtain, at negligible extra cost, several derivatives of smaller order, just like one can use an efficient Moebius transform algorithm to compute all the subcubes of a large cube (see [11]).
We then show, in Sect. 4.2, that the main result of the classical cube attack, [5, Theorem 1], no longer holds when we differentiate repeatedly with respect to the same variable in \(\mathrm {GF}(p)\) (see Example 3 for a counterexample). However, we show that a modified version of this theorem does hold, see Theorem 6. In Sect. 4.3 we show that this theorem can also be viewed as describing the relation between differentiation and the Moebius transform over \(\mathrm {GF}(p)\).
The proposed algorithm generalising the cube attack to \(\mathrm {GF}(p)\) is sketched in Sect. 4.4. Now we not only choose variables for the “cube” but we also choose the number of times we are going to differentiate with respect to each variable. For computational efficiency, choosing only one variable (or a small number of variables) and differentiating a large number of times with respect to that variable is preferable. In \(\mathrm {GF}(p)\) probabilistic linearity testing has a smaller expected number of tests than in \(\mathrm {GF}(2)\), see [13]. We implemented this algorithm and give some concrete examples of functions where this algorithm is much more successful than a binary cube attack. Unfortunately we were unsuccessful so far in breaking actual ciphers (or reduced round versions thereof). We suspect this is due to the fact that the ciphers that use operations in \(\mathrm {GF}(p)\) combine them with operations in other fields/rings (e.g. ZUC includes operations modulo \(2^{31}-1\) but also modulo \(2^{32}\), IDEA includes operations modulo \(2^{16}+1\) and modulo \(2^{16}\)), thus increasing the degree of the polynomials over \(\mathrm {GF}(p)\) beyond reach.
While this paper concentrates on generalising the cube attack, it is likely that other attacks that use differentiation could also be generalised to \(\mathrm {GF}(p)\) using our technique; for example cube testers (see [2]) can be generalised using Theorem 7, which shows that the values of the higher order derivative are uniformly distributed.
Finally, for completeness, we deal with generalisations to finite fields of the form \(\mathrm {GF}(p^s)\) in Sect. 5.2. Here, for functions such as \(x^d\) with \(p \mid d\), differentiation with respect to x decreases the degree by more than one regardless of the difference step. We give a more precise expression of the decrease in degree for higher order differentiation depending on the representation of the degree as an integer in base p. Any function can be differentiated at most \(s(p-1)\) times before it becomes identically zero. We describe a choice of difference steps for which there are functions which do not become zero for anything less than \(s(p-1)\) differentiations. Due to the fact that differentiation only uses the additive group of \(\mathrm {GF}(p^s)\), which is isomorphic to \(\mathrm {GF}(p)^s\), differentiation over \(\mathrm {GF}(p^s)\) can in fact be reduced to differentiation over \(\mathrm {GF}(p)\) in each component of the projection of the function f. Therefore, we feel that developing a cube attack in \(\mathrm {GF}(p^s)\), while possible, does not bring any additional advantages compared to a cube attack in \(\mathrm {GF}(p)\).
2 Preliminaries
Throughout this paper \(\mathrm {GF}(p^s)\) denotes the finite field with \(p^s\) elements where p is prime. R denotes an arbitrary commutative ring with identity and no zero divisors (i.e. an integral domain); the typical examples we have in mind are polynomial rings over \(\mathrm {GF}(p^s)\) or over other fields. Note that since R is a domain, its characteristic is either zero or a prime number.
We denote by \(\mathbf {e}_i = (0,\ldots , 0, 1, 0, \ldots , 0) \in R^n\) the vector which has a 1 in position i and zeroes elsewhere, i.e. \(\mathbf {e}_1, \ldots , \mathbf {e}_n\) is the canonical basis of the vector space \(R^n\).
We recall the definition of discrete derivative/discrete differentiation here, which will be called simply derivative and differentiation for the rest of this paper.
The differentiation operator is a linear operator; composition of such operators is commutative and associative. Repeated application of the operator (also called higher order differentiation or higher order derivative in [16]) will be denoted by
$$\begin{aligned} \varDelta ^{(m)}_{\mathbf {a_1}, \ldots , \mathbf {a_m}} f = \varDelta _{\mathbf {a_1}}\varDelta _{\mathbf {a_2}}\ldots \varDelta _\mathbf {a_m}f \end{aligned}$$
(1)
where \(\mathbf {a_1}, \ldots , \mathbf {a_m} \in R^n\) are not necessarily distinct. An explicit formula can be obtained easily from Definition 1 by induction (see [16, Proposition 3]):
In particular, when we differentiate a function w.r.t. one variable repeatedly, with all differentiation steps equal to 1, we obtain the following formula, well known in the context of finite differences:
$$\begin{aligned} \varDelta ^{(m)}_{\mathbf {e_{1}}, \ldots , \mathbf {e_{1}}} f (x_1, \ldots , x_n) = \sum _{i=0}^{m} (-1)^{m-i}\left( {\begin{array}{c}m\\ i\end{array}}\right) f(x_1+i, x_2, \ldots , x_n). \end{aligned}$$
(2)
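For a “black box” f this formula can be evaluated directly; the sketch below (with an arbitrary polynomial over \(\mathrm {GF}(7)\), our own example) checks that three differentiations of a function of degree 3 in \(x_1\) leave a non-zero constant, while a fourth differentiation gives zero:

```python
# Evaluating formula (2): the m-th order derivative w.r.t. x1 with all
# steps equal to 1, for a black-box f: GF(P)^n -> GF(P). Illustrative P, f.
from math import comb

P = 7

def higher_deriv_x1(f, m, x):
    s = 0
    for i in range(m + 1):
        xi = (x[0] + i) % P, *x[1:]
        s += (-1) ** (m - i) * comb(m, i) * f(xi)
    return s % P

f = lambda x: (x[0] ** 3 + 2 * x[0] * x[1]) % P   # degree 3 in x1

# Three differentiations leave the constant 3! = 6 (mod 7); a fourth gives 0.
print(higher_deriv_x1(f, 3, (0, 0)), higher_deriv_x1(f, 4, (0, 0)))
```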
While the derivative can be defined for any function, in the sequel we will concentrate on polynomial functions. We will denote by \(\deg _{x_i}(f)\) the degree of f in the variable \(x_i\). The total degree will be denoted \(\deg (f)\), with the usual convention of \(\deg (0) = - \infty \).
Differentiating with respect to one variable decreases the degree in that variable by at least one:
Recall that for integers \(d, k_1, k_2, \ldots , k_m\) such that \(\sum _{i=1}^m k_i = d\) and \(k_i\ge 0\) the multinomial coefficient is defined as:
$$\begin{aligned} \left( {\begin{array}{c}d\\ k_1, k_2, \ldots , k_m\end{array}}\right) = \frac{d!}{k_1! k_2! \cdots k_m!}, \end{aligned}$$
generalising binomial coefficients, as \(\left( {\begin{array}{c}d\\ k\end{array}}\right) = \left( {\begin{array}{c}d\\ k, d-k\end{array}}\right) \).
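For reference, a multinomial coefficient can be computed as a product of binomial coefficients (a standard identity; the helper below is our own illustration, not taken from the paper):

```python
# Multinomial coefficient d!/(k1! k2! ... km!) with d = sum(ks), computed
# as a telescoping product of binomial coefficients.
from math import comb

def multinomial(ks):
    result, total = 1, 0
    for k in ks:
        total += k
        result *= comb(total, k)   # C(k1+...+ki, ki)
    return result

print(multinomial([2, 3]))     # equals comb(5, 2) = 10
print(multinomial([1, 2, 2]))  # 5!/(1! 2! 2!) = 30
```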
Next we examine the effect of higher order differentiation on univariate monomials. This result is straightforward and probably known but we could not find a reference so we included a proof for completeness; the general formula for univariate polynomials can be obtained using the linearity of the \(\varDelta \) operator.
Recall that in a finite field \(\mathrm {GF}(p^s)\) we have \(a^{p^s} = a\) for all elements a. Hence, while (formal) polynomials over \(\mathrm {GF}(p^s)\) could have any degree in each variable, when we talk about the associated polynomial function, there will always be a unique polynomial of degree at most \(p^s-1\) in each variable which defines the same polynomial function. In other words, we are working in the quotient ring \(\mathrm {GF}(p^s)[x_1, \ldots ,x_n]/\langle x_1^{p^s} - x_1, \ldots , x_n^{p^s} - x_n \rangle \), and we use as representative of each class the unique polynomial which has degree at most \(p^s-1\) in each variable.
Moreover, all functions in n variables over a finite field can be written in Algebraic Normal Form (ANF) i.e. as polynomial functions of n variables. To summarise, each function in n variables over \(\mathrm {GF}(p^s)\) can be uniquely expressed as a polynomial function defined by a polynomial in \(\mathrm {GF}(p^s)[x_1, \ldots , x_n]\) of degree at most \(p^s-1\) in each variable.
The transform that associates to each function \(f:\mathrm {GF}(p)^n\rightarrow \mathrm {GF}(p)\) the coefficients of its ANF is called the Moebius transform or Reed–Muller transform. More precisely, using the usual representation of \(\mathrm {GF}(p)\) as \(\{0, 1, \ldots , p-1\}\), the transform of f is a function \(f^M:\mathrm {GF}(p)^n\rightarrow \mathrm {GF}(p)\) with \(f^M(y_1, \ldots , y_n)\) defined as being equal to the coefficient of \(x_1^{y_1}\cdots x_n^{y_n}\) in the ANF of f. We define a partial order on \(\mathrm {GF}(p)^n\) as \(\mathbf {a}\preceq \mathbf {b}\) iff \(a_i\le b_i\) for all \(i=1, \ldots , n\) (the order \(\le \) on \(\mathrm {GF}(p)=\{0, 1, \ldots , p-1\}\) being the usual order on the integers). In the binary case, the Moebius transform of f can be obtained from the truth table of f using the following result (see for example [19, Chapter 13, Theorem 1]):
Note that using the Theorem above and Proposition 1 we also have that \(f^M(\mathbf {y}) = \varDelta ^{(m)}_{\mathbf {e_{i_1}}, \ldots , \mathbf {e_{i_m}}} f (\mathbf {0})\) where \(i_1, \ldots , i_m\) are the positions of the non-zero elements of \(\mathbf {y}\). An efficient algorithm for computing the truth table of \(f^M\) from the truth table of f is given for example in [12, Algorithm 9.6]. The algorithm has \(\mathcal {O}(n2^{n})\) complexity and works in place (i.e. no significant additional memory is needed). For computing \(f^M\) over \(\mathrm {GF}(p)\) with \(p>2\), Theorem 2 does not hold; however \(f^M\) can be computed by an in-place algorithm given in [12, Algorithm 9.9] with \(np^{n-1}(p-1)^2\) operations, hence \(\mathcal {O}(np^{n+1})\) complexity.
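The binary transform can be sketched as follows: an in-place \(\mathcal {O}(n2^n)\) butterfly in the spirit of the algorithm cited above; the bit ordering of the truth table is our own illustrative convention:

```python
# In-place binary Moebius (Reed-Muller) transform: turns the truth table of
# f into the table of its ANF coefficients, in O(n 2^n) XORs.
# Truth table index encodes the input as bits: index = x1 + 2*x2 + ...

def moebius(tt, n):
    for i in range(n):
        step = 1 << i
        for j in range(0, 1 << n, step << 1):
            for k in range(j, j + step):
                tt[k + step] ^= tt[k]   # fold the lower half into the upper
    return tt

# f(x1, x2) = x1*x2 ^ x1; truth table [f(0,0), f(1,0), f(0,1), f(1,1)]
tt = [0, 1, 0, 0]
print(moebius(tt, 2))   # coefficients of 1, x1, x2, x1*x2
```

Over \(\mathrm {GF}(2)\) the transform is an involution, so applying it twice recovers the truth table.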
5 Generalisations to \(\mathrm {GF}(p^s)\)
In this section we take our generalisation further, to arbitrary finite fields \(\mathrm {GF}(p^s)\). An important particular case would be \(\mathrm {GF}(2^s)\), as many cryptographic algorithms (e.g. SNOW, AES) include operations over a field of this type.
5.1 Preliminaries
We need some known results regarding the values of binomial coefficients and multinomial coefficients in fields of finite characteristic.
Kummer’s theorem has been generalised to multinomial coefficients by various authors (see for example [6] and citations therein).
We will be interested in the situations where the multinomial coefficients are not zero modulo p.
5.2 Differentiation in \(\mathrm {GF}(p^s)\)
When moving from \(\mathrm {GF}(p)\) to \(\mathrm {GF}(p^s)\) several things work differently. For a start, there are monomials for which differentiating once w.r.t. a variable x decreases the degree in x by more than one, regardless of the difference step. For example let us differentiate \(x^d\) once, obtaining \((x+h)^d - x^d\). The coefficient of \(x^{d-1}\) is dh, so when d is a multiple of p the degree is strictly less than \(d-1\). We examine the general case:
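The single-differentiation example above can be checked numerically (illustrative values of p, d and h, chosen by us):

```python
# Degree drop of x^d under one differentiation over GF(p): the coefficient
# of x^{d-1} in (x+h)^d - x^d is d*h, which vanishes when p | d.
from math import comb

def deriv_of_power(d, h, p):
    """Coefficients c_0..c_d of (x+h)^d - x^d over GF(p)."""
    c = [(comb(d, k) * pow(h, d - k, p)) % p for k in range(d + 1)]
    c[d] = (c[d] - 1) % p        # subtract the x^d term
    return c

def degree(c):
    return max((k for k, ck in enumerate(c) if ck), default=-1)

p = 3
print(degree(deriv_of_power(4, 1, p)))  # d=4: degree d-1 = 3 as usual
print(degree(deriv_of_power(3, 1, p)))  # d=3=p: degree drops below d-1
```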
The previous theorem implies:
Is the bound in Corollary 2 tight, in the sense that there are functions which are non-zero after m differentiations? In particular, for Corollary 3, are there functions which are still non-zero after \(s(p-1)\) differentiations? We will show that this is indeed the case, but only if we choose the \(h_i\) carefully. First let us illustrate a choice of the steps \(h_i\) which we need to avoid. By Eq. (2), if we differentiate p times with all steps equal to 1 the result is identically zero regardless of the original function f:
$$\begin{aligned} \varDelta ^{(p)}_{\mathbf {e_{1}}, \ldots , \mathbf {e_{1}}} f (x_1, \ldots , x_n) &= \sum _{i=0}^{p} (-1)^{p-i}\left( {\begin{array}{c}p\\ i\end{array}}\right) f(x_1+i, x_2, \ldots , x_n)\\ &= f(x_1+p, x_2, \ldots , x_n) - f(x_1, x_2, \ldots , x_n)\\ &= 0 \end{aligned}$$
because all the coefficients \(\left( {\begin{array}{c}p\\ i\end{array}}\right) \) for \(0<i<p\) are divisible by p.
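This annihilation is easy to confirm numerically (illustrative p and f, a univariate function of maximal degree \(p-1\) for simplicity):

```python
# Check that p differentiations with all steps equal to 1 annihilate any
# function over GF(p), while p-1 differentiations need not. Illustrative P, f.
from math import comb

P = 5
f = lambda x: (3 * x ** 4 + x ** 2 + 2) % P   # maximal degree p-1 = 4

def deriv_m(g, m):
    # formula (2), specialised to a univariate function
    return lambda x: sum((-1) ** (m - i) * comb(m, i) * g((x + i) % P)
                         for i in range(m + 1)) % P

d4 = deriv_m(f, 4)
d5 = deriv_m(f, 5)
print([d4(x) for x in range(P)])   # 4th derivative: a nonzero constant
print([d5(x) for x in range(P)])   # 5th derivative: identically zero
```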
Denote by \(b_0, \ldots , b_{s-1}\) a basis of \(\mathrm {GF}(p^s)\) viewed as an s-dimensional vector space over \(\mathrm {GF}(p)\). We choose the sequence \(h_{1}, \ldots , h_{(p-1)s}\) as follows: \(p-1\) copies of \(b_0\), followed by \(p-1\) copies of \(b_1\), and so on. For these choices of \(h_i\) Proposition 1 becomes:
We show next that our choice of \(h_i\) is indeed a valid choice in the sense that there are functions which can be differentiated \(m(p-1)\) times w.r.t. the same variable without becoming zero.
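As a toy check of this claim in the smallest interesting case (\(p=2\), \(s=2\), i.e. \(\mathrm {GF}(4)\) with basis \(b_0=1\), \(b_1=t\) and \(t^2=t+1\); the encoding of field elements as 2-bit integers is our own convention), the function \(x^3\) indeed survives \(s(p-1)=2\) differentiations with steps \(b_0\), \(b_1\):

```python
# GF(4) = GF(2)[t]/(t^2+t+1), elements encoded as 2-bit integers 0..3
# (bit 0 = coefficient of b0 = 1, bit 1 = coefficient of b1 = t).

def gf4_mul(a, b):
    r = 0
    for _ in range(2):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 4:        # reduce by t^2 + t + 1 (0b111)
            a ^= 0b111
    return r

f = lambda x: gf4_mul(gf4_mul(x, x), x)   # f(x) = x^3

def deriv(g, h):
    return lambda x: g(x ^ h) ^ g(x)      # addition in GF(2^s) is XOR

d2 = deriv(deriv(f, 0b01), 0b10)          # steps h1 = b0 = 1, h2 = b1 = t
print([d2(x) for x in range(4)])          # a nonzero constant
```

The second order derivative is the nonzero constant 1, matching the bound \(s(p-1)=2\); differentiating twice with the same step \(b_0\) would instead give zero.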
Note that there are other possible valid choices of \(h_i\), but we aimed to keep things simple computationally by using this particular choice.
Generalising the results of this section to differentiation w.r.t. several variables is not difficult but the notation becomes cumbersome. Moreover, we will see in the next subsection that such a generalisation is not very useful for a practical attack.
5.3 Reducing differentiation over \(\mathrm {GF}(p^s)\) to differentiation over \(\mathrm {GF}(p)\)
Fix a basis \(b_0, \ldots , b_{s-1} \in \mathrm {GF}(p^s)\) for \(\mathrm {GF}(p^s)\) viewed as an s-dimensional vector space over \(\mathrm {GF}(p)\). Any element \(a\in \mathrm {GF}(p^s)\) can be uniquely written as \(a= a_0b_0+\cdots +a_{s-1}b_{s-1}\) with \(a_i\in \mathrm {GF}(p)\). Denote by \(\varphi :\mathrm {GF}(p^s)\rightarrow \mathrm {GF}(p)^s\) the vector space isomorphism defined by \(\varphi (a) = (a_0, \ldots , a_{s-1})\); this can be naturally extended to \(\varphi :\mathrm {GF}(p^s)^n\rightarrow \mathrm {GF}(p)^{sn}\); denote by \(\pi _j: \mathrm {GF}(p^s)\rightarrow \mathrm {GF}(p)\) the s projection homomorphisms defined as \(\pi _j(a) = a_j\).
Let \(f:\mathrm {GF}(p^s)^n \rightarrow \mathrm {GF}(p^s)\) be a polynomial function in n variables \(x_1, \ldots , x_n\). By writing \(x_i = x_{i0}b_0 + \cdots + x_{i, s-1} b_{s-1}\) the function f can be alternatively viewed as a function \(\overline{f}:\mathrm {GF}(p)^{sn} \rightarrow \mathrm {GF}(p)^{s}\) defined by \(\overline{f} = \varphi \circ f \circ \varphi ^{-1}\):
$$\begin{aligned} \begin{array}{ccc} \mathrm {GF}(p^s)^n &{} \mathop {\longrightarrow }\limits ^{f} &{}\mathrm {GF}(p^s) \\ \varphi \downarrow &{} &{} \downarrow \varphi \\ \mathrm {GF}(p)^{sn} &{} \mathop {\longrightarrow }\limits ^{\overline{f}} &{}\mathrm {GF}(p)^s \\ \end{array} \end{aligned}$$
Alternatively we can view f as s polynomial (projection) functions \(\overline{f}_0, \ldots , \overline{f}_{s-1}: \mathrm {GF}(p)^{sn} \rightarrow \mathrm {GF}(p)\), defined by \(\overline{f}_i= \pi _i \circ f\circ \varphi ^{-1} \), each in sn variables \(x_{ij}\) with \(i=1, \ldots , n\) and \(j=0, \ldots , s-1\).
Due to the linearity of the differentiation operator and the fact that \(\varphi \) is an isomorphism, the following diagram commutes:
$$\begin{aligned} \begin{array}{ccc} \mathcal {F}(\mathrm {GF}(p^s)^n,\mathrm {GF}(p^s)) &{} \mathop {\longrightarrow }\limits ^{\varDelta _{\mathbf {a}}} &{} \mathcal {F}(\mathrm {GF}(p^s)^n,\mathrm {GF}(p^s)) \\ \Phi \downarrow &{} &{} \downarrow \Phi \\ \mathcal {F}(\mathrm {GF}(p)^{sn}, \mathrm {GF}(p)^s) &{} \mathop {\longrightarrow }\limits ^{\varDelta _{\varphi (\mathbf {a})}} &{} \mathcal {F}(\mathrm {GF}(p)^{sn}, \mathrm {GF}(p)^s)\\ \end{array} \end{aligned}$$
where \(\mathcal {F}(A,B)\) denotes the set of functions from A to B, for any sets A and B, and \(\Phi \) is the operator defined as \(\Phi (f) = \overline{f}\). For our purposes we will be interested in differentiations w.r.t. one variable. In the next theorem, since we are differentiating functions in a different number of variables, we will use for the canonical basis vectors the notation \(\mathbf {e}^{(n)}_{i}\) instead of \(\mathbf {e}_{i}\) to clarify the length n of the vector.