Open Access 11.09.2024

Approximating neural distinguishers using differential-linear imbalance

Authors: Guangqiu Lv, Chenhui Jin, Zhen Shi, Ting Cui

Published in: The Journal of Supercomputing | Issue 19/2024


Abstract

At CRYPTO 2019, Gohr first proposed neural distinguishers (NDs) on SPECK32 that are superior to distinguishers based on the differential distribution table (DDT). Benamira et al. noted that NDs rely on the differential distribution of the last three rounds, and Bao et al. pointed out that NDs depend on the strong correlations between the bit values of ciphertext pairs satisfying the expected differential. Hence, one may conjecture that deep relations exist between NDs and differential-linear imbalances. To approximate NDs under a single ciphertext pair, we utilize differential-linear imbalances to construct simplified distinguishers. These newly constructed distinguishers offer distinguishing advantages comparable to those of NDs but with reduced time complexities. For instance, one such simplified distinguisher has only \(2^{-1.35}\) of the original time complexity of NDs. Our experiments demonstrate that these new distinguishers achieve a matching rate of 98.2% with the ND for 5-round SPECK32 under a single ciphertext pair. Furthermore, we achieve the highest accuracies to date for 7-round and 8-round SPECK32 by using a maximum of 512 ciphertext pairs. Finally, by replacing NDs with simplified distinguishers, we significantly reduce the time complexities of differential-neural attacks on 11–14 rounds of SPECK32.

1 Introduction

In recent decades, automatic cryptanalysis tools have become increasingly desirable. The link between machine learning and cryptography was discussed as early as 1993 [1]. Following the successful application of neural networks to side-channel cryptanalysis [2] in recent years, Gohr [3] at CRYPTO’19 first proposed NDs for 5–8 rounds of SPECK32. Additionally, Gohr presented an 11-round SPECK32 key recovery attack with a time complexity of \(2^{38}\), the lowest-time-complexity attack at that time. This work opened new possibilities for machine learning-aided cryptanalysis.
To date, the maximum number of rounds for practical attacks on SPECK32 in the single-key setting is 13, achieved by differential-neural cryptanalysis; for conventional cryptanalysis, the maximum is 12 [4]. Bao et al. [5] proposed the concept of conditional neutral bits/vectors for differential characteristics and first presented a 13-round practical attack on SPECK32. Later, Wang et al. [6] further improved the accuracy of neural distinguishers through a more powerful neural network, GoogLeNet, and presented a 13-round practical attack on SPECK32 with the lowest time complexity.
Related works on the simplification of neural distinguishers. At EUROCRYPT’21, Benamira et al. [7] presented significant progress on the simplification of Gohr’s NDs. Through experiments, they found that NDs rely heavily on the differential distribution in the penultimate and antepenultimate rounds. They then proposed a truncated differential distinguisher (TDD) based on an exhaustive search of partial subkeys. For 6-round SPECK32, Benamira et al. showed that 94.98% of the tested ciphertext pairs receive the same classification under the TDD and the ND. Gohr’s neural distinguisher is composed of three nonlinear functions/blocks; the third block is a Multi-Layer Perceptron (MLP), also called a Fully Connected Neural Network (FCNN). Moreover, Benamira et al. presented the following conjectures:
  • Conjecture 1. It is possible that Gohr’s neural distinguisher can be replaced by a strategy inspired by both differential cryptanalysis and machine learning.
  • Conjecture 2. The multilayer perceptron (MLP) block in Gohr’s neural distinguisher can be replaced by another ensemble classifier.
  • Conjecture 3. Gohr’s neural network outperforms other non-neural network machine learning models.
  • Conjecture 4. Gohr’s neural network may exploit differential-linear characteristics.
Benamira et al. experimentally confirmed the first three conjectures, and Conjecture 4 was first verified by Chen et al. in [8]. Chen et al. proposed an extended differential-linear connectivity table (EDLCT) to characterize differential-linear approximations. To construct their distinguishers, they first utilized the feature set sensitivity test (FSST) to select a significant set of ciphertext bits. Then, they used all linear combinations of these ciphertext bits and the logistic regression (LR) classification algorithm to construct distinguishers. The LR can be formulated as follows
$$\begin{aligned} z = \frac{1}{{1 + {e^{-WX^T}}}} = \frac{1}{{1 + {e^{ - ({w_0} + {w_1}{x_1} + \cdots + {w_n}{x_n})}}}}, \end{aligned}$$
(1)
where \(X=(1,x_1,\ldots ,x_n)\) is a binary vector and \(W=(w_0,w_1,\ldots ,w_n)\) holds the bias \(w_0\) and the weights \(w_i\) associated with each value \(x_i\). In this context, \(x_i\) is the inner product between the current ciphertext pair and the i-th output mask of the differential-linear approximations. If \(z>0.5\), the predicted label for the input sample is 1; otherwise it is 0. Their experiments showed that Gohr’s neural distinguishers are closely related to the EDLCT, which opens new possibilities for the simplification of neural distinguishers.
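For concreteness, the following is a minimal Python sketch of this classifier; the names (lr_score, w0) are illustrative, and the weights are assumed to come from a training step that is not shown.

```python
import numpy as np

def lr_score(x, w, w0):
    """Logistic-regression score of Eq. (1).

    x  -- binary feature vector (x_1, ..., x_n): the inner products between
          the ciphertext pair and the output masks of the DL approximations
    w  -- trained weights (w_1, ..., w_n); w0 is the bias term
    """
    z = w0 + np.dot(w, np.asarray(x))
    return 1.0 / (1.0 + np.exp(-z))  # predicted label is 1 iff score > 0.5
```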
At ASIACRYPT’23, Bao et al. discovered that NDs exploit the strong correlations between bit values in right pairs of XOR-differential propagation through addition modulo \(2^n\). Specifically, they observed a strong imbalance in certain bits of a ciphertext pair when a specified differential is satisfied. Furthermore, Bao et al. pointed out that this type of information is contained in the extended differential-linear distribution. Their DDT-based simplified models of NDs achieve accuracies comparable to NDs, but at the cost of approximately 32.5 GB of memory, which NDs do not require. In summary, the existing methods for constructing simplified models of NDs suffer from limitations such as the need to obtain a ND in advance, the requirement to exhaust some key bits, or a large memory overhead.
Enhancing the accuracy of NDs has become increasingly challenging. To address this, Bellini et al. [9] proposed a cipher-agnostic neural training pipeline to find good input differences. Generally, utilizing more information is a common way to improve the accuracy of neural networks. However, the maximum number of ciphertext pairs simultaneously used for one judgment in [6, 10] is 16, because neural networks have limitations on the input scale. The scale of a neural network is proportional to its input size, so increasing the number of ciphertext pairs leads to rapid growth in memory and computing requirements.
Motivation Cryptanalysis traditionally rests on rigorous mathematical tools such as statistics and algebraic analysis, but neural distinguishers pose an interpretability challenge. The effectiveness of neural networks in cryptographic applications is fundamentally attributed to the inherent imbalance in the bit vectors of ciphertext pairs. Hence, this paper aims to develop simplified distinguishers for NDs and to enhance their capabilities from the perspective of differential-linear imbalance (Table 1).
Our Contributions We construct simplified models of neural distinguishers with comparable accuracies and lower time complexities. Consequently, replacing NDs with simplified distinguishers significantly reduces the time complexity of differential-neural attacks on 11–14 rounds of SPECK32. This paper has three main contributions:
First, we approximate NDs with differential-linear imbalance. We present four types of multiple differential-linear distinguishers (MDLDs)\(^1\) for 5–8 rounds of SPECK32, each using a different classification algorithm. As demonstrated in Table 2, MDLDs exhibit distinguishing advantages comparable to NDs. In contrast to previous simplified models, MDLDs not only follow similar construction conditions to NDs but also overcome the previous shortcomings. Second, the accuracy of MDLDs can be significantly improved by utilizing more ciphertext pairs, as demonstrated in Table 3. Additionally, we provide a comprehensive comparison between MDLDs and neural distinguishers in terms of matching rate, memory complexity and time complexity. The detailed results are presented in Tables 4 and 5. Experimental results reveal that the time complexity of the DL+FCNN distinguisher, one of the four MDLDs presented in this paper, is approximately \(2^{-1.35}\) of that of NDs. Third, we introduce simplified distinguishers as replacements for the NDs utilized in Gohr’s differential-neural attacks and present attacks on 11–14 rounds of SPECK32. Among all attacks that offer success rates, we achieve the lowest time complexities for 12–14 rounds of SPECK32. Notably, no simplified models had been applied to Gohr’s differential-neural attacks before.
Outline In Sect. 2, we introduce notations, basic concepts and the main development context of differential-neural cryptanalysis. In Sect. 3, we present the MDLDs against 5–8 rounds of SPECK32 and significantly improve the accuracy of 7/8-round SPECK32 distinguishers by utilizing an adequate number of ciphertext pairs for each judgment. Additionally, we provide a comprehensive comparison between MDLDs and neural distinguishers in terms of matching rate, memory complexity and time complexity. In Sect. 4, by replacing NDs of Gohr’s differential-neural attacks with MDLDs, we achieve a significant reduction in the overall time complexities of the SPECK32 attacks. Finally, we present our conclusions in Sect. 5.

2 Preliminaries

2.1 Notations and basic concepts

Table 1
Notations

| Symbols | Descriptions |
|---|---|
| \(\wedge\), \(\oplus\), \(\boxplus\), \(\cdot\) | Bitwise AND, exclusive-OR, modular addition, inner product over the binary field |
| \(\lll\), \(\ggg\) | Left-rotation and right-rotation |
| \(\Vert\) | Concatenation operation |
| \(\mathrm{E}, \mathrm{E^{-1}}\) | The encryption function of a cipher and its inverse function |
| \(\mathrm{F}, \mathrm{F^{-1}}\) | The round function of a cipher and its inverse function |
| \(\textrm{HW}(x)\) | The Hamming weight of x |
| \(\textrm{N}_r\) | A neural distinguisher of r-round SPECK32 |
| \(\Pr _{D}[\mathcal {A}]\) | The probability of \(x\in \mathcal {A}\), where the variable x follows the distribution D |
Unless otherwise specified, the sizes of batch, training set and test set for the neural network utilized in this paper are \(10^4\), \(10^7\) and \(10^6\), respectively.
The three input differences utilized in this paper are \(0x0040\_0000\), \(0x0020\_0000\) and \(0x2800\_0010\), which result in non-negligible accuracies for 5–8 rounds of neural distinguishers on SPECK32. Unless otherwise specified, the default input difference of the distinguishers utilized is \(0x0040\_0000\).
Definition 1
(Advantage of a distinguisher [11]) Let \(Z^n\) be a sequence of n independent and identically distributed random variables over Z following a probabilistic distribution D, where \(D\in \{D_0,D_1\}\). An algorithm that takes a sequence of n samples as input and outputs either 0 or 1 is known as a distinguisher. The acceptance region of a distinguisher is defined by \(\mathcal A\), where the distinguisher outputs 0 when the realization \(z^n\in \mathcal A\), outputs 1 otherwise. The ability to distinguish one distribution from another is known as the advantage of the distinguisher and is defined as
$$\begin{aligned} \textrm{Adv}^n_{\mathcal {A}}=\left| \Pr _{D^n_0}[{\mathcal {A}}]-\Pr _{D^n_1}[{\mathcal {A}}]\right| . \end{aligned}$$
The accuracy (Acc) of a distinguisher is \(Acc= \Pr [D=D_0] \times \Pr _{D^n_0} [ {\mathcal {A}} ] + \Pr [D=D_1] \times \Pr _{D^{n}_1}[{\bar{\mathcal {A}}}]\), where \({\bar{\mathcal {A}}}\) is the complementary set of \({\mathcal {A}}\). For the case where \(\Pr [D=D_0]=\Pr [D=D_1]=0.5\), \(Adv=2|Acc-\frac{1}{2}|\) holds.
Theorem 1
(Optimal Distinguisher [11]) The acceptance region of a distinguisher with the maximum advantage to distinguish \(D=D_0\) from \(D=D_1\) is \({{\mathcal {A}}_{opt}} = \{z^n\in {\mathcal {Z}^{n}}:\textrm{LLR}(z^n)\geqslant 0\}\), where
$$\begin{aligned} \textrm{LLR}(z^n)=\sum _{a\in \mathcal {Z}}\textrm{N}(a|z^n)\, \log \frac{\Pr _{D_0}[a]}{\Pr _{D_1}[a]} \end{aligned}$$
is the logarithmic likelihood ratio (LLR), and \(\textrm{N}(a|z^n)\) is the number of times that symbol a occurs in the sequence \(z^n\).
Definition 2
(Extended Differential-linear Connectivity Table, EDLCT [8]) Let \(\textrm{S}:\mathbb {F}_2^n \rightarrow \mathbb {F}_2^m\), \(\Delta \in \mathbb {F}_2^n\) and \({\Gamma _1},{\Gamma _2} \in \mathbb {F}_2^{m}\). Then, each item of EDLCT is defined as
$$\begin{aligned} \begin{aligned} {EDLCT}_S(\Delta ,{\Gamma _1},{\Gamma _2})=\#\{ x \in \mathbb {F}_2^n|{\Gamma _1} \cdot \mathrm{{S}}(x) = {\Gamma _2} \cdot \mathrm{{S}}(x \oplus \Delta )\} - {2^{n - 1}} \end{aligned} \end{aligned}$$
and each item of \(\overline{EDLCT}\) is defined as \(\overline{EDLCT}(\Delta ,{\Gamma _1},{\Gamma _2})={EDLCT}(\Delta ,{\Gamma _1},{\Gamma _2})/2^n\).
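Since Definition 2 is stated for an arbitrary map S, an entry can be evaluated directly by exhaustive enumeration when n is small. The following is a minimal sketch under that assumption (function names are illustrative; for a real cipher one would estimate the entries by sampling instead).

```python
def parity(v: int) -> int:
    """Inner product over F_2: parity of the bits selected by a mask."""
    return bin(v).count("1") & 1

def edlct(S, n, delta, gamma1, gamma2):
    """One EDLCT entry of Definition 2, by exhaustive enumeration over F_2^n."""
    count = sum(parity(gamma1 & S(x)) == parity(gamma2 & S(x ^ delta))
                for x in range(1 << n))
    return count - (1 << (n - 1))  # the normalized entry is edlct(...) / 2**n
```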

2.2 A brief description of SPECK

SPECK is a lightweight iterative cipher designed by the US National Security Agency. SPECK32 is a member of the SPECK family. SPECK32 needs to perform 22 rounds, as shown in Algorithm 1.
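Algorithm 1 is not reproduced here; as a reference point, the following is a minimal Python sketch of the SPECK32/64 round function and key schedule following the public specification (the function names are our own).

```python
MASK = 0xFFFF  # SPECK32 operates on 16-bit words

def rol(v, r):
    return ((v << r) | (v >> (16 - r))) & MASK

def ror(v, r):
    return ((v >> r) | (v << (16 - r))) & MASK

def speck_round(x, y, k):
    """One round: x = ((x >>> 7) + y) xor k, then y = (y <<< 2) xor x."""
    x = ((ror(x, 7) + y) & MASK) ^ k
    y = rol(y, 2) ^ x
    return x, y

def expand_key(key, rounds=22):
    """Key schedule for key = (l2, l1, l0, k0) as four 16-bit words;
    it reuses the round function with the round counter as the 'key'."""
    l, ks = [key[2], key[1], key[0]], [key[3]]
    for i in range(rounds - 1):
        li, ki = speck_round(l[i], ks[i], i)
        l.append(li)
        ks.append(ki)
    return ks

def encrypt(x, y, ks):
    for k in ks:
        x, y = speck_round(x, y, k)
    return x, y

# Specification test vector: key (0x1918, 0x1110, 0x0908, 0x0100),
# plaintext (0x6574, 0x694c) encrypts to (0xa868, 0x42f2).
assert encrypt(0x6574, 0x694C,
               expand_key((0x1918, 0x1110, 0x0908, 0x0100))) == (0xA868, 0x42F2)
```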

2.3 Overview of neural distinguishers

In the following, we briefly recall the neural distinguisher and the significant conjectures and conclusions about NDs in [3, 7].
For the sake of clarity, the ciphertext pairs that are utilized simultaneously by the distinguisher to give one judgment are called a ciphertext group, in which the number of ciphertext pairs is denoted by NumPairs. To be consistent with [3, 7], the ciphertext pairs that are utilized by the distinguishers to determine whether the guessed key is true or not form a ciphertext structure. The number of ciphertext groups in one ciphertext structure is denoted by DataSize. Next, the NDs with NumPairs\(=1\) proposed in [3] are briefly recalled. The neural network itself is a classification algorithm whose training set is of the form \(\{({c_i},{c_i}',y_i)\,|\,i\in \{0,\ldots ,n-1\}\}\). Let the input difference of an r-round neural distinguisher \(\textrm{N}_r\) be \(\Delta\). Then, the label \(y_i\) of a ciphertext pair \(({c_i},{c_i}')\), encrypted from a plaintext pair \(({p_i},{p_i}')\), is defined as
$$\begin{aligned} {y_i} = {\left\{ \begin{array}{ll} 1, &{} \text {if } {p_i} \oplus {p_i}^\prime = \Delta ;\\ 0, &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
The ciphertext pairs labeled 1 (resp. 0) are called the positive (resp. negative) data. Taking \(\{({c_i},{c_i}',y_i)|i\in \{0,\cdots ,n-1\}\}\) as the training set of the neural network, one can obtain a distinguisher that can distinguish the ciphertext pairs labeled 1 from those labeled 0. For each ciphertext pair \(({c_i},{c_i}')\), the output of NDs is called a score of \(({c_i},{c_i}')\), which is between 0 and 1.
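As an illustration of this labelling, the sketch below generates such a training set in the style of Gohr's make_train_data, reusing expand_key and encrypt from the SPECK32 sketch above; the structure is a plausible reconstruction, not Gohr's exact code.

```python
import numpy as np

def make_train_data(n, rounds=5, delta=(0x0040, 0x0000)):
    """n labelled samples (c, c', y): y=1 pairs satisfy the input difference,
    y=0 pairs are encryptions of two unrelated random plaintexts."""
    data = []
    for _ in range(n):
        y = np.random.randint(0, 2)
        p = (np.random.randint(1 << 16), np.random.randint(1 << 16))
        if y == 1:   # positive data: fixed plaintext difference Delta
            pp = (p[0] ^ delta[0], p[1] ^ delta[1])
        else:        # negative data: random second plaintext
            pp = (np.random.randint(1 << 16), np.random.randint(1 << 16))
        ks = expand_key(tuple(np.random.randint(1 << 16) for _ in range(4)),
                        rounds)
        data.append((encrypt(*p, ks), encrypt(*pp, ks), y))
    return data
```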
Let the score of \(({c_i},{c_i}')\) given by the ND be \(\bar{y}_i\). Generally, if \(\bar{y}_i\ge 0.5\), the ciphertext pair \(({c_i},{c_i}')\) is considered to come from the positive data; otherwise, \(({c_i},{c_i}')\) is considered to come from the negative data. The proportion of positive (resp. negative) data in the training set is denoted by \(\Pr _{pos}\) (resp. \(\Pr _{neg}\)). The true positive rate and true negative rate are denoted by TPR and TNR, respectively, where \(\textrm{TPR}=\Pr [\bar{y}_i=1|{y}_i=1]\) and \(\textrm{TNR}= \Pr [\bar{y}_i=0|{y}_i=0]\). Then, the accuracy (Acc) of a neural distinguisher is \(Acc = \Pr _{pos} \times \textrm{TPR} + \Pr _{neg} \times \textrm{TNR}\). Generally, half of the ciphertext pairs are labeled 1, i.e., \(\Pr _{pos}={\Pr _{neg}}=0.5\); hence \(Acc = \frac{\textrm{TPR} }{2} + \frac{\textrm{TNR}}{2}\) holds.
In what follows, the structure of a neural distinguisher is briefly reviewed. The neural network used by Gohr’s NDs is an ordinary convolutional neural network. Gohr’s NDs consist of 3 nonlinear blocks, as shown in Fig. 1. Block 1 is the initial convolution block, and Block 2 is the residual convolution block. In the 5/6-round SPECK32 distinguishers proposed by Gohr, Block 2 is repeated 10 times. The residual block adds the input of the first layer of the two convolution/linear layers to the output of the second layer, which is generally used to solve the vanishing gradient problem when the neural network has too many layers. Block 3 contains the FCNN classification algorithm, which is also called MLP in [7].

3 Approximating NDs using differential-linear imbalances

In this section, we introduce multiple differential-linear distinguishers (MDLDs) for 5–8 rounds of SPECK32 that demonstrate a distinguishing advantage comparable to NDs. Notably, both the NDs and our MDLDs rely solely on a single ciphertext pair. The accuracies of MDLDs become considerably higher when utilizing more ciphertext pairs per judgment, as demonstrated in Sect. 3.2.

3.1 Constructions of simplified distinguishers

Let \((c,{\bar{c}})\) represent a ciphertext pair and \((\Gamma _1,\bar{\Gamma }_1),\ldots , (\Gamma _N,\bar{\Gamma }_N)\) be the output masks of N differential-linear approximations. We define \(x_i=\Gamma _i\cdot c \oplus \bar{\Gamma }_i\cdot \bar{c}\) for \(1\le i \le N\). To ascertain whether \((c,{\bar{c}})\) is encrypted from a plaintext pair with the fixed difference \(\Delta _{in}\) (hereinafter referred to as a "right pair" for brevity), one can judge it through the N-bit binary vector \(x_1\Vert x_2\Vert \dots \Vert x_N\). Using SPECK32 as an illustrative example, only these N bits are fed into the newly developed distinguishers, whereas a full ciphertext pair, totaling 64 bits, is the input of the previous NDs. This represents a significant decrease in input size compared with the NDs. This paper employs four classification algorithms, Extended Logarithmic Likelihood Ratio (ELLR, a variant of the Logarithmic Likelihood Ratio [11]), Logistic Regression (LR, Eq. 1), Light Gradient Boosting Machine (LGBM, [12]) and Fully Connected Neural Network (FCNN, [13]), to build up four types of distinguishers, namely DL+ELLR, DL+LR, DL+LGBM and DL+FCNN, respectively.
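The feature extraction itself reduces to parity computations; a minimal sketch (names illustrative):

```python
def parity(v: int) -> int:
    """Inner product over F_2: parity of the bits selected by a mask."""
    return bin(v).count("1") & 1

def dl_features(c, cbar, masks):
    """The N-bit vector x_1 || ... || x_N, where x_i is the inner product
    of the i-th output-mask pair (Gamma_i, GammaBar_i) with (c, cbar)."""
    return [parity(g & c) ^ parity(gb & cbar) for (g, gb) in masks]
```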
Taking DL+ELLR as an example, for a ciphertext pair \((c,{\bar{c}})\) and the corresponding binary vector \(x_1\Vert x_2\Vert \dots \Vert x_N\), one classifies \((c,{\bar{c}})\) as a right pair if the score in Eq. 2 is greater than 0. Note that the probability for a random binary variable to be 0, denoted by \(P_{Random}\), is \(\frac{1}{2}\).
$$\begin{aligned} \begin{aligned} {\text {ELLR}}({x_1}\Vert {x_2}\Vert \ldots \Vert {x_N})&= C + \log _2 \prod \limits _{i = 1}^N \left( \frac{1 + ( - 1)^{x_i} c_i}{2} \Big / P_{Random} \right) \\&= C + \sum \limits _{i = 1}^N \log _2 \left( 1 + ( - 1)^{x_i} c_i \right) \\ \end{aligned} \end{aligned}$$
(2)
Here C is a constant and \(c_i\) represents the correlation of the i-th differential-linear approximation with output mask \((\Gamma _i,\bar{\Gamma }_i )\). Since the bits \(x_1, x_2,\dots ,x_N\) are not mutually independent, we introduce the constant C into the Logarithmic Likelihood Ratio distinguisher [11], as shown in Eq. 2; C should be 0 when the variables \(x_1,x_2,\dots ,x_N\) are independent of each other. The distinguishers using LR (or LGBM, or FCNN) leverage a pre-trained LR (or LGBM, or FCNN) classification algorithm to assess whether the N-bit binary vector \(x_1\Vert x_2\Vert \dots \Vert x_N\) is random, and can thereby determine whether the current ciphertext pair \((c,{\bar{c}})\) comes from a right pair or from random data.
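A minimal sketch of the ELLR score of Eq. (2), assuming the correlations \(c_i\) have been measured experimentally and the constant C calibrated on training data:

```python
import math

def ellr(bits, corr, C=0.0):
    """Eq. (2): C + sum_i log2(1 + (-1)^{x_i} * c_i); classify (c, cbar) as a
    right pair when the score is greater than 0. Requires |c_i| < 1."""
    return C + sum(math.log2(1.0 + (-1) ** x * c) for x, c in zip(bits, corr))
```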
For the selection of these N differential-linear approximations, we first search a large number of differential-linear approximations experimentally, according to the Hamming weight of the output masks. Subsequently, we utilize the LGBM classification algorithm to filter these approximations. The \(feature\_importance\) function offered by LGBM provides a measure of significance for each differential-linear approximation. Leveraging this metric, we handpick the N most influential approximations to bolster the distinguishing advantage of our distinguishers.
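A sketch of this selection step through the scikit-learn interface of LightGBM (the hyperparameters are illustrative; the paper does not specify them):

```python
import numpy as np
import lightgbm as lgb

def select_masks(X, y, N):
    """X: (samples, candidates) matrix of DL-approximation bits, y: labels.
    Returns the indices of the N most important candidate approximations."""
    model = lgb.LGBMClassifier(n_estimators=100)
    model.fit(X, y)
    importance = model.feature_importances_   # one score per approximation
    return np.argsort(importance)[::-1][:N]
```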
In general, traditional multiple differential-linear distinguishers rely on a large number of ciphertext pairs. However, these N DL approximations are so imbalanced that a single ciphertext pair is sufficient to guarantee the non-negligible distinguishing advantages of these MDLDs, as shown in Table 2. For the sake of brevity, throughout this paper we refer to these distinguishers that employ only one ciphertext pair and multiple DL approximations as multiple differential-linear distinguishers.
Table 2
The simplified models for 5–8 rounds of NDs when only utilizing a single ciphertext pair. Pre-ND indicates that to construct the current distinguisher, one needs to obtain a ND in advance. Exh-Key denotes that this distinguisher itself exhausts a portion of the key. Pre-DDT indicates that the construction of this distinguisher necessitates a memory overhead of approximately 32.5 GB

| Round | Scenario | Distinguisher | TPR | TNR | Acc | \(-\log_2(\textrm{Adv})\) | References |
|---|---|---|---|---|---|---|---|
| 5 | – | ND | 0.904 | 0.954 | 0.929 | 0.22 | [3] |
| | Pre-ND, Exh-Key, Pre-DDT | TDD | 0.9076 | 0.9522 | 0.9298 | 0.22 | [7] |
| | Pre-ND | M-ODT | 0.955 | 0.891 | 0.923 | 0.24 | [7] |
| | Pre-ND | partial C+LR | – | – | 0.847 | 0.53 | [8] |
| | Pre-DDT | DDT | 0.9171 | 0.9557 | 0.9364 | 0.20 | [14] |
| | – | DL+ELLR | 0.7510 | 0.9679 | 0.8594 | 0.48 | Ours |
| | – | DL+LR | 0.8605 | 0.9306 | 0.8956 | 0.34 | Ours |
| | – | DL+LGBM | 0.8825 | 0.9608 | 0.9217 | 0.25 | Ours |
| | – | DL+FCNN | 0.9003 | 0.9535 | 0.9269 | 0.23 | Ours |
| 6 | – | ND | 0.724 | 0.853 | 0.788 | 0.80 | [3] |
| | Pre-ND, Exh-Key, Pre-DDT | TDD | 0.7253 | 0.8507 | 0.7879 | 0.80 | [7] |
| | Pre-ND | M-ODT | 0.852 | 0.706 | 0.779 | 0.84 | [7] |
| | Pre-ND | partial C+LR | – | – | 0.667 | 1.58 | [8] |
| | Pre-DDT | DDT | 0.7309 | 0.8583 | 0.7946 | 0.76 | [14] |
| | – | DL+ELLR | 0.5626 | 0.8017 | 0.6822 | 1.46 | Ours |
| | – | DL+LR | 0.6462 | 0.7560 | 0.7011 | 1.31 | Ours |
| | – | DL+LGBM | 0.6958 | 0.8577 | 0.7767 | 0.85 | Ours |
| | – | DL+FCNN | 0.7162 | 0.8495 | 0.7829 | 0.82 | Ours |
| 7 | – | ND | 0.533 | 0.699 | 0.616 | 2.11 | [3] |
| | Pre-ND, Exh-Key, Pre-DDT | TDD | 0.5531 | 0.6524 | 0.6028 | 2.28 | [7] |
| | Pre-DDT | DDT | 0.5435 | 0.7046 | 0.6240 | 2.01 | [14] |
| | – | DL+ELLR | 0.5696 | 0.5913 | 0.5804 | 2.64 | Ours |
| | – | DL+LR | 0.5738 | 0.5903 | 0.5820 | 2.61 | Ours |
| | – | DL+LGBM | 0.5709 | 0.6430 | 0.6070 | 2.22 | Ours |
| | – | DL+FCNN | 0.5544 | 0.6617 | 0.6081 | 2.21 | Ours |
| 8 | – | ND | 0.5190 | 0.5080 | 0.514 | 5.16 | [3] |
| | Pre-DDT | DDT | 0.4919 | 0.5469 | 0.5194 | 4.69 | [14] |
| | – | DL+FCNN | 0.4140 | 0.6000 | 0.5070 | 6.16 | Ours |
| | – | DL+ELLR\(^2\) | 0.5000 | 0.5130 | 0.5065 | 6.27 | Ours |
| | – | DL+LR\(^2\) | 0.5169 | 0.5042 | 0.5106 | 5.56 | Ours |
| | – | DL+LGBM\(^2\) | 0.4247 | 0.6165 | 0.5206 | 4.60 | Ours |
| | – | DL+FCNN\(^2\) | 0.3909 | 0.6506 | 0.5208 | 4.59 | Ours |

\(^1\)TPR—true positive rate, TNR—true negative rate, Acc—accuracy, Adv—distinguishing advantage.
\(^2\)The input difference of the current distinguisher is \(0x2800\_0010\), whereas the other distinguishers have an input difference of \(0x0040\_0000\).
\(^3\)DDT denotes the improved DDT-based distinguishers of [14], while M-ODT refers to a distinguisher based on the Masked Output Distribution Table [7]

3.2 Accuracy and further improvement

In this section, we investigate the accuracies of the simplified models of NDs and also improve the accuracies of MDLDs by utilizing more ciphertext pairs concurrently.
It is noteworthy that the distinguishers proposed above rely on only a single ciphertext pair, like NDs. However, in contrast to previous NDs [6, 10, 15], which were restricted to a maximum of 16 ciphertext pairs, the MDLDs proposed in this paper are not bound by such a limitation. Consequently, we present a 7-round distinguisher with an accuracy of 0.9999 and an 8-round distinguisher with an accuracy of 0.8866 against SPECK32.
Accuracy and distinguishing advantage Let Adv and Acc represent the distinguishing advantage and accuracy of a distinguisher. According to Definition 1, the distinguishing advantage is computed by \(Adv=2|Acc-\frac{1}{2}|\). See Table 2 for the comparison of distinguishing advantage between MDLDs and other simplified models of NDs. The accuracy of the 8-round SPECK32 distinguisher based on DL+FCNN is 0.5070 when the input difference is \(0x0040\_0000\). By modifying the input difference to \(0x2800\_0010\), we observe significant improvements in the accuracies of 8-round MDLDs. It is worth noting that these newly constructed distinguishers rely on a single ciphertext pair, similar to NDs.
Further accuracy improvements Furthermore, MDLDs can simultaneously exploit more ciphertext pairs than NDs, leading to higher accuracies. Generally, using multiple ciphertext pairs simultaneously for one judgment increases the accuracy of NDs. Indeed, Hou et al. [16] and Chen et al. [10] have utilized multiple ciphertext pairs as input to improve the accuracies. However, the scale of a neural network is proportional to the number of ciphertext pairs utilized, so as the number of pairs increases, both the memory requirement and the time complexity grow rapidly. Therefore, due to the memory and computation limitations associated with the scale of the neural network, both [16] and [10] utilize only a small number of ciphertext pairs, and the improvement achieved in these studies is limited.
Table 3
Accuracies of neural distinguishers and MDLDs when using different numbers of ciphertext pairs

| Round | Distinguisher | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | References |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | ConNet | 0.614 | – | – | – | – | – | – | – | – | – | [3] |
| | ConNet | – | 0.6393 | 0.6861 | 0.7074 | 0.6694 | – | – | – | – | – | [10] |
| | ConNet | – | 0.663 | 0.725 | 0.801 | 0.883 | – | – | – | – | – | [15] |
| | GoogLeNet | – | 0.6649 | 0.7283 | 0.8106 | 0.8963 | – | – | – | – | – | [6] |
| | MDLD | 0.6081 | 0.6481 | 0.6970 | 0.7311 | 0.8013 | 0.8813 | 0.9518 | 0.9901 | 0.9993 | 0.9999 | Ours |
| 8 | ConNet | 0.514 | – | – | – | – | – | – | – | – | – | [3] |
| | ConNet | – | 0.521 | 0.530 | 0.543 | 0.560 | – | – | – | – | – | [15] |
| | GoogLeNet | – | – | 0.5428 | 0.5590 | 0.5854 | – | – | – | – | – | [6] |
| | MDLD | 0.5219 | 0.5349 | 0.5468 | 0.5612 | 0.5868 | 0.6189 | 0.6648 | 0.7267 | 0.8046 | 0.8866 | Ours |

ConNet = convolutional neural network, GoogLeNet = the neural network proposed by Google, MDLD = multiple differential-linear distinguisher, – = not available
The maximum number of ciphertext pairs simultaneously used for one judgment in [6, 10, 15] is 16, since the neural networks have limitations on the input scale. The MDLDs proposed in this paper are not constrained by the number of ciphertext pairs used. This is because the input to the classification algorithm is the biases of differential-linear approximations. Table 3 presents the accuracies of MDLDs based on DL+FCNN for 7–8 rounds of SPECK32.
Moreover, Chen et al. [10] discovered that Gohr’s NDs may encounter difficulties in effectively utilizing multiple ciphertext pairs for one judgment; for instance, the accuracy of their ND using 16 ciphertext pairs is lower than that using 8 pairs. However, with a more powerful neural network such as GoogLeNet, Wang et al. [6] obtained the opposite result: the accuracy with 16 pairs is higher than with 8 pairs. MDLDs do not face this issue at all, since the statistical estimate of the bias becomes more accurate as more ciphertext pairs are used simultaneously for one judgment.
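One plausible reading of how MDLDs consume an arbitrary number of pairs is sketched below: the per-pair bits are converted to signs and averaged, so the classifier input stays N-dimensional regardless of NumPairs (dl_features as in the sketch of Sect. 3.1; the aggregation rule is our assumption).

```python
import numpy as np

def aggregate_features(pairs, masks):
    """Empirical correlations of the N DL approximations over all pairs."""
    signs = np.array([[1 - 2 * b for b in dl_features(c, cbar, masks)]
                      for (c, cbar) in pairs])
    return signs.mean(axis=0)   # length-N vector fed to ELLR/LR/LGBM/FCNN
```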

3.3 Matching rate

In this section, considering the case of using only a single ciphertext pair, we investigate the similarities between MDLDs and NDs in terms of matching rate.
Table 4
The matching rate between Gohr's NDs and their corresponding simplified models. Among all simplified models of NDs, MDLDs achieve the highest matching rate to date

| Round | Type | Distinguisher | Accuracy | Matching Rate\(^1\) | References |
|---|---|---|---|---|---|
| 5 | Neural Network | Gohr's ND | 0.929 | – | [3] |
| | Differential Cryptanalysis | TDD | 0.9298 | 97.6% | [7] |
| | Neural Network | ND2\(^2\) | 0.9283 | 98.5% | Ours |
| | MDLD | DL+ELLR | 0.8594 | 89.2% | Ours |
| | MDLD | DL+LR | 0.8956 | 93.7% | Ours |
| | MDLD | DL+LGBM | 0.9217 | 97.6% | Ours |
| | MDLD | DL+FCNN | 0.9269 | 98.2% | Ours |
| 6 | Neural Network | Gohr's ND | 0.788 | – | [3] |
| | Differential Cryptanalysis | TDD | 0.7879 | 94.98% | [7] |
| | Neural Network | ND2 | 0.7868 | 97.0% | Ours |
| | MDLD | DL+ELLR | 0.6822 | 75.1% | Ours |
| | MDLD | DL+LR | 0.7011 | 78.1% | Ours |
| | MDLD | DL+LGBM | 0.7767 | 93.9% | Ours |
| | MDLD | DL+FCNN | 0.7829 | 95.5% | Ours |
| 7 | Neural Network | Gohr's ND | 0.616 | – | [3] |
| | Neural Network | ND2 | 0.6143 | 91.7% | Ours |
| | MDLD | DL+ELLR | 0.5804 | 75.3% | Ours |
| | MDLD | DL+LR | 0.5820 | 75.6% | Ours |
| | MDLD | DL+LGBM | 0.6070 | 87.5% | Ours |
| | MDLD | DL+FCNN | 0.6081 | 87.8% | Ours |

\(^1\)The matching rate is the ratio of ciphertext pairs on which Gohr's ND and the compared distinguisher give the same classification.
\(^2\)Without any change, Gohr's source code is used to generate NDs, denoted by ND2.
The matching rate, initially proposed in [7], quantifies the proportion of identical classifications between Gohr’s NDs and a distinguisher to be compared on the same ciphertext pairs. Table 4 presents the matching rate between Gohr’s NDs and their corresponding simplified models, including our newly developed MDLDs. Here, the number of samples is \(2 \times {10^6}\), of which the positive and negative data each account for half. Table 4 shows that the matching rate between the best MDLD (DL+FCNN) and Gohr’s NDs exceeds \(95\%\) for 5/6 rounds of SPECK32 and reaches \(87.8\%\) for 7 rounds.
Furthermore, we reproduce the neural distinguishers with the source code in [3], denoted by ND2. The accuracy of ND2 is extremely close to that of Gohr’s NDs, since the method used is the same. However, the matching rate between ND2 and Gohr’s NDs cannot reach 100%, e.g., 91.7% for 7-round SPECK32 distinguishers. This discrepancy can be attributed to the randomness of the initial weight matrix utilized in the training of neural networks.
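The matching rate itself is straightforward to compute from the two distinguishers' scores on the same samples; a minimal sketch:

```python
import numpy as np

def matching_rate(scores_a, scores_b, threshold=0.5):
    """Fraction of samples on which both distinguishers give the same class."""
    pred_a = np.asarray(scores_a) >= threshold
    pred_b = np.asarray(scores_b) >= threshold
    return float(np.mean(pred_a == pred_b))
```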

3.4 Experimental comparison of memory and time complexity

In this section, we present an experimental comparison of the memory and time complexity between MDLDs and neural distinguishers.
The results are summarized in Table 5, demonstrating that the time complexity of an 8-round MDLD distinguisher based on DL+FCNN is approximately \(2^{-1.35}\) of that of neural distinguishers. However, the memory requirement of MDLDs is slightly higher than that of neural distinguishers. ND inherently requires storing neural network parameters, whereas MDLD adds an additional memory overhead to compute the inner products between the current ciphertext pair and the output masks.
In this context, let \((c,{\bar{c}})\) represent a ciphertext pair and \((\Gamma _1,\bar{\Gamma }_1),\cdots , (\Gamma _N,\bar{\Gamma }_N)\) be the output masks of N differential-linear approximations. We denote \(x_i\) as \(x_i=\Gamma _i\cdot c \oplus \bar{\Gamma }_i\cdot \bar{c}\) for \(1\le i \le N\). The key observation is that obtaining the N-bit binary vector \(x_1\Vert x_2\Vert \dots \Vert x_N\) dominates the memory and time complexity of MDLDs. Since the time complexity of neural distinguishers stems primarily from a substantial number of matrix multiplications, it tends to be higher compared to MDLDs, which primarily involve the computation of dot products. On the contrary, these dot products between ciphertext pairs and output masks occupy a certain amount of memory, leading to a relatively higher memory complexity for MDLDs in comparison with neural distinguishers.
Table 5
Complexity/memory comparison of several types of distinguishers on 8-round SPECK32. For ease of comparison, the implementation platform uses an 8-core AMD Ryzen 7 4800H CPU without leveraging a GPU. Additionally, the test set has a size of \(2^{20}\) samples

| Type | ND | DL+ELLR | DL+LR | DL+LGBM | DL+FCNN |
|---|---|---|---|---|---|
| Time (s) | \(2^{4.38}\) | \(2^{3.12}\) | \(2^{2.84}\) | \(2^{3.61}\) | \(2^{3.03}\) |
| Mem (MBytes) | 27 | 100 | 100 | 100 | 100 |

\(^1\)All operations were restricted to running on a single CPU thread.
\(^2\)Bao et al. in [14] pointed out that the memory required by NDs, excluding the memory allocated for the test set, is approximately 27 MBytes

4 Application of simplified models of NDs to reduced-round SPECK32

To demonstrate the feasibility of MDLDs as a replacement for NDs, we carry out key recovery attacks on 11–14 rounds of SPECK32. Among the attacks clearly stating success rates, we achieve the lowest time complexities to date for 12–14 rounds of SPECK32.

4.1 Specific details of our attacks

4.1.1 A trick for reducing time complexity

To reduce the time complexity, we employ 20 differential-linear approximations to construct MDLDs based on DL+FCNN and use a pre-computed look-up table of size \(2^{20}\) to speed up the judgment process. The DL+FCNN distinguisher receives a 20-bit vector, obtained from the inner products of the output masks of those 20 differential-linear approximations with one ciphertext pair \((c,c')\), and returns a real number between 0 and 1. To improve efficiency, we pre-compute and store the outcomes of the DL+FCNN distinguisher for all possible values of this 20-bit vector in a table. This eliminates real-time invocations of the distinguisher and thereby significantly decreases the time complexity. Since the input to NDs is a full ciphertext pair of SPECK32, totaling 64 bits, NDs would require a look-up table of size \(2^{64}\); hence this approach is not feasible for NDs.
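A minimal sketch of this pre-computation (score_fn stands for the trained DL+FCNN distinguisher and dl_features for the 20 mask inner products; both names are illustrative):

```python
import numpy as np

def build_lut(score_fn, n_bits=20):
    """Pre-evaluate the distinguisher on all 2^20 possible feature vectors."""
    lut = np.empty(1 << n_bits, dtype=np.float32)
    for v in range(1 << n_bits):
        bits = [(v >> i) & 1 for i in range(n_bits)]
        lut[v] = score_fn(bits)
    return lut

def judge(c, cbar, masks, lut):
    bits = dl_features(c, cbar, masks)             # 20 inner-product bits
    idx = sum(b << i for i, b in enumerate(bits))
    return lut[idx]                                # one table read per judgment
```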

4.1.2 Attack platform

In order to eliminate the impact of different GPUs, we compare all results on one CPU-only platform: a laptop with an 8-core AMD Ryzen 7 4800H CPU at 2.9 GHz running Windows 10. This platform executes \(2^{22.27}\), \(2^{22.1}\) and \(2^{22}\) full encryptions of 11-, 12- and 13-round SPECK32 per second, respectively.
Table 6
The differentials and neutral bits/vectors used in the attacks on 11–13 rounds of SPECK32. Let the result after 1-round encryption of SPECK32 be (x, y). If the constraint conditions are satisfied simultaneously, the neutral probabilities of the neutral bits/vectors improve

| Round | Input Diff. | Neutral Bits/Vectors | Conditions |
|---|---|---|---|
| 11 | \(0x0211\_0a04\) | [20], [21], [22], [14], [15], [23] | – |
| 12 | \(0x8020\_4101\), \(0x8060\_4101\), \(0x8021\_4101\), \(0x8061\_4101\) | [13], [12,19], [14,21], [6,29], [20], [22] | \(x[7]=0\) |
| 13 | \(0x8020\_4101\), \(0x8060\_4101\) | [20], [13], [12,19], [14,21], [6,29], [30], [0,8,31], [5,28], [15,24], [4,27,29], [22] | \(x[7]=0\), \(x[15]\oplus y[8]=0\), \(x[12]\oplus y[5]=1\), \(y[1]=0\), \(x[11]\oplus y[4]=1\) |

4.1.3 Generation of the ciphertext pair structures

Gohr in [3] used the neutral bits/vectors to enhance the distinguishing advantage of neural distinguishers. Similarly, we use the same differentials and neutral bits/vectors as [3, 5, 6] in our key recovery attacks, as shown in Table 6.
Definition 3
(Neutral bits of a differential (NB) [17]) Let \({e _i} = 1\ll i\). Let \((\Delta _{in},\Delta _{out})\) represent a differential, and let the input plaintext pair and output ciphertext pair of \(\mathrm E\) be denoted by \((p,p')\) and \((c,c')\), respectively. If \(p\oplus p'=\Delta _{in}\) and \(c\oplus c'=\Delta _{out}\), \((p,p')\) is said to conform to the differential \((\Delta _{in},\Delta _{out})\); for simplicity, \((p,p')\) is also called a conforming pair. The i-th bit is called a NB for the differential \({\Delta _{in}}\xrightarrow {\mathrm{{E}}}{\Delta _{out}}\) if \((p\oplus e_i,p'\oplus e_i)\) is also a conforming pair for any conforming pair \((p,p')\).
Furthermore, probabilistic NB and conditional (simultaneous-) NB were introduced in [18] and [5] to leverage a greater number of NBs. The following is the definition of a plaintext pair structure constructed by using neutral bits.
Definition 4
(Plaintext pair structure) Let \(M_1,M_2,\dots ,M_m\) denote the corresponding masks for m neutral bit(vector)s of a differential \((\Delta _{in},\Delta _{out})\). Here \(M_1,M_2,\dots ,M_m\) are linearly independent. Let \(\Omega\) be the linear subspace spanned by \(M_1,M_2,\dots ,M_m\). Given a plaintext x, we define the plaintext pair structure \(P_{x, \Omega ,\Delta _{in}}\) as the set \(\{(x\oplus y,x\oplus y\oplus \Delta _{in}) |y\in \Omega \}\), which is uniquely determined by x, \(\Omega\) and \(\Delta _{in}\).
Note that these structures are not exactly the same as the traditional plaintext structures of differential cryptanalysis. For clarity, we refer to the ciphertext pairs encrypted from \(P_{x, \Omega ,\Delta _{in}}\) as a ciphertext pair structure in this paper, which is denoted by \(C_{x, \Omega ,\Delta _{in}}\). Furthermore, Bao et al. [5] found that the super-neutral bits can effectively reduce the data complexity associated with constructing a plaintext pair structure.
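A minimal sketch of Definition 4, enumerating the span of the neutral-bit masks (for m linearly independent masks this yields \(2^m\) distinct pairs):

```python
def pair_structure(x, masks, delta_in):
    """The plaintext pair structure P_{x, Omega, Delta_in} of Definition 4."""
    pairs = []
    for s in range(1 << len(masks)):      # enumerate the subspace Omega
        y = 0
        for i, m in enumerate(masks):
            if (s >> i) & 1:
                y ^= m
        pairs.append((x ^ y, x ^ y ^ delta_in))
    return pairs
```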
Definition 5
(Super-neutral bit of a differential (Super-NB)\(^2\)) Let \({e _i} = 1\ll i\). The i-th bit is called a Super-NB for the differential \((\Delta _{in},\Delta _{out})\) if \(e_i\) is a NB for the two differentials \((\Delta _{in},\Delta _{out})\) and \((\Delta _{in}\oplus e_i,\Delta _{out})\).
Let \((\Delta _{in},\Delta _{out})\) and \((\Delta _{in}\oplus e_i,\Delta _{out})\) be two differential trails, and let \(e_i\) be a Super-NB. By employing the Super-NB \(e_i\), one can leverage four plaintexts \(\{p, p\oplus \Delta _{in},p\oplus e_i,p\oplus \Delta _{in}\oplus e_i \}\) to construct four plaintext pairs (belonging to two plaintext pair structures) that satisfy either the input difference \(\Delta _{in}\) or \(\Delta _{in}\oplus e_i\), i.e., \(\{ (p,p\oplus \Delta _{in}), (p\oplus e_i,p\oplus \Delta _{in}\oplus e_i)\}\) and \(\{ (p\oplus e_i,p\oplus \Delta _{in}), (p,p\oplus \Delta _{in}\oplus e_i) \}\). Previously, without Super-NBs, constructing these four plaintext pairs required eight plaintexts.
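The pair reuse enabled by a Super-NB can be made explicit as follows (a sketch; the grouping matches the two structures listed above):

```python
def super_nb_pairs(p, delta_in, e_i):
    """Four plaintexts yield four pairs covering both input differences."""
    pts = [p, p ^ delta_in, p ^ e_i, p ^ delta_in ^ e_i]
    struct_a = [(pts[0], pts[1]), (pts[2], pts[3])]  # difference delta_in
    struct_b = [(pts[2], pts[1]), (pts[0], pts[3])]  # difference delta_in ^ e_i
    return struct_a, struct_b
```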
The data complexity can be halved by using a Super-NB to double the number of ciphertext structures. This works when two differentials exhibit nearly identical probabilities and share common NBs. Super-NBs are rare; Bao et al. [5] discovered that the 23rd and the 16th bits are Super-NBs for four 3-round SPECK32 differentials with the same probability, namely \((0x8020\_4101,0x0040\_0000)\), \((0x8060\_4101,0x0040\_0000)\), \((0x8021\_4101,0x0040\_0000)\) and \((0x8061\_4101,0x0040\_0000)\). Furthermore, Bao et al. [5] pointed out that the concurrent utilization of two Super-NBs reduces the data complexity of obtaining one plaintext pair from 2 to 3/4. Additionally, they pointed out that these four differentials are feasible only when the second round key \(k_2\) satisfies the condition \(k_2[12]\ne k_2[11]\). Consequently, our 12/13/14-round attacks work for \(2^{63}\) keys.

4.1.4 The key recovery attacks

Apart from replacing NDs with simplified distinguishers, the procedures of attacks are the same as that of differential-neural cryptanalysis in [3], including:
1.
One uses neutral bits/vectors to generate sufficient ciphertext structures.
 
2.
One can use the Upper Confidence Bounds (UCB) to identify which ciphertext structure is most likely to satisfy the prepended short-round differential.
 
3.
One uses the Bayesian optimization algorithm (Algorithm 2) to efficiently find a list of key candidates. This approach is now based on the simplified distinguishers MDLDs, previously relying on NDs.
 
4.
Steps 2 and 3 need to be repeated many times, since getting the real key through step 3 is probabilistic.
 
5.
To correct bit errors in the key guess, one performs a search within Hamming radius two around the current best key candidate before returning a key. For each of those potential keys, one uses it to decrypt ciphertext pairs and observes whether the ND outputs a higher score (or the MDLD deems the distribution more imbalanced). If so, this search is repeated with the new best key guess; otherwise, the key with the highest score is returned.
 
The key search algorithm based on Bayesian optimization, proposed in [3], is shown in Algorithm 2. Gohr discovered that the randomization hypothesis for wrong keys does not hold in SPECK32 when only one round of decryption is performed. As a result, Gohr recorded in advance the average and standard deviation of the outputs of r-round NDs when decrypting \((r+1)\)-round ciphertext pairs under all possible values of the last round key. These data were referred to as the “wrong key profile” in [3]. Within Algorithm 2, lines 10–11 aim to find the right key by minimizing the weighted Euclidean distance between the responses obtained under the currently guessed key and the wrong key profile. This process then generates a fresh set of key candidates.
Gohr [3] pointed out that lines 10–11 of Algorithm 2 do not guarantee the retrieval of the correct key. Let the number of iterations for step 3 be denoted by \(n_{cts}\). Additionally, a prerequisite for the success of Algorithm 2 is that all ciphertext pairs in the current ciphertext structure satisfy the prepended short-round differential. However, the main key is unknown, thus Gohr proposed an Upper Confidence Bounds (UCB) algorithm to identify the ciphertext structure that is most likely to satisfy the prepended short-round differential. Let \(w^i_{max}\) be the highest score obtained for the i-th ciphertext structure. Denote by \(n_i\) the number of previous iterations in which the i-th ciphertext structure has been chosen. Let j represent the number of the current iteration. The UCB algorithm calculates a priority score for each ciphertext structure as follows
$$\begin{aligned} s_i=w^i_{max}+\sqrt{n_{cts}}\cdot \sqrt{\log _2(j)/n_i}. \end{aligned}$$
The UCB algorithm then selects the ciphertext structure with the highest priority score for further processing.
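A minimal sketch of this selection rule (variable names follow the notation above):

```python
import math

def next_structure(w_max, n_visits, j, n_cts):
    """Index of the ciphertext structure with the highest UCB priority score
    s_i = w^i_max + sqrt(n_cts) * sqrt(log2(j) / n_i) in iteration j."""
    scores = [w + math.sqrt(n_cts) * math.sqrt(math.log2(j) / n)
              for w, n in zip(w_max, n_visits)]
    return scores.index(max(scores))
```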

4.2 Results

The final attack results are shown in Table 7. Similar to [5, 6], for the economic feasibility of experimental verification, we tested the core of the 13-round attack when the five conditions in Table 6 are fulfilled simultaneously, which means that an additional 5 bits of subkey need to be guessed. To estimate the time complexity, we first experimentally estimate the time complexity of the core attacks where these 5 bits of subkey are guessed correctly, and then multiply this time complexity by \(2^5\) to obtain the final time complexity. Let \(n_{kg}\) represent the number of bit conditions to be met during these attacks; for our SPECK32 experiments, the values of \(n_{kg}\) are 5 and 1 for the 13-round and 12-round attacks, respectively.
Determination of a successful attack Similar to [3, 5, 6], we count a key guess as successful if the last round key is guessed correctly and the penultimate round key differs from the real key by a Hamming distance of two or less.
Repeated attacks To improve the success rate of attacks, we continue attempting to recover the same key until the real key is obtained by the attacks. As a result, we calculate the time and data complexity with a certain success rate. For the 11-round/12-round/13-round SPECK32 attacks presented in this paper, the original success rates are 0.3, 0.7, and 0.1, respectively. By repeating these attacks 4 times, 2 times, and 8 times, the final success rates are \(1-(1-0.3)^4= 0.76\), \(1-(1-0.7)^2= 0.91\), and \(1-(1-0.1)^8= 0.57\), respectively.
Table 7
The key recovery attacks on reduced-round SPECK32. Replacing NDs with our simplified distinguishers significantly reduces the time complexity of differential-neural attacks on 11–14 rounds of SPECK32

| Round | Weak Keys | Type | Time | Data | Suc. | References |
|---|---|---|---|---|---|---|
| 11 | \(2^{64}\) | Diff | \(2^{24.66}\) | \(2^{26.70}\) | 63% | [4] |
| | \(2^{64}\) | ND | \(2^{38}\) | \(2^{14.5}\) | 52% | [3] |
| | \(2^{64}\) | MDLD | \(2^{27.9}\) | \(2^{15.6}\) | 76% | Ours |
| 12 | \(2^{64}\) | Diff | \(2^{33.84}\) | \(2^{30.42}\) | 63% | [4] |
| | \(2^{64}\) | ND | \(2^{44.8}\) | \(2^{22.0}\) | 86% | [5] |
| | \(2^{63}\) | ND | \(2^{42.9}\) | \(2^{18.5}\) | 81% | [5] |
| | \(2^{63}\) | MDLD | \(2^{33.7}\) | \(2^{19.5}\) | 91% | Ours |
| 13 | \(2^{64}\) | Diff | \(2^{50.16}\) | \(2^{31.13}\) | 63% | [4] |
| | \(2^{63}\) | ND | \(2^{51.0}\) | \(2^{29}\) | 82% | [5] |
| | \(2^{63}\) | GoogLeNet | \(2^{45.03}\) | \(2^{27}\) | 34% | [6] |
| | \(2^{63}\) | MDLD | \(2^{43.8}\) | \(2^{31}\) | 57% | Ours |
| 14 | \(2^{64}\) | Diff | \(2^{62.4}\) | \(2^{30.4}\) | 63% | [19] |
| | \(2^{64}\) | Diff | \(2^{60.99}\) | \(2^{31.75}\) | 63% | [4] |
| | \(2^{64}\) | DL | \(2^{58}\) | \(2^{31}\) | –\(^1\) | [20] |
| | \(2^{63}\) | GoogLeNet | \(2^{60.93}\) | \(2^{27}\) | 34% | [6] |
| | \(2^{63}\) | MDLD | \(2^{59.8}\) | \(2^{31}\) | 57% | Ours |
| 15 | \(2^{64}\) | Diff | \(2^{62.25}\) | \(2^{31.39}\) | 63% | [4] |

\(^1\)– = This attack is flawed due to the absence of a success rate. Applying the success rate calculation method for linear attacks outlined in [21] shows that the success rate is negligible.
\(^2\)ND = neural distinguisher, GoogLeNet = a distinguisher using the GoogLeNet neural network, Diff = differential distinguisher, DL = differential-linear distinguisher, MDLD = multiple differential-linear distinguisher.
Time and data complexity of the 11-round practical attack For 11-round SPECK32, we conducted 100 attacks and succeeded 30 times, with an average time of 1288 s each. Repeating the 11-round key recovery attack 4 times as a final attack, the final success rate is \(1-(1-0.3)^4= 0.76\). Thus, the time complexity is \(4\times 1288/100\times 2^{22.27}=2^{27.95}\) full encryptions of 11-round SPECK32. The average number of queries to the cipher E needed to obtain one ciphertext pair is 2, so the data complexity is \(4\times n_{cts}\times n_{b}\times 2=2^{15.64}\) chosen plaintexts.
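The complexity arithmetic above can be checked mechanically; a short sketch for the 11-round figures:

```python
import math

reps, avg_time_s, trials = 4, 1288, 100      # 4 repetitions, 1288 s per attempt
enc_per_sec = 2 ** 22.27                     # 11-round encryptions per second
print(math.log2(reps * avg_time_s / trials * enc_per_sec))  # ~27.95

n_cts, n_b, queries_per_pair = 100, 2 ** 6, 2
print(math.log2(reps * n_cts * n_b * queries_per_pair))     # ~15.64
```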
Time and data complexity of the 12-round practical attack For 12-round SPECK32, we conducted 40 attacks and succeeded 28 times, with an average time of 32320 s each. Repeating the 12-round key recovery attack 2 times as a final attack, the final success rate is \(1-(1-0.7)^2= 0.91\). Thus, the time complexity is \(2\times 32320/40\times 2^{n_{kg}}\times 2^{22.1}=2^{33.75}\) full encryptions of 12-round SPECK32. By using the Super-NBs at the 23rd and 16th bits of the plaintext, the average number of queries to the cipher E needed to obtain one ciphertext pair is 3/4. The data complexity is \(2 \times 2^{n_{kg}} \times n_{cts}\times n_{b}\times 3/4=2^{19.58}\) chosen plaintexts.
Time and data complexity of the 13-round practical attack For 13-round SPECK32, we conducted 30 attacks and succeeded 3 times, with an average time of \(2^{18.75}\) seconds each. Repeating the 13-round key recovery attack 8 times as a final attack, the final success rate is \(1-(1-0.1)^8= 0.57\). Thus, the time complexity is \(8\times 2^{18.75}/30 \times 2^{n_{kg}} \times 2^{22.0}=2^{43.84}\) full encryptions of 13-round SPECK32. By using the Super-NB at the 23rd bit of the plaintext, the average number of queries to the cipher E needed to obtain one ciphertext pair is 1. The data complexity is \(8 \times 2^{n_{kg}} \times n_{cts}\times n_{b}=2^{31}\) chosen plaintexts.
Time and data complexity of the 14-round theoretical attack By exhausting an additional 16 bits of the 14th round key, we can use the above 13-round SPECK32 key recovery attack to attack 14-round SPECK32. The data complexity and time complexity are \(2^{31}\) chosen plaintexts and \(2^{59.84}\) full encryptions of 14-round SPECK32, respectively. Table 7 summarizes the key recovery attacks on SPECK32. Among all attacks clearly stating success rates, we present the attacks for 12–14 rounds of SPECK32 with the lowest time complexities to date. See Appendix C for the details of the attacks.

5 Conclusion

In this paper, we construct simplified models of NDs using differential-linear imbalances. These simplified models offer comparable distinguishing advantages while requiring reduced time complexities. Furthermore, we improve the accuracies of these distinguishers by utilizing more ciphertext pairs. Additionally, we present a comprehensive process for replacing the NDs in differential-neural attacks with the newly constructed distinguishers, yielding attacks against 11–14 rounds of SPECK32. Notably, no simplified distinguishers had been applied to Gohr’s differential-neural attacks before. Overall, our findings provide insight into the underlying mechanisms of Gohr’s NDs, emphasizing the significance of the differential-linear imbalance.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62302518, Grant No. 62372463) and the Natural Science Foundation of Henan (Grant No: 222300420100).

Declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this paper, beyond the stated funding and their affiliations.

Ethics approval

Not applicable.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices

Appendix A: Benamira et al.’s Distinguisher Based on Exhaustive Search of Partial Subkey [7]

Appendix B: Neutral Bits/Vectors Used in Key Recovery Attacks

Neutral bits/vectors are mainly used to generate, from a single plaintext pair that satisfies a differential trail, more plaintext pairs that satisfy it. In this paper, a 2-round differential trail \((0x0211\_0a04 \rightarrow 0x0040\_0000)\) is used in our 11-round SPECK32 key recovery attack, and the six neutral bits used in our attack are the same as those of [3], i.e., the 14th, 15th, 20th, 21st, 22nd and 23rd bits of the plaintexts. [5] presents the probabilistic neutral bits/vectors for this 2-round differential trail of SPECK32, as shown in Table 8.
Table 8
Neutral bits/vectors for the 2-round differential trail (0x0211_0a04, 0x0040_0000) of SPECK32

| NB | Pr. |
|---|---|
| [20] | 1 |
| [21] | 1 |
| [22] | 1 |
| [9,16] | 1 |
| [2,11,25] | 1 |
| [14] | 0.965 |
| [15] | 0.938 |
| [6,29] | 0.91 |
| [23] | 0.812 |
| [30] | 0.809 |
| [7] | 0.806 |
| [0] | 0.754 |
| [11,27] | 0.736 |
| [8] | 0.664 |
When the output difference is required to be \(0x0040\_0000\), there are two optimal 3-round differentials with probability \(2^{-11}\), i.e., \((0x0a20\_4205,0x0040\_0000)\) and \((0x0a60\_4205,0x0040\_0000)\).
However, the neutral bits/vectors for these two differentials are very scarce. There are four sub-optimal 3-round differentials with probability \(2^{-12}\); Table 9 presents these four differentials and the sufficient conditions for a pair to conform to them.
Table 9
Sufficient conditions for conforming to the four 3-round sub-optimal differentials with output difference \(0x0040\_0000\) [5]. Let the result after 1-round encryption of SPECK32 be (x, y). Here \(c=(x\ggg 7\boxplus y)\oplus (x\ggg 7\oplus y)\). [5] pointed out that once each linear condition is fulfilled, the probability of the differential increases by a factor of 2. All four trails share the intermediate differences \(0x0201\_0604\) (Round 1), \(0x1800\_0010\) (Round 2) and \(0x0040\_0000\) (Round 3)

| Condition | \(0x8020\_4101\) | \(0x8060\_4101\) | \(0x8021\_4101\) | \(0x8061\_4101\) |
|---|---|---|---|---|
| \(x[7]\) | 0 | 0 | 0 | 0 |
| \(x[5]\oplus y[14]\) | 1 | 0 | 1 | 0 |
| \(x[15]\oplus y[8]\) | 0 | 0 | 1 | 1 |
| Additional | \(x[0]\oplus y[9]=0\) | \(x[0]\oplus y[9]=0\) | \(c[9]\oplus y[9]=0\) | \(c[9]\oplus y[9]=0\) |
Table 10
Public neutral bits/vectors for the 3-round differentials (\(0x8020\_4101\), \(0x0040\_0000\)), (\(0x8060\_4101\), \(0x0040\_0000\)), (\(0x8021\_4101\), \(0x0040\_0000\)), (\(0x8061\_4101\), \(0x0040\_0000\)) of SPECK32 [5]

| Neutral bit/vector | 0x8020_4101 Pre.\(^2\)/Post.\(^3\) | 0x8060_4101 Pre./Post. | 0x8021_4101 Pre./Post. | 0x8061_4101 Pre./Post. | Condition\(^1\) |
|---|---|---|---|---|---|
| [22] | 0.995 / 1.000 | 0.995 / 1.000 | 0.996 / 1.000 | 0.997 / 1.000 | – |
| [20] | 0.986 / 1.000 | 0.997 / 1.000 | 0.996 / 1.000 | 0.995 / 1.000 | – |
| [13] | 0.986 / 1.000 | 0.989 / 1.000 | 0.988 / 1.000 | 0.992 / 1.000 | – |
| [12,19] | 0.986 / 1.000 | 0.995 / 1.000 | 0.993 / 1.000 | 0.986 / 1.000 | – |
| [14,21] | 0.855 / 0.860 | 0.874 / 0.871 | 0.881 / 0.873 | 0.881 / 0.876 | – |
| [6,29] | 0.901 / 0.902 | 0.898 / 0.893 | 0.721 / 0.706 | 0.721 / 0.723 | – |
| [30] | 0.803 / 0.818 | 0.818 / 0.860 | 0.442 / 0.442 | 0.412 / 0.407 | – |
| [0,8,31] | 0.855 / 0.859 | 0.858 / 0.881 | 0.000 / 0.000 | 0.000 / 0.000 | – |
| [5,28] | 0.495 / 1.000 | 0.495 / 1.000 | 0.481 / 1.000 | 0.469 / 1.000 | \(x[12]\oplus y[5]=1\) |
| [15,24] | 0.482 / 1.000 | 0.542 / 1.000 | 0.498 / 1.000 | 0.496 / 1.000 | \(y[1]=0\) |
| [4,27,29] | 0.672 / 0.916 | 0.648 / 0.905 | 0.535 / 0.736 | 0.536 / 0.718 | \(x[11]\oplus y[4]=1\) |
| [6,11,12,18] | 0.445 / 0.903 | 0.456 / 0.906 | 0.333 / 0.701 | 0.382 / 0.726 | \(x[2]\oplus y[11]=0\) |

\(^1\)The condition at the end of a row is specific to the neutral bit/vector in the same row; '–' means that there is no condition for the corresponding neutral bit/vector.
\(^2\)Pre.: probability obtained using 1000 correct pairs without imposing the conditions.
\(^3\)Post.: probability obtained using 1000 correct pairs and imposing all conditions in the last column.

Appendix C: Experimental parameters of our attacks

The procedures of our attacks are the same as those of differential-neural cryptanalysis in [3], except that we utilize simplified distinguishers instead of NDs. The parameters specified in the source code of [3] are outlined below.
\(n_{kg}\) is the number of possible values for the bits of \(k_0\), on which the conditions depend.
\(n_{b}\) is the number of ciphertext pairs in each ciphertext structure and \(n_{b}=DataSize\times NumPairs\).
\(n_{cts}\) is the number of ciphertext structures used in one attack, which is usually several times of the reciprocal of the s-round differential probability.
\(n_{it}\) is the total number of iterations on the ciphertext structures, which is usually several times \(n_{cts}\).
\(c_1\) and \(c_2\) are the cut-offs with respect to the scores of the recommended last subkey and second-to-last subkey, respectively.
\(n_{byit1}\), \(n_{byit2}\) and \(n_{cand1}\), \(n_{cand2}\) are the number of iterations and number of key candidates within each iteration in the Bayesian optimization procedures for guessing each of the last and the penultimate subkeys, respectively.
Table 11
Parameters of key recovery attacks for reduced-round SPECK32

| Round | \(n_{kg}\) | \(n_{b}\) | \(n_{cts}\) | \(n_{it}\) | \(c_1\) | \(c_2\) | \(n_{byit1}=n_{byit2}\) | \(n_{cand1}\) | \(n_{cand2}\) |
|---|---|---|---|---|---|---|---|---|---|
| 11 | 0 | \(2^6\) | 100 | 200 | 10 | 10 | 5 | 32 | 32 |
| 12 | \(2^1\) | \(2^{6}\) | \(2^{12}\) | \(2^{13}\) | 10 | 10 | 5 | 64 | 32 |
| 13 | \(2^5\) | \(2^{12}\) | \(2^{11}\) | \(2^{12}\) | 20 | −250 | 5 | 64 | 32 |
The parameters of our key recovery attacks are shown in Table 11. The restrictions on the key for our 13-round attack are \(x[7] = 0\) and \(x[15]\oplus y[8]=0\), together with the restrictions \(x[12]\oplus y[5]=1\), \(y[1]=0\) and \(x[11]\oplus y[4]=1\) from Table 10. Thus, we need to guess 5 bits of the key \(k_0\), that is, \(n_{kg}=2^5\). The restriction for the key of our 12-round attack is \(x[7] = 0\), that is, \(n_{kg}=2^1\).
Footnotes
1
To be precise, previous multiple differential-linear distinguishers leverage statistical imbalances across multiple pairs of ciphertexts. For the sake of brevity, we refer to those distinguishers that utilize a single ciphertext pair in conjunction with multiple differential-linear approximations as multiple differential-linear distinguishers as well.
 
2
This is a simpler and clearer definition of the switching bits for adjoining differentials of [5].
 
References
1. Rivest RL (1991) Cryptography and machine learning. In: International Conference on the Theory and Application of Cryptology, pp 427–439. Springer
2. Maghrebi H, Portigliatti T, Prouff E (2016) Breaking cryptographic implementations using deep learning techniques. In: International Conference on Security, Privacy, and Applied Cryptography Engineering, pp 3–26. Springer
3. Gohr A (2019) Improving attacks on round-reduced SPECK32/64 using deep learning. In: Annual International Cryptology Conference, pp 150–179. Springer
5. Bao Z, Guo J, Liu M, Ma L, Tu Y (2023) Enhancing differential-neural cryptanalysis. In: Advances in Cryptology – ASIACRYPT 2022, Taipei, Taiwan, December 5–9, 2022, Proceedings, Part I, pp 318–347. Springer
7. Benamira A, Gerault D, Peyrin T, Tan QQ (2021) A deeper look at machine learning-based cryptanalysis. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp 805–835. Springer
10. Chen Y, Shen Y, Yu H, Yuan S (2023) A new neural distinguisher considering features derived from multiple ciphertext pairs. Comput J 66:1419–1433
11. Baignères T, Junod P, Vaudenay S (2004) How far can we go beyond linear cryptanalysis? In: International Conference on the Theory and Application of Cryptology and Information Security, pp 432–450. Springer
12. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) LightGBM: a highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems 30
13. Soltau H, Saon G, Sainath TN (2014) Joint training of convolutional and non-convolutional neural networks. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 5572–5576. IEEE
14. Bao Z, Lu J, Yao Y, Zhang L (2023) More insight on deep learning-aided cryptanalysis. In: International Conference on the Theory and Application of Cryptology and Information Security, pp 436–467. Springer
17. Biham E, Chen R (2004) Near-collisions of SHA-0. In: Annual International Cryptology Conference, pp 290–305. Springer
18. Aumasson J-P, Fischer S, Khazaei S, Meier W, Rechberger C (2008) New features of Latin dances: analysis of Salsa, ChaCha, and Rumba. In: International Workshop on Fast Software Encryption, pp 470–488. Springer
19. Song L, Huang Z, Yang Q (2016) Automatic differential analysis of ARX block ciphers with application to SPECK and LEA. In: Information Security and Privacy: 21st Australasian Conference, ACISP 2016, Proceedings, Part II, pp 379–394. Springer
20. Bellini E, Gerault D, Grados J, Makarim RH, Peyrin T (2023) Fully automated differential-linear attacks against ARX ciphers. In: Cryptographers' Track at the RSA Conference, pp 252–276. Springer
Metadata
Title: Approximating neural distinguishers using differential-linear imbalance
Authors: Guangqiu Lv, Chenhui Jin, Zhen Shi, Ting Cui
Publication date: 11.09.2024
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 19/2024
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-024-06375-4