1 Introduction

1.1 Background

Lattice-based cryptography is an attractive field of research because it has produced useful cryptographic schemes, such as public-key cryptosystems, functional encryption schemes, and fully homomorphic encryption schemes [1, 2, 4, 7,8,9,10,11,12, 24, 26,27,28,29, 31, 44], and these schemes are believed to resist quantum cryptanalysis. Their security is proven under computational hardness assumptions related to the difficulty of solving the shortest vector problem (SVP) on a lattice. It is therefore important to provide evidence of, and to estimate, this hardness not only theoretically but also practically.

There have been many studies of algorithms to solve the SVP; the typical approaches are lattice basis reduction and enumeration of lattice vectors. Lattice basis reduction algorithms take a lattice basis as input and output another basis of the same lattice whose basis vectors are relatively shorter than those of the input. Enumeration algorithms take a lattice basis as input and search for lattice vectors that are relatively shorter than the input basis vectors. To evaluate the computational hardness of the SVP, it is important to improve the efficiency of these algorithms and to implement high-performance solvers, and here too there have been many studies [6, 13,14,15,16,17, 20, 21, 23, 30, 33, 34, 36,37,38,39, 46,47,49]. Moreover, given recent advances in computing technology, exploiting the performance of massively parallel computing environments, for example a cluster consisting of many large computers, has become the most important strategy for evaluating the security of lattice cryptosystems, and a methodology for solving the SVP in such environments is needed.

1.2 Our Contribution

In this paper, we propose a new lattice basis reduction algorithm suitable for parallelization. Our proposal consists of a parallelization strategy and a reduction strategy.

Our parallelization strategy enables many parallel processes to reduce a lattice basis cooperatively, so it can exploit the performance of parallel computing environments in practice. This strategy is based on an idea proposed by Fukase and Kashiwabara in [20]; however, the parallel scalability of their algorithm is unclear with regard to massive parallelization. To achieve parallel scalability, our strategy combines two methods. The first is a procedure that generates different lattice bases for the many processes. Our experiments show that this method enables the processes to generate a huge number of lattice vectors and that it works well even if all processes start from the same lattice basis. The second is the use of global shared storage to keep information on the cooperative lattice basis reduction; this enables many processes to share auxiliary information for accelerating the reduction with a small synchronization overhead.

Our lattice basis reduction strategy enables us to efficiently obtain lattice bases with a small sum of squared lengths of the orthogonal basis vectors, a quantity related to finding relatively short lattice vectors; experiments and analyses are given by Fukase and Kashiwabara [20], as explained in Sect. 3.2, and by Aono and Nguyen [5]. Fukase and Kashiwabara [20] proposed a strategy to reduce this sum, but it is complicated. To obtain such lattice bases efficiently, we propose a new strategy based on an evaluation function on lattice bases.

We applied our algorithm to problem instances of the SVP Challenge, which is hosted by Technische Universität Darmstadt [45]. We solved a problem instance of dimension 150 in about 394 days of wall-clock time with four clusters, a world record. In addition, we solved problem instances of dimensions 134, 138, 140, 142, 144, 146, and 148.

1.3 Related Work

Kannan’s enumeration (ENUM) algorithm [33] is an exponential-time algorithm that outputs the shortest, or a nearly shortest, lattice vector of a given lattice. ENUM and its variants are used as the main routine or a subroutine of solvers for the shortest lattice vector problem (SVP), as explained in Sect. 2.3. Studying the ENUM algorithm is thus an important research area, and several improvements have been proposed [16, 17, 23, 30, 38].

The Lenstra–Lenstra–Lovász (LLL) algorithm [36] is a polynomial-time lattice basis reduction algorithm that takes a lattice basis and a parameter \(\delta \) and produces a \(\delta \)-LLL reduced basis of the same lattice. The LLL algorithm is the starting point of many lattice basis reduction algorithms. Since, in practice, the first basis vector of the output basis is a relatively short lattice vector, the LLL algorithm is also used as the main routine or a subroutine of SVP algorithms. The Block Korkin–Zolotarev (BKZ) algorithm [49] is a block-wise generalization of the LLL algorithm that uses the ENUM algorithm as a subroutine; it generates relatively shorter basis vectors than the LLL algorithm does. Studying LLL, BKZ, and their variants is a popular topic of research, and several improvements and extensions have been reported [6, 15, 21, 39].

Schnorr’s random sampling reduction (RSR) algorithm [47, 48] is a probabilistic lattice basis reduction algorithm that randomly generates lattice vectors. The simple sampling reduction (SSR) algorithm [13, 14, 37] is an extension of the RSR algorithm that analyzes the expected length of the lattice vectors to be generated; its authors also presented several theoretical analyses of both the RSR algorithm and their own.

Our algorithm is based on the Fukase–Kashiwabara (FK) algorithm [20], which is an extension of the SSR algorithm. The authors of the FK algorithm refined the calculation of the expected length of the lattice vectors to be generated and showed that the probability of finding short lattice vectors can be increased by reducing the sum of squared lengths of the orthogonal basis vectors. A brief introduction to the FK algorithm is given in Sect. 3.2. Aono and Nguyen [5] presented a theoretical analysis of the RSR algorithm and its extensions; they also gave a probabilistic perspective and a comparison with [23].

With the aim of using parallel computation to solve the SVP, several researchers have proposed parallelized algorithms [16, 17, 30, 34, 46]. In particular, parallelized ENUM algorithms on CPUs, GPUs, and FPGAs were investigated in [16, 17, 30], and a parallelized SSR algorithm for GPUs was proposed in [46]. A design for an SVP solver that exploits large-scale computing environments was proposed in [34]; its authors integrated and tuned the pruned ENUM algorithm [23] into a multicore-CPU and GPU implementation [30], and they extended this implementation to multiple CPUs and GPUs in the single program multiple data (SPMD) style [41].

Organization. Preliminaries are described in Sect. 2. We then give a brief introduction to sampling reduction and the previous studies [20, 37, 47, 48] in Sect. 3. An overview of our parallelization strategy is presented in Sect. 4, and our proposal is explained in detail in Sect. 5. Section 6 presents the results of applying our proposal to the SVP Challenge, and we conclude in Sect. 7.

2 Preliminaries

2.1 Lattice

In this section, we introduce notation and definitions for lattices.

The set of natural numbers \(\mathopen {\{} 0, 1, \ldots \mathclose {\}}\) is denoted by \(\mathord {\mathbb {N}}\), and the sets of integers and real numbers are denoted by \(\mathord {\mathbb {Z}}\) and \(\mathord {\mathbb {R}}\), respectively. We denote an n-element vector (or sequence) as \(\varvec{x} = \mathopen {(} x_{1}, \ldots , x_{n} \mathclose {)} = (x_{i})_{i = 1}^{n}\). Let \(\varvec{v} = \mathopen {(} v_{1}, \ldots , v_{n} \mathclose {)} \in \mathord {\mathbb {R}}^n\) and \(\varvec{u} = \mathopen {(} u_{1}, \ldots , u_{n} \mathclose {)} \in \mathord {\mathbb {R}}^n\) be two n-dimensional vectors on \(\mathord {\mathbb {R}}\); we define the inner product of \(\varvec{v}\) and \(\varvec{u}\) by \(\langle {\varvec{v}, \varvec{u}}\rangle := \sum _{i = 1}^{n} v_i u_i\). The length (Euclidean norm) of \(\varvec{v}\), denoted by \(\mathopen {||} \varvec{v} \mathclose {||}\), is \(\sqrt{\langle {\varvec{v}, \varvec{v}}\rangle }\).

Let \(\varvec{b}_{1}, \ldots , \varvec{b}_{n} \in \mathord {\mathbb {Z}}^n\) be n-dimensional linearly independent (integral) column vectors. The (full-rank) lattice L spanned by a matrix \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\) is defined as the set \(L = \mathop {\mathcal {L}}(\varvec{B}) = \mathopen {\{} \varvec{B} \varvec{x} \mathrel {|} \varvec{x} \in \mathord {\mathbb {Z}}^n \mathclose {\}}\). Such a \(\varvec{B} \) is called a lattice basis of L, and each \(\varvec{v} \in L\) is called a lattice vector of L. Every lattice with \(n > 1\) has infinitely many lattice bases: if U is an n-dimensional unimodular matrix over \(\mathord {\mathbb {Z}}\), then \(\mathop {\mathcal {L}}(\varvec{B}) = \mathop {\mathcal {L}}(\varvec{B} U)\), so \(\varvec{B} U\) is another basis of L whenever U is not the identity matrix. For a lattice basis \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\), the corresponding (Gram–Schmidt) orthogonal basis vectors \(\varvec{b}^*_{1}, \ldots , \varvec{b}^*_{n} \in \mathord {\mathbb {R}}^n\) are defined by \(\varvec{b}^*_i = \varvec{b}_i - \sum _{j = 1}^{i - 1} \mu _{\varvec{B}, i,j}\varvec{b}^*_j\), where \(\mu _{\varvec{B}, i, j} = \langle {\varvec{b}_i, \varvec{b}^*_j}\rangle / \mathopen {||} \varvec{b}^*_j \mathclose {||}^2\). The determinant of L is denoted by \(\det (L)\); note that \(\det (L) = \prod _{i = 1}^n \mathopen {||} \varvec{b}^*_i \mathclose {||}\) for any basis \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\) of L.

Given a lattice basis \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\) of a lattice L, the i-th orthogonal projection by \(\varvec{B} \), denoted by \(\pi _{\varvec{B}, i} : \mathord {\mathbb {R}}^n \rightarrow \mathop {\mathrm {span}}({\varvec{b}_{1}, \ldots , \varvec{b}_{i - 1}})^{\perp }\), is defined as \(\pi _{\varvec{B}, i}(\varvec{v}) = \varvec{v} - \sum _{j = 1}^{i - 1} v^*_j \varvec{b}^*_j\), where \(v^*_{j} = \langle {\varvec{v}, \varvec{b}^*_{j}}\rangle / \mathopen {||} \varvec{b}^*_j \mathclose {||}^2 \in \mathord {\mathbb {R}}\) (for \(i = 1\), \(\pi _{\varvec{B}, 1}\) is the identity map). Note that \(\mathopen {||} \pi _{\varvec{B}, i}(\varvec{v}) \mathclose {||}^2 = \sum _{j = i}^{n} \big (\langle {\varvec{v}, \varvec{b}^*_{j}}\rangle / \mathopen {||} \varvec{b}^*_{j} \mathclose {||}\big )^2\). We define the i-th orthogonal complement by \(\varvec{B} \) as \(L_{\varvec{B}, i} := \pi _{\varvec{B}, i}(L) = \mathopen {\{} \pi _{\varvec{B}, i}(\varvec{v}) \mathrel {|} \varvec{v} \in L \mathclose {\}}\). We call \(\mathopen {||} \pi _{\varvec{B}, i}(\varvec{v}) \mathclose {||}\) the orthogonal length of \(\varvec{v}\) on \(L_{\varvec{B}, i}\); this is an important quantity in the context of lattice basis reduction.
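To make these definitions concrete, the following is a minimal Python sketch, assuming NumPy and a row-vector convention (the paper treats basis vectors as columns; rows are used here only for convenience), of the Gram–Schmidt vectors \(\varvec{b}^*_i\), the coefficients \(\mu _{\varvec{B}, i, j}\), and the squared orthogonal length \(\mathopen {||} \pi _{\varvec{B}, i}(\varvec{v}) \mathclose {||}^2\):

```python
import numpy as np

def gram_schmidt(B):
    """B: n x n matrix whose rows are the basis vectors b_1, ..., b_n.
    Returns (Bstar, mu) with Bstar[i] = b*_{i+1} and mu[i, j] = mu_{i+1,j+1}."""
    n = B.shape[0]
    Bstar = np.zeros_like(B, dtype=float)
    mu = np.zeros((n, n))
    for i in range(n):
        Bstar[i] = B[i].astype(float)
        for j in range(i):
            mu[i, j] = np.dot(B[i], Bstar[j]) / np.dot(Bstar[j], Bstar[j])
            Bstar[i] -= mu[i, j] * Bstar[j]
    return Bstar, mu

def orthogonal_length_sq(v, Bstar, i):
    """||pi_{B,i}(v)||^2 = sum_{j >= i} (<v, b*_j> / ||b*_j||)^2, with 1-based i."""
    return sum(np.dot(v, Bstar[j]) ** 2 / np.dot(Bstar[j], Bstar[j])
               for j in range(i - 1, Bstar.shape[0]))
```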

2.2 Natural Number Representation

In this section, we explain the natural number representation (NNR) of a lattice vector. In Schnorr’s algorithm [47], a lattice vector is represented by an n-dimensional sequence of 0s and 1s, where n is the dimension of the given lattice basis. The numbers 0 and 1 express the amount of deviation from the orthogonal point at each index, and one can compute a lattice vector from the lattice basis and this sequence. Moreover, Buchmann and Ludwig [13, 14, 37] showed that one can generalize 0, 1 to the natural numbers 0, 1, 2, \(\ldots \) and calculate the expected squared length of the lattice vector generated from a sequence of natural numbers. Fukase and Kashiwabara [20] call such a sequence of natural numbers a natural number representation (NNR), and Aono and Nguyen [5] call it a tag (defined on the natural numbers). One can determine a set of NNRs that has small expected squared lengths for a given basis. The following definitions and statements are essentially reprinted from [20], but similar descriptions are given in other papers [5, 13, 14, 37].

Definition 1

(Natural Number Representation). Let \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\) be a lattice basis of the lattice L. Given a lattice vector \(\varvec{v} = \sum _{i = 1}^n \nu _i \varvec{b}^*_i \in L\), the natural number representation (NNR) of \(\varvec{v}\) on \(\varvec{B}\) is a vector of n non-negative integers \(\mathopen {(} \mathfrak {n}_{1}, \ldots , \mathfrak {n}_{n} \mathclose {)} \in \mathord {\mathbb {N}}^n\) such that \(-(\mathfrak {n}_i + 1) / 2 < \nu _i \le -\mathfrak {n}_i / 2\) or \(\mathfrak {n}_i / 2 < \nu _i \le (\mathfrak {n}_i + 1) / 2\) for all \(i = 1, \ldots , n\).

Theorem 1

[20]. Let \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\) be a lattice basis of the lattice L. For any lattice vector \(\varvec{v} = \sum _{i = 1}^n \nu _i \varvec{b}^*_i \in L\), the NNR of \(\varvec{v}\) on \(\varvec{B}\) is uniquely determined, and the map \(\mathord {\mathbf {n}}_{\varvec{B}} : L \rightarrow \mathord {\mathbb {N}}^n\) given by \(\mathord {\mathbf {n}}_{\varvec{B}}(\varvec{v}) := \mathopen {(} \mathfrak {n}_{1}, \ldots , \mathfrak {n}_{n} \mathclose {)}\) is a bijection.
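For illustration, the NNR digit of a single Gram–Schmidt coefficient \(\nu _i\) can be recovered directly from the interval condition of Definition 1; a minimal sketch (function names are ours):

```python
import math

def nnr_digit(nu):
    """Unique natural number n with -(n+1)/2 < nu <= -n/2 or n/2 < nu <= (n+1)/2.
    E.g., nu = 0.6 lies in (1/2, 1], so the digit is 1."""
    if nu > 0:
        return math.ceil(2 * nu) - 1
    return math.floor(-2 * nu)

def nnr(coeffs):
    """NNR of a lattice vector given its GSO coefficients (nu_1, ..., nu_n)."""
    return tuple(nnr_digit(nu) for nu in coeffs)
```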

Assumption 1

(Randomness Assumption [5, 20, 47]). Let \(\varvec{v} = \sum _{j = 1}^{n} \nu _{j} \varvec{b}^*_{j}\) be a lattice vector with NNR \(\varvec{\mathfrak {n}} = \mathopen {(} \mathfrak {n}_{1}, \ldots , \mathfrak {n}_{n} \mathclose {)}\). The coefficients \(\nu _j\) of \(\varvec{v}\) are uniformly distributed in \((-(\mathfrak {n}_{j} + 1)/2, -\mathfrak {n}_{j} / 2] \mathbin {\cup }(\mathfrak {n}_{j} / 2, (\mathfrak {n}_{j} + 1) / 2]\) and are statistically independent with respect to j.

The difference from the original randomness assumption [47, 48] is that, in the original, the NNRs are generated by a procedure called the sampling algorithm and belong to a subset of \(\mathopen {\{} 0, 1 \mathclose {\}}^n\) defined as

$$\begin{aligned} \mathopen {\{} (0, \ldots , 0, \mathfrak {n}_{n - u}, \ldots , \mathfrak {n}_{n - 1}, 1) \mathrel {|} \mathfrak {n}_{n - u}, \ldots , \mathfrak {n}_{n - 1} \in \mathopen {\{} 0, 1 \mathclose {\}} \mathclose {\}}, \end{aligned}$$

where u is a positive integer parameter of the sampling algorithm.

Theorem 2

[20]. Let \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\) be a lattice basis of the lattice L, and let \(\varvec{\mathfrak {n}} = \mathopen {(} \mathfrak {n}_{1}, \ldots , \mathfrak {n}_{n} \mathclose {)}\) be an NNR. Under Assumption 1, the expectation of the squared length of the lattice vector \(\varvec{v}\) corresponding to \(\varvec{\mathfrak {n}}\) (namely, \(\mathord {\mathbf {n}}_{\varvec{B}}(\varvec{v}) = \mathopen {(} \mathfrak {n}_{1}, \ldots , \mathfrak {n}_{n} \mathclose {)}\)) is:

$$\begin{aligned} \mathord {E}_{} \left[ \Vert \varvec{v} \Vert ^2 \right] = \frac{1}{12}\sum _{i = 1}^n (3\mathfrak {n}_i^2 + 3\mathfrak {n}_i + 1) \Vert \varvec{b}^*_i \Vert ^2. \end{aligned}$$
(1)

Theorem 2 states that one can calculate the expected squared length of the lattice vector generated from a given NNR and a given basis without computing the coordinates of the lattice vector. For a relatively reduced basis (the output of the LLL algorithm, the BKZ algorithm, or their variants), the sequence \((\mathopen {||} \varvec{b}^*_1 \mathclose {||}^2, \mathopen {||} \varvec{b}^*_2 \mathclose {||}^2, \ldots , \mathopen {||} \varvec{b}^*_n \mathclose {||}^2)\) tends to decrease with the index. Therefore, for any relatively reduced basis, a short lattice vector tends to have the natural number 0 at the first consecutive indices of its NNR, while the natural numbers 1 and 2 appear at the last consecutive indices.
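As a concrete rendering of Eq. (1), the following sketch (names are ours) evaluates the expectation from an NNR and the squared Gram–Schmidt norms, and can be used to rank candidate NNRs by expected squared length:

```python
def expected_sq_length(nnr, gso_norms_sq):
    """E[||v||^2] = (1/12) * sum_i (3 n_i^2 + 3 n_i + 1) * ||b*_i||^2  (Eq. (1))."""
    return sum((3 * n * n + 3 * n + 1) * l
               for n, l in zip(nnr, gso_norms_sq)) / 12.0
```

Note that the all-zero NNR gives \(\frac{1}{12} \sum _i \mathopen {||} \varvec{b}^*_i \mathclose {||}^2\), which makes explicit why decreasing the sum of squared lengths of the orthogonal basis vectors decreases the expected length of the sampled vectors, as exploited in Sect. 3.2 and Sect. 5.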

2.3 Shortest Vector Problem and SVP Challenge

Given a lattice basis \(\varvec{B} \) of a lattice L of dimension n, the shortest vector problem (SVP) is the problem of finding a shortest non-zero vector \(\varvec{v} \in L\). The SVP is NP-hard under randomized reductions [3], and finding the shortest lattice vector is hard in practice when n is large. However, the length of the shortest lattice vector in L can be estimated by the Gaussian heuristic as \(\mathop {\mathrm {GH}} (L) := \big (\mathop {\mathrm {\Gamma }}(n/2 + 1) \cdot \det (L)\big )^{1/n} / \sqrt{\pi }\), where \(\mathop {\mathrm {\Gamma }}\) is the gamma function [32, 40]. The root Hermite factor of \(\varvec{v}\) and L is defined by \(\big (\Vert \varvec{v} \Vert / \det (L)^{1/n}\big )^{1/n}\). Given a lattice vector \(\varvec{v}\), the corresponding root Hermite factor is used for comparative measurements among different lattices, because it is independent of the lattice basis [22, 42].
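A minimal sketch of both quantities, taking \(\det (L)\) as input and working in log space because the determinant of a high-dimensional lattice is typically far too large for floating-point arithmetic:

```python
import math

def gaussian_heuristic(n, det_L):
    """GH(L) = (Gamma(n/2 + 1) * det(L))^(1/n) / sqrt(pi), via log-gamma."""
    log_gh = (math.lgamma(n / 2 + 1) + math.log(det_L)) / n \
             - 0.5 * math.log(math.pi)
    return math.exp(log_gh)

def root_hermite_factor(v_norm, n, det_L):
    """(||v|| / det(L)^(1/n))^(1/n), also computed in log space."""
    return math.exp((math.log(v_norm) - math.log(det_L) / n) / n)
```

With these, the acceptance criterion of the SVP Challenge discussed next reads `v_norm < 1.05 * gaussian_heuristic(n, det_L)`.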

The security of lattice-based cryptosystems depends on the hardness of lattice problems, so it is important to study efficient algorithms for solving the SVP. To this end, a competition called the SVP Challenge [45] has been maintained by Technische Universität Darmstadt since 2010. The SVP Challenge provides a problem instance generator that produces SVP instances and hosts a ranking of registered lattice vectors as follows. A submitted lattice vector \(\varvec{v} \in L\) is registered if one of the following conditions is satisfied: no lattice vector is registered at the dimension of \(\varvec{v}\) and \(\mathopen {||} \varvec{v} \mathclose {||} < 1.05 \cdot \mathop {\mathrm {GH}} (L)\), or \(\varvec{v}\) is shorter than every other registered lattice vector of the same dimension. In this paper, for a lattice L, we call a lattice vector \(\varvec{v} \in L\) a solution of L if \(\mathopen {||} \varvec{v} \mathclose {||} < 1.05 \cdot \mathop {\mathrm {GH}} (L)\).

3 Brief Introduction to Sampling Reduction

Before explaining our algorithm, we briefly introduce the RSR algorithm and its extensions [20, 37, 47, 48].

3.1 Overview of Sampling Reduction

Here, we describe the common structure of the RSR algorithm and its extensions [20, 37, 47, 48] and identify the parts that our algorithm shares with them.

First of all, we define the following notions and terminology.

Definition 2

(lexicographical order). Let \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\) and \(\varvec{C} = \mathopen {(} \varvec{c}_{1}, \ldots , \varvec{c}_{n} \mathclose {)}\) be two lattice bases of the same lattice. We say that \(\varvec{B} \) is smaller than \(\varvec{C} \) in the lexicographical order if and only if \(\mathopen {||} \varvec{b}^*_j \mathclose {||} < \mathopen {||} \varvec{c}^*_j \mathclose {||}\), where j is the smallest index for which \(\mathopen {||} \varvec{b}^*_j \mathclose {||}\) and \(\mathopen {||} \varvec{c}^*_j \mathclose {||}\) are different.

Definition 3

(insertion index). Given a lattice vector \(\varvec{v}\) and a real number \(\delta \) with \(0 < \delta \le 1\), the \(\delta \)-insertion index of \(\varvec{v}\) for \(\varvec{B} \) is defined by

$$\begin{aligned} \mathord {\mathbf {h}}_{\delta }(\varvec{v}) = \min \big ( \mathopen {\{} i \mathrel {|} \mathopen {||} \pi _{\varvec{B}, i}(\varvec{v}) \mathclose {||}^2 < \delta \mathopen {||} \varvec{b}^*_i \mathclose {||}^2 \mathclose {\}} \mathbin {\cup }\mathopen {\{} \infty \mathclose {\}} \big ). \end{aligned}$$

Note that the \(\delta \)-insertion index is a parameterized version of a notion of Schnorr [47, 48].
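A direct transcription of Definition 3 follows, reusing `orthogonal_length_sq` from the sketch in Sect. 2.1 (`float('inf')` stands for \(\infty \)):

```python
import numpy as np

def insertion_index(v, Bstar, delta):
    """Smallest (1-based) i with ||pi_{B,i}(v)||^2 < delta * ||b*_i||^2,
    or infinity if no such index exists."""
    for i in range(1, Bstar.shape[0] + 1):
        if orthogonal_length_sq(v, Bstar, i) < delta * np.dot(Bstar[i - 1],
                                                              Bstar[i - 1]):
            return i
    return float('inf')
```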

The RSR algorithm and its extensions need a lattice reduction algorithm that reduces a given lattice basis with respect to the lexicographical order, and the LLL and BKZ algorithms can be used for this purpose. In this paper, we define this operation as follows.

Definition 4

Let \(\varvec{v} \in L\) be a lattice vector with \(\mathopen {||} \pi _{\varvec{B}, i}(\varvec{v}) \mathclose {||} < \delta \mathopen {||} \varvec{b}^*_i \mathclose {||}\). We call the following operation the \(\delta \)-LLL reduction of \(\varvec{B} \) at index i by \(\varvec{v}\): generate the \(n \times (n + 1)\) matrix \(M = (\varvec{b}_1, \ldots , \varvec{b}_{i - 1}, \varvec{v}, \varvec{b}_{i}, \ldots , \varvec{b}_n)\), compute a \(\delta \)-LLL reduced (full-rank) lattice basis \(\varvec{B} '\) by applying the LLL algorithm to M, and output the resulting lattice basis.

As mentioned above, one can use another lattice reduction algorithm. In fact, the RSR algorithm [47, 48] uses the BKZ algorithm.
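The following sketch realizes Definition 4 with fpylll, the Python wrapper of fplll (which we also use for pre-processing in Sect. 6); it assumes that LLL applied to the \(n + 1\) linearly dependent generating vectors leaves a single zero row that can be discarded, which is the standard fplll behavior. Vectors are handled as rows here, matching fpylll's convention:

```python
from fpylll import IntegerMatrix, LLL

def delta_lll_reduce_at(B_rows, v, i, delta=0.99):
    """delta-LLL reduction of B at (1-based) index i by lattice vector v:
    insert v before b_i, LLL-reduce the n+1 generating vectors, and drop
    the zero row produced by the linear dependency."""
    rows = B_rows[:i - 1] + [list(v)] + B_rows[i - 1:]
    M = IntegerMatrix.from_matrix(rows)
    LLL.reduction(M, delta=delta)
    return [[M[r, c] for c in range(M.ncols)]
            for r in range(M.nrows)
            if any(M[r, c] for c in range(M.ncols))]
```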

The RSR algorithm and its extensions consist of two major parts. The first part takes a lattice basis \(\varvec{B} \) and a set of NNRs as input and generates the lattice vector corresponding to each NNR and \(\varvec{B} \); it outputs the set of generated lattice vectors. In this paper, we call this operation lattice vector generation. The second part performs lattice basis reduction. If the lattice vector generation yields a lattice vector \(\varvec{v}\) whose orthogonal length is shorter than an orthogonal basis vector of \(\varvec{B} \), in other words, whose \(\delta \)-insertion index is finite, then \(\varvec{B} \) is reduced by the \(\delta \)-LLL reduction of \(\varvec{B} \) at index \(\mathord {\mathbf {h}}_{\delta }(\varvec{v})\) by \(\varvec{v}\). Note that the resulting lattice basis is smaller than the input in the lexicographical order. In summary, the RSR algorithm and its extensions alternately repeat lattice vector generation and lattice basis reduction to reduce a given lattice basis. To solve the SVP, these computations are repeated until the length of the first basis vector of the obtained lattice basis is less than \(1.05 \cdot \mathop {\mathrm {GH}} \big (\mathop {\mathcal {L}}(\varvec{B})\big )\), where \(\varvec{B} \) is the given lattice basis.

To reduce the lattice basis or solve the SVP efficiently, it is important to choose, from the result of the lattice vector generation, which lattice vector to use for the lattice basis reduction. Each previous study has its own selection rule: in the RSR algorithm, a lattice vector \(\varvec{v}\) with \(\mathord {\mathbf {h}}_{\delta }(\varvec{v}) \le 10\) is chosen from the result of the lattice vector generation; in the SSR algorithm, a lattice vector \(\varvec{v}\) with \(\mathord {\mathbf {h}}_{\delta }(\varvec{v}) = 1\) is chosen. Fukase and Kashiwabara [20] introduced a selection rule called restricting reduction, as explained in Sect. 3.2. A minimal sketch of the overall alternation appears below.
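The sketch assumes the helpers from the previous sections (`gram_schmidt`, `insertion_index`, `delta_lll_reduce_at`, `gaussian_heuristic`) plus a hypothetical `generate_vector(B, nnr)` that maps an NNR to its lattice vector; the selection rule shown is the RSR one (\(\mathord {\mathbf {h}}_{\delta }(\varvec{v}) \le 10\)):

```python
import math
import numpy as np

def sampling_reduction(B_rows, nnrs, det_L, delta=0.99):
    """Alternate lattice vector generation and basis reduction until the
    first basis vector is a solution in the SVP Challenge sense."""
    n = len(B_rows)
    target = 1.05 * gaussian_heuristic(n, det_L)
    while math.sqrt(sum(x * x for x in B_rows[0])) >= target:
        Bstar, _ = gram_schmidt(np.array(B_rows))
        for nnr in nnrs:
            v = generate_vector(B_rows, nnr)        # hypothetical helper
            h = insertion_index(np.array(v), Bstar, delta)
            if h <= 10:                             # RSR selection rule
                B_rows = delta_lll_reduce_at(B_rows, v, h, delta)
                break
        else:
            break                                   # no useful vector found
    return B_rows
```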

Ludwig [37] and Fukase and Kashiwabara [20] computed the expected squared lengths of the lattice vectors produced by the lattice vector generation. Based on these expectations, their algorithms prepared lists of NNRs with entries 0, 1, and 2 and generated the corresponding lattice vectors from the lattice basis. Aono and Nguyen [5] presented an algorithm that produces a set of better NNRs (in their terms, tags).

3.2 Fukase–Kashiwabara Algorithm

As our algorithm extends a parallelization strategy and uses the global shared storage proposed by Fukase and Kashiwabara [20], we briefly introduce their proposal before explaining ours. Note that our algorithm does not use the restricting reduction of their algorithm. We refer the reader to [20] for the details of the FK algorithm.

Restricting Reduction. Ludwig [37] and Fukase and Kashiwabara [20] showed that, for a lattice basis and a set of NNRs, one can calculate the expected squared lengths of the generated lattice vectors, and that this expectation is proportional to the sum of the squared lengths of the orthogonal basis vectors. A promising reduction strategy is therefore to reduce the lattice basis so as to decrease this sum. Fukase and Kashiwabara [20] proposed a method for decreasing the sum of squared lengths by introducing the restricting reduction index; this strategy is briefly explained below.

Not every application of a lattice vector to the lattice basis decreases the sum of squared lengths quickly; moreover, some applications that do decrease the sum are not the fastest way to reach a very small sum. To decrease the sum of squared lengths quickly, Fukase and Kashiwabara [20] used a restricting index. For an index \(\ell \), called the restricting reduction index, the lattice basis is reduced only at indices \(\ell '\) after \(\ell \); in other words, the basis vectors before \(\ell \) are fixed. At the beginning of the program, \(\ell \) is 0. As the main loop is executed, the restricting index is gradually increased according to some rule, and when it reaches a certain value, it is reset to 0. By using the restricting index, the sum of squared lengths of the orthogonal basis vectors is decreased starting from the first consecutive indices.

Stock Vectors. Some lattice vectors may be useful even if they are not short enough to reduce the current lattice basis. For example, after the lattice basis is reduced at index i by another lattice vector, a lattice vector \(\varvec{v}\) may become applicable at index \(i + 1\) or greater. In addition, after the restricting index is reset to 0, a lattice vector \(\varvec{v}\) that was previously unusable because of a high restricting index may become applicable. Therefore, lattice vectors that are relatively shorter than the i-th orthogonal basis vector are stored in global shared storage, and each stored lattice vector is associated with an orthogonal complement \(L_{\varvec{B}, i}\) (namely, the place of the stored vector is determined by the basis vectors \(\varvec{b}_1, \ldots , \varvec{b}_{i - 1}\)) so that all parallel running processes can load it and use it for reducing their lattice bases. Fukase and Kashiwabara [20] call these stored vectors stock vectors.

Parallelization. The set of generated lattice vectors is determined by the lattice basis and the set of NNRs. Therefore, even with the same set of NNRs, two different lattice bases generate different sets of lattice vectors that have few vectors in common. Fukase and Kashiwabara [20] combined this heuristic with stock vectors to generate many lattice vectors and make the parallel running processes cooperate. Given a lattice basis, their algorithm generates lattice bases that differ everywhere except at the first consecutive indices from 1 and distributes one generated basis to each process. In other words, the processes hold lattice bases with different basis vectors at the last consecutive indices. Since these lattice bases share a common orthogonal complement at the first consecutive indices, the processes can share stock vectors associated with the common basis vectors at those indices; and since they have different basis vectors at the last consecutive indices, the generated lattice vectors can differ from process to process.

Note that basis reduction is done by each process independently, so keeping the first consecutive basis vectors identical across processes is difficult; some mechanism is required to maintain this invariant among all the processes. Fukase and Kashiwabara [20] proposed a method that stores basis vectors as stock vectors.

4 Overview of Our Parallelization Strategy

First, we briefly explain parallelization and the problems that make it hard to exploit the performance of parallel computation. Then, we briefly describe our strategy.

4.1 Technical Hurdles and Brief Review of Parallelization Methodology

Several researchers have used parallel computation to solve the SVP. Broadly, there are three types of parallel algorithms for solving it.

The first type is the divide-and-conquer approach for parallelizing the ENUM, pruned ENUM, and lattice vector sampling algorithms [16, 17, 23, 30, 34, 46]. Several researchers [16, 17, 30, 34] studied parallel ENUM algorithms based on this methodology. This type of parallelization can speed up the search for the shortest or a relatively short lattice vector. However, it is hard to solve high-dimensional SVPs by this type of parallelization alone, since the search takes exponentially long in practice. Hence, this type of parallelization is outside the scope of this paper.

The second type is randomization of lattice bases to exploit the power of massively parallel computing environments. The authors of [34] proposed an implementation that solves high-dimensional SVPs by exploiting massively many parallel running processes. Their implementation works as follows. For a given lattice basis, it generates many lattice bases by multiplication with random unimodular matrices. The generated lattice bases are distributed to the processes, and each process runs the BKZ and pruned parallel ENUM algorithms. This method reduces many lattice bases simultaneously, so the probability of finding a short lattice vector becomes higher. However, each process works independently; information for accelerating the lattice basis reduction is therefore not shared among processes.

The third type of parallelization, proposed by Fukase and Kashiwabara [20], involves alteration of the last consecutive basis vectors, as briefly explained in Sect. 3.2. For a given lattice basis, their method generates many lattice vectors from the basis and a set of NNRs in order to find lattice vectors shorter than the given orthogonal basis vectors. Their key idea consists of two heuristics. First, each process executes lattice vector generation with the same set of NNRs but a different lattice basis, so as to generate a different set of lattice vectors. To generate many different lattice bases, each process computes the \(\delta \)-LLL reduction of a lattice basis of dimension n at the last consecutive indices \(\ell , \ell + 1, \ldots , n\) using a different lattice vector; different input lattice vectors may yield different output bases. Second, if the processes have different lattice bases but the same first consecutive basis vectors with indices \(1, 2, \ldots , \ell '\), then useful lattice vectors contained in the common orthogonal complements (the stock vectors of Fukase and Kashiwabara) can be shared among the processes to accelerate the lattice basis reduction. Fukase and Kashiwabara proposed a parallelization strategy based on these heuristics but reported experimental results with only 12 or fewer parallel processes. In fact, in our preliminary experiments with their strategy [20], many parallel running processes solving higher dimensional problems did not keep the same first consecutive basis vectors. Since current massively parallel computing environments have many more physical processing units (often more than 100), we conclude that there is still room for scalable massive parallelization.

The parallelization strategy in our algorithm is an extension of [20]. The key idea of our method is explained next.

4.2 Key Idea of Our Parallelization Strategy

To reduce a lattice basis while effectively utilizing the performance of a parallel computing environment, one needs a scalable parallelization strategy that keeps the basis vectors at the first consecutive indices common, and those at the last consecutive indices different, across the many lattice bases of the individual processes, with a small synchronization penalty in practice. To accomplish this, we extend the parallelization strategy of Fukase and Kashiwabara [20]. Our strategy consists of two methods: the first enables each process to generate a lattice basis variant whose basis vectors at the last consecutive indices differ from those of the other processes, and the second enables the processes to share the basis vectors at the first consecutive indices. The rest of this section briefly explains the idea, and Sect. 5.2 explains it in detail.

Our lattice basis reduction with process-unique information (e.g., process ID or MPI rank) generates many lattice basis variants that have different basis vectors at the last consecutive indices. Each process also generates many lattice vectors from a given lattice basis and a set of NNRs in order to reduce the basis. If a lattice vector has an orthogonal length relatively shorter than the orthogonal basis vectors at the first consecutive indices, the process stores it in the global shared storage of stock vectors, using the method proposed by Fukase and Kashiwabara [20], as explained in Sect. 3.2. Thanks to the process-unique information, each process can use this information as a seed for generating a different lattice basis even if every process takes the same lattice basis as input. During the reduction, the processes work independently except for accesses to the global shared storage and do not communicate with each other. The most computationally expensive operation is generating the lattice vectors, and this is done in each process independently; hence there is only a negligible synchronization penalty in practice.

As for the second method, Fukase and Kashiwabara [20] suggested that cooperation among parallel running processes can be achieved by sharing the basis vectors at the first consecutive indices. However, maintaining this situation without incurring a painfully large synchronization cost is difficult when there are many processes. Fukase and Kashiwabara [20] proposed to use global shared storage for stock vectors, as explained in Sect. 3.2, but it is unclear how their method keeps the basis vectors at the first consecutive indices common among many processes. We propose a method to share these basis vectors by using additional global shared storage. In our algorithm, each process has its own lattice basis. In every loop, each process stores the basis vectors at the first consecutive indices of its own lattice basis in this extra global shared storage; it then loads lattice vectors from the storage and computes the smallest lattice basis in the lexicographical order using the stored lattice vectors and its own lattice basis. Note that only the basis vectors at the first consecutive indices are stored in the global storage. In this method, many processes access the global shared storage, and synchronization is required; however, the synchronization penalty is smaller in practice than that of communication among many processes, and, as mentioned above, the most computationally expensive operation is generating lattice vectors.

5 Our Algorithm

Algorithm 1. Main loop of our lattice basis reduction (pseudocode figure, not reproduced here)
Algorithm 2. Lattice basis variant generation with process-unique information (pseudocode figure, not reproduced here)

The most important part of our proposal is the parallelization methodology. We designed and implemented it in the single program multiple data (SPMD) style [41]: each process executes the same program, but the processes have different internal states. Our parallelization extends the parallelization strategy of Fukase and Kashiwabara [20] and uses their global shared storage and stock vectors, as explained in Sect. 3.2. In addition, it contains a lattice basis reduction strategy based on our new evaluation function.

Our algorithm, listed in Algorithms 1 and 2, is based on the Fukase–Kashiwabara (FK) algorithm [20]. In particular, its lattice basis reduction is similar to that method: it generates many lattice vectors using a lattice basis \(\varvec{B} \) and a set of natural number representations (NNRs), as given in Definition 1 in Sect. 2.2, and reduces \(\varvec{B} \) with the generated vectors so as to significantly decrease the sum of squared lengths of the orthogonal basis vectors of \(\varvec{B} \).

Our method for generating many lattice vectors from a lattice basis is similar to that of the FK algorithm. Note that Aono and Nguyen [5] presented an algorithm that produces a set of better NNRs (in their terms, tags), but we did not use it.

The major differences from the FK algorithm are a parallelization strategy and a lattice basis reduction strategy based on our evaluation function, which significantly decreases the sum of squared lengths in each process. In Sect. 5.1, we present the new lattice basis reduction strategy for each process. In Sect. 5.2, we describe the new parallelization strategy. We explain how to choose the parameters of our algorithm in Sect. 5.3.

5.1 Basis Reduction Strategy Using Evaluation Function

As mentioned above, our algorithm is based on the FK algorithm [20]. The FK algorithm adopts the restricting index as its lattice basis reduction method, whereas our algorithm uses the following evaluation function of bases for its lattice basis reduction. Our algorithm generates many lattice vectors using a lattice basis \(\varvec{B} \) and a set of NNRs. Discovered relatively shorter lattice vectors, called stock vectors, are stored in global shared storage [20], as explained in Sect. 3.2. After generating these stock vectors, our algorithm uses them to reduce \(\varvec{B} \). Several stock vectors may have a finite \(\delta \)-insertion index at some index i for \(\varvec{B} \) (as defined in Sect. 3.1), so there are several choices of stock vector that can reduce \(\varvec{B} \). We thus face the problem of deciding which stock vector is best for reducing \(\varvec{B} \); answering this question is important for finding a solution efficiently.

However, our preliminary experiments on reducing higher dimensional lattice bases indicate that this is a quite difficult problem. To manage this difficulty, we introduce the following evaluation function \(\mathsf {Eval}\) with a positive real parameter \(\Theta \) less than 1, for a lattice basis \(\varvec{B} = \mathopen {(} \varvec{b}_{1}, \ldots , \varvec{b}_{n} \mathclose {)}\):

$$\begin{aligned} \mathsf {Eval}(\varvec{B}, \Theta ) = \sum _{i = 1}^n \Theta ^i \mathopen {||} \varvec{b}^*_i \mathclose {||}^2. \end{aligned}$$
(2)

Our algorithm uses the following strategy at line 9 of Algorithm 1: replace \(\varvec{B} \) by the lattice basis \(\overline{\varvec{B}}\) with the minimum \(\mathsf {Eval}(\overline{\varvec{B}}, \Theta )\), where \(\overline{\varvec{B}}\) is computed by the \(\delta \)-LLL reduction of \(\varvec{B} \) at index i by \(\varvec{v}\), and \(\varvec{v}\) ranges over the lattice vectors in the global shared storage of stock vectors with \(\mathord {\mathbf {h}}_{\delta _{\mathrm {stock}}}(\varvec{v}) = i \le \ell _{\mathrm {fc}}\).

Let us now describe precisely what the difficulty is and why we introduced \(\mathsf {Eval}\) and \(\Theta \). According to [13, 14, 20, 37], the average squared length of the lattice vectors generated from \(\varvec{B} \) and a set of NNRs is proportional to the sum of squared lengths of the orthogonal basis vectors of \(\varvec{B} \), as explained in Sect. 2.2. Accordingly, the SVP can be solved by searching for shorter lattice vectors and using each found vector to reduce the lattice basis. However, while lattice basis reduction with a shorter lattice vector yields a basis that is smaller in the lexicographical order (defined in Sect. 3.1), the sum of the squared lengths of the orthogonal basis vectors of the resulting basis often becomes much larger in practice. There is thus a trade-off between two reduction strategies: decreasing the lexicographical order and decreasing the sum of squared lengths of the orthogonal basis vectors. Resolving this dilemma requires a method for optimally choosing between the strategies, which seems quite difficult because, to the best of our knowledge, no theory enabling such a choice has been developed in this research area. To manage the issue, we decided to combine both strategies, and this is why we introduced \(\mathsf {Eval}\) and \(\Theta \).

The evaluation function \(\mathsf {Eval}\) combines the lexicographical order and the order defined by the sum of squared lengths through the adjustable parameter \(\Theta \). For a lattice basis \(\varvec{B} \), if \(\Theta \) is close to 1, then \(\mathsf {Eval}(\varvec{B}, \Theta )\) treats decreasing the sum of squared lengths of the orthogonal basis vectors as more important; otherwise, decreasing the lexicographical order is more important. \(\mathsf {Eval}\) balances the two orders and yields a flexible and efficient basis reduction strategy for solving the SVP.
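A minimal sketch of Eq. (2) and of the resulting selection step, reusing `gram_schmidt` and `delta_lll_reduce_at` from the earlier sketches; `candidates` is assumed to be pairs \((i, \varvec{v})\) of insertion indices and stock vectors:

```python
import numpy as np

def eval_basis(B_rows, theta):
    """Eval(B, Theta) = sum_i Theta^i * ||b*_i||^2  (Eq. (2), 1-based i)."""
    Bstar, _ = gram_schmidt(np.array(B_rows))
    return sum(theta ** i * np.dot(bs, bs)
               for i, bs in enumerate(Bstar, start=1))

def best_reduction(B_rows, candidates, theta, delta=0.99):
    """Among all delta-LLL reductions of B at index i by stock vector v,
    return the basis minimizing Eval; keep B if no candidate improves it."""
    best, best_score = B_rows, eval_basis(B_rows, theta)
    for i, v in candidates:
        Bbar = delta_lll_reduce_at(B_rows, v, i, delta)
        score = eval_basis(Bbar, theta)
        if score < best_score:
            best, best_score = Bbar, score
    return best
```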

5.2 Parallelization Strategy

We now explain how the computations of our lattice reduction algorithm are parallelized. If the parallel processes had the same lattice basis and generated the same set of lattice vectors, the parallel computation would be useless. As stated in Sect. 4, we therefore generate a different lattice basis for each process. However, if the parallel processes had completely different lattice bases, the parallel computation could not be cooperative.

Fukase and Kashiwabara [20] proposed a parallel computation method in which the basis vectors at the first consecutive indices are the same among the processes and the basis vectors at the last consecutive indices differ among the processes. Since the basis vectors at the last consecutive indices differ, the same set of natural number representations generates a different set of lattice vectors in each process. However, they did not describe an effective parallel computation for more than 100 processes. We propose an effective parallel computation method in which each process has the same first consecutive basis vectors but different last consecutive basis vectors, and in which the computational cost of synchronization is small in practice. Our method consists of two parts: one ensures that each process has different last consecutive basis vectors, and the other ensures that the processes have common first consecutive basis vectors.

Lattice Basis Variant Generation. Here, we explain our new lattice reduction procedure, which uses process-unique information to generate many lattice basis variants with different basis vectors at the last consecutive indices. Each process executes this procedure independently with its process-unique information \(\mathtt {pui}\). The procedure is shown in Algorithm 2.

The discontinue rule at line 16 of Algorithm 2 is important. This rule depends on the process-unique information \(\mathtt {pui}\), so that each process executes different lattice basis reduction operations on the last consecutive indices of its lattice basis. Thanks to this procedure, a huge number of processes can generate many lattice basis variants, even when starting from the same lattice basis, without communicating with each other.
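Algorithm 2 itself is not reproduced above, so the following is only an illustration of the mechanism, not the actual rule at line 16: a per-process pseudorandom stream derived from \(\mathtt {pui}\) decides where the reduction over the last consecutive indices is discontinued, so that processes starting from the same basis diverge deterministically:

```python
import random

def make_discontinue_rule(pui, l_lc, n):
    """Derive a process-unique pseudorandom stream from pui and use it to
    decide whether to discontinue the reduction at the current index.
    Illustrative only: the actual rule is given at line 16 of Algorithm 2."""
    rng = random.Random(pui)            # same basis + different pui
    def discontinue(index):             # => different reduction sequences
        return index >= n - l_lc and rng.random() < 0.5
    return discontinue
```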

Cooperation Among Many Processes. As mentioned above, one can create a large number of processes whose lattice bases have different basis vectors at the last consecutive indices. Since these processes have the same basis vectors at the first consecutive indices from 1, they can utilize the stock vectors they have in common. Fukase and Kashiwabara [20] proposed a method for keeping this situation by using stock vectors, as explained in Sect. 3.2; however, it is difficult to maintain for a large number of processes solving higher dimensional problems without imposing a heavy synchronization penalty.

We propose a method to solve this problem. In it, all processes share additional global storage, and the basis vectors of the lattice basis held by each process are stored in it in the same manner as stock vectors but with a different strategy, as explained next. We call these stored lattice vectors link vectors. Whenever a process obtains a new lattice basis, its basis vectors are stored as link vectors. After each process loads link vectors from this storage, it uses them to reduce its lattice basis to the smallest one in the lexicographical order. In this method, the synchronization penalty (computational cost) consists of storing and loading link vectors, and it is small in practice.

Note that we adjust the parameter \(\delta _{\mathrm {stock}}\) so that only a few stock vectors are found per process and per main-loop iteration, and we run around a thousand processes in parallel, each storing stock vectors independently. Even so, the number of stored stock vectors at any given time is not that large, because we do not keep all stock vectors: old stock vectors are relatively longer than the orthogonal basis vectors held by the running processes, so it is unnecessary to store them, and we throw them away. This parameter adjustment is also explained in the next subsection. The same strategy is used for the link vectors.

5.3 Parameter Choice

Our algorithm takes as input a basis \(\varvec{B} \), process-unique information \(\mathtt {pui}\), the parameters \(\ell _{\mathrm {fc}}\), \(\ell _{\mathrm {lc}}\), \(\delta \), \(\delta '\), \(\delta ''\), \(\delta _{\mathrm {stock}}\), \(\ell _{\mathrm {link}}\), \(\Theta \), and two sets S and \(S'\) of NNRs used at line 5 of Algorithm 1 and line 4 of Algorithm 2. \(\varvec{B} \) is given by the user, and \(\mathtt {pui}\) is given by the user or by a facility, e.g., an MPI library or a job scheduler, but choosing the remaining parameters is not trivial. We did not find an optimal choice of parameters because, to the best of our knowledge, no theory enabling such a choice has been developed. In this section, we explain our method of parameter choice. The concrete parameter values used when we solved the 150-dimensional problem instance appear in the next section.

More precisely, we observed intermediate values in the main loop of our algorithm (lines 2–14 of Algorithm 1) on an input basis (e.g., the 150-dimensional problem instance of the SVP Challenge). The observed intermediate values are as follows: (1) the number of found stock vectors; (2) the sum of squared lengths of the orthogonal basis vectors; (3) the calculation times of the two lattice vector generations in the main loop, called at lines 4 and 5 of Algorithm 1.

We want (1) to be quite small because we are interested only in relatively short lattice vectors, and this can be adjusted immediately by decreasing \(\delta _{\mathrm {stock}}\).

We adjust the parameters \(\ell _{\mathrm {fc}}\), \(\ell _{\mathrm {lc}}\), \(\ell _{\mathrm {link}}\), and \(\Theta \) to values such that the processes running in parallel keep (2) small while reducing the basis. As mentioned in Sect. 5.1, it is quite hard to determine the value of \(\Theta \) theoretically, so we choose and adjust the parameters by observing the performance and the intermediate values of the main loop of our algorithm.

Regarding (3), we adjust \(\ell _{\mathrm {lc}}\), \(\delta \), \(\delta '\), \(\delta ''\), the set S of NNRs at line 5 of Algorithm 1, and the set \(S'\) of NNRs at line 4 of Algorithm 2. Our algorithm has two subroutines that generate lattice vectors using S and \(S'\) (at line 5 of Algorithm 1 and line 4 of Algorithm 2, respectively), and they have different purposes. Since these lattice vector generations are the most costly parts of our algorithm, it is important to optimize the choices of S and \(S'\); however, these sets differ only in their size. Note that the NNRs in S and \(S'\) are chosen in ascending order of the expected squared lengths (given by Theorem 2) of the lattice vectors they generate over several bases.

The purpose of the lattice vector generation with S (at line 5 of Algorithm 1) is to find relatively short lattice vectors, so the size of S should be large. The purpose of the lattice vector generation with \(S'\) (at line 4 of Algorithm 2) is to decrease the sum of squared lengths of the orthogonal basis vectors and to generate many lattice basis variants, as mentioned in Sect. 5.2, by repeatedly applying lattice reduction (e.g., the LLL algorithm); hence \(S'\) need not be as large.

In practice, we adjust the parameters \(\ell _{\mathrm {lc}}\), \(\delta \), \(\delta '\), \(\delta ''\), and the sizes of S and \(S'\) so that the calculation times of the two lattice vector generations are around several minutes. More concretely, when we found a solution of the 150-dimensional problem instance of the SVP Challenge, we chose the parameters such that the calculation times were 5 min and 10 min, respectively, and the lattice vector generation with \(S'\) was evaluated around 500 times per evaluation of Algorithm 2 on average.

6 Application to SVP Challenge

The shortest vector problem (SVP) is important because of its relation to the security strength of lattice-based cryptosystems, and studying algorithms for solving it is an important research topic. The SVP Challenge was started in 2010 [45] as a way of advancing this research. The challenge provides a problem instance generator and accepts submissions of discovered short lattice vectors. As mentioned in Sect. 2.3, a submitted lattice vector of a lattice L is inducted into the hall of fame if its length is less than \(1.05 \cdot \mathop {\mathrm {GH}} (L)\) and less than the lengths of any other registered lattice vectors of the same dimension.

6.1 Equipment

We applied our algorithm to the problems in the SVP Challenge. In this section, we explain our equipment and results.

Table 1. Specifications of the cluster system Oakleaf-FX used to solve the 134-dimensional SVP Challenge instance; the units of “CPU frequency” and “Total RAM” are GHz and GB, respectively
Table 2. Specifications of the cluster system Chimera used to solve the SVP Challenge instances of dimensions 138, 140, 142, 144, 146, 148, and 150; the units of “CPU frequency” and “Total RAM” are GHz and GB, respectively
Table 3. Specifications of the cluster system ASGC used to solve the 148- and 150-dimensional SVP Challenge instances; the units of “CPU frequency” and “Total RAM” are GHz and GB, respectively
Table 4. Specifications of the cluster system Reedbush-U used to solve the 150-dimensional SVP Challenge instance; the units of “CPU frequency” and “Total RAM” are GHz and GB, respectively

First, let us describe our equipment. We used four clusters, called Oakleaf-FX (FX10), Chimera, ASGC, and Reedbush-U. The hardware specifications of these clusters are summarized in Tables 1, 2, 3, and 4. The FX10 cluster provided a customized compiler, MPI library, and job scheduler; the Chimera, ASGC, and Reedbush-U clusters provided compilers, MPI libraries, and job schedulers. In particular, we used GCC version 4.8.4, OpenMPI version 1.8.1, and Univa Grid Engine version 8.1.7 on Chimera; Intel C++ Compiler version 14.0.2, Intel MPI Library version 4.1, and TORQUE version 4.2.8 on ASGC; and Intel C++ Compiler version 16.0.3, Intel MPI Library version 5.1, and PBS Professional 13.1 on Reedbush-U. Note that Hyper-Threading was enabled and Turbo Boost was disabled on all nodes of Chimera, whereas both Hyper-Threading and Turbo Boost were disabled on all nodes of ASGC and Reedbush-U. Note also that these clusters were shared systems, so all processes were controlled by the job schedulers. Let us also briefly explain the pre-processing: before executing the processes of our algorithm, the input lattice basis was pre-processed with the BKZ algorithm implemented in fplll [50] to reduce the components of the lattice basis to small integers.

For each process, when we solved the problem instances, the process-unique information \(\mathtt {pui}\) was given as the rank assigned by the MPI libraries and job schedulers; that is, the \(\mathtt {pui}\) value ranged from 0 to \(m - 1\), where m is the number of processes requested from the MPI libraries and job schedulers.

6.2 Experimental Results

We used our algorithm to solve the SVP Challenge problem instances of dimensions 134, 138, 140, 142, 144, 146, 148, and 150 with seed 0. We used FX10 to solve the 134-dimensional instance, Chimera to solve the instances of dimensions 138–146, and the two clusters Chimera and ASGC to solve the 148-dimensional instance.

To solve the 150-dimensional instance, we used the following numbers of cores and CPUs on the four clusters: (1) 28 cores of Xeon E5-2695 v3 and (2) 80 cores of Xeon E7-4870 on parts of Chimera, (3) 100 cores of Xeon E5-2680 v2 on ASGC, (4) 192 cores of SPARC64 IXfx per job on FX10, and (5) 864 cores of Xeon E5-2695 v4 per job on Reedbush-U, for (1) 49 days, (2) 81 days, (3) 39 days, (4) 37 days, and (5) 188 days, respectively. We did not use these computing nodes simultaneously; therefore, the total wall-clock time was about 394 days. Note that for computing node (1) we requested 24 processes from the job scheduler, while in cases (2), (3), (4), and (5) the number of requested processes equaled the number of cores.

When we found a solution of the 150-dimensional instance, the tuned parameters of our algorithm were as follows: \(\ell _{\mathrm {fc}}= 50\), \(\ell _{\mathrm {lc}}= 50\), \(\delta = 0.9999\), \(\delta ' = 0.985\), \(\delta '' = 0.0002\), \(\delta _{\mathrm {stock}}= 1.08\), \(\ell _{\mathrm {link}}= 50\), and \(\Theta = 0.93\). The method we used to choose these parameters is explained in Sect. 5.3.

We now give the numbers of lattice vectors generated by the two subroutines (at line 5 of Algorithm 1 with the set S of NNRs, and at line 4 of Algorithm 2 with the set \(S'\) of NNRs) and the costs per generation when we solved the 150-dimensional problem and problems of smaller dimensions. The sizes of S and \(S'\) are around 30 million NNRs and 50 thousand NNRs, respectively. Note that lines 3–19 of Algorithm 2 are repeated until the termination condition at line 18 is satisfied, and the number of loop iterations is 500 on average. The calculation times of the lattice vector generations with S and \(S'\) are around 5 min and 10 min, respectively. That is, the calculation time of these generations per lattice vector is around 12–20 \(\upmu \)s.

As mentioned in Sect. 5.2, it is not necessary to store all the stock vectors and link vectors, because old stock vectors and link vectors are relatively longer; we therefore threw the old ones away, so we cannot report the total size. The size of the remaining stock vectors and link vectors is around 60 gigabytes. We estimate that the total size of all the found stock vectors and link vectors is several hundred gigabytes for solving the 150-dimensional problem.

Table 5 lists the details of our solutions. The solution of the 150-dimensional problem is the current world record.

Table 5. Our experimental results for the SVP Challenge instances of dimensions 134, 138, 140, 142, 144, 146, 148, and 150, where “\(\mathop {\mathrm {GH}} (L)\)” is the Gaussian heuristic of a given lattice L, “\(\varvec{v}\)” in the column header indicates a solution of the corresponding row, the “Max. # of requested processes” column shows the maximum number of requested processes sent to the job schedulers, and the “Days to solve” column lists the wall-clock times to obtain each solution
Fig. 1. Comparison of solved dimension and single-thread calculation time in seconds for our results and the other results in the SVP Challenge: the plotted boxes are our 8 results and the straight solid line is fitted to them; the plotted circles are the other 17 results for problems of dimension at least 100 with specified overall calculation times, and the dotted line is fitted to those results

Figure 1 compares the solved dimension and the single-thread calculation time in seconds for our results and the other results from the SVP Challenge. In this figure, the plotted boxes are our 8 results, and the plotted circles are 17 other results for problems of dimension at least 100 with specified overall calculation times. It is clear that our algorithm outperforms the others; this is experimental evidence that our proposal is effective at reducing higher dimensional lattice bases and solving higher dimensional SVPs.

Additionally, Fig. 1 shows fits to our results and to the other SVP Challenge results, obtained with the fit function of gnuplot version 5.2: the straight solid line is fitted to our results and the dotted line to the other SVP Challenge results. These curves are of the form \(f(x) = c^{ax + b} + d\) with four parameters a, b, c, d. The fitted values, asymptotic standard errors, and root mean square (RMS) of residuals reported by fit are as follows: for the straight solid line, \(a = 1.00287 \pm 933.3\), \(b = 0.999973 \pm 7822\), \(c = 1.16833 \pm 148.9\), and \(d = 1.00093 \pm 2.948 \times 10^{10}\), with an RMS of residuals of \(4.85206 \times 10^{9}\); for the dotted line, \(a = 0.957735 \pm 843.9\), \(b = 0.988369 \pm 5095\), \(c = 1.14625 \pm 123.6\), and \(d = 1.0004 \pm 5.792 \times 10^{7}\), with an RMS of residuals of \(1.66635 \times 10^{7}\).

7 Conclusion

We proposed a lattice basis reduction algorithm suitable for parallel computing environments and presented the results of applying it to the SVP Challenge.

Regarding the results in the SVP Challenge, we found solutions to problem instances of dimensions 134, 138, 140, 142, 144, 146, 148, and 150, the last being a world record for the highest dimension solved. This is clear experimental evidence that our proposal is effective at reducing lattice bases and solving the SVP in a parallel computing environment.