1 Introduction
1.1 Contribution
-
A novel approach to sampling from a Bernoulli distribution, together with an integer Gaussian sampler based on multilevel polar samplers, is developed. Using a binary partition tree, we recursively partition \(\mathbb {Z}\) into 2 cosets, 4 cosets, and so on; the number of partition levels is only logarithmic in \(s\). Each partition gives rise to a binary source, which is produced by one polar sampler. The advantage of this multilevel approach is that only Bernoulli samples are needed, which allows a simpler implementation than sampling over the whole integer domain.
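As a concrete toy illustration of the coset-by-coset idea (our own sketch, not the paper's algorithm), the following routine samples a tailcut discrete Gaussian over \(\mathbb{Z}\) one coset bit at a time, consuming one Bernoulli decision per level; the tailcut bound and helper names are assumptions:

```python
import math
import random

def rho(x, s, c=0.0):
    # Gaussian weight exp(-pi * (x - c)^2 / s^2)
    return math.exp(-math.pi * (x - c) ** 2 / s ** 2)

def coset_mass(residue, mod, s, c, tail):
    # total Gaussian mass of the coset {x : x = residue (mod mod)}
    # restricted to the tailcut interval [-tail, tail]
    return sum(rho(x, s, c) for x in range(-tail, tail + 1)
               if (x - residue) % mod == 0)

def sample_multilevel(s, c=0.0, tail=None):
    """Sample a tailcut discrete Gaussian over Z, one coset bit per level:
    after r levels the sample is determined modulo 2^r."""
    if tail is None:
        tail = int(6 * s)            # hypothetical tailcut bound
    levels = max(1, math.ceil(math.log2(2 * tail + 1)))
    residue, mod = 0, 1
    for _ in range(levels):
        w0 = coset_mass(residue, 2 * mod, s, c, tail)
        w1 = coset_mass(residue + mod, 2 * mod, s, c, tail)
        # one Bernoulli decision per level: which sub-coset to descend into
        if random.random() * (w0 + w1) < w1:
            residue += mod
        mod *= 2
    # the surviving coset meets [-tail, tail] in exactly one integer
    return next(x for x in range(-tail, tail + 1)
                if (x - residue) % mod == 0)
```

Note that the number of levels is \(\lceil \log_2(2\,\mathrm{tail}+1)\rceil\), i.e., logarithmic in \(s\), matching the partition-tree depth described above.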
-
Analysis of approximation errors. Although the multilevel polar sampler targets the distribution \(D_{\mathbb {Z}^N,c,s}\), its output is not exactly so. This is because the polar sampler converts N i.i.d. Bernoulli variables into N polarized and unpolarized Bernoulli variables, and we approximate the polarized ones by either unbiased or deterministic Bernoullis, which yields only an approximate version of the desired distribution. We derive upper bounds on the closeness, measured by KL divergence, between the target discrete Gaussian and its approximation.
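To make the error measure concrete, the sketch below (our own toy illustration, not the paper's bound) computes the KL divergence between a tailcut discrete Gaussian and a finite-precision approximation of it; the rounding precision `prec_bits` is an assumed parameter:

```python
import math

def dgauss(s, tail):
    # normalized tailcut discrete Gaussian on [-tail, tail]
    w = [math.exp(-math.pi * x * x / (s * s)) for x in range(-tail, tail + 1)]
    z = sum(w)
    return [wi / z for wi in w]

def round_to_bits(p, prec_bits):
    # round each probability to prec_bits binary digits (clamping away
    # exact zeros, which distorts the far tail slightly), then renormalize
    scale = 1 << prec_bits
    q = [max(round(pi * scale), 1) / scale for pi in p]
    z = sum(q)
    return [qi / z for qi in q]

def kl_divergence(p, q):
    # KL(P || Q) in nats; terms with p_i = 0 contribute nothing
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For instance, a 30-bit approximation of \(D_{\mathbb{Z},s}\) with \(s=4\) already pushes the KL divergence below \(10^{-6}\).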
-
Security analysis. To achieve a given security level in a standard cryptographic scheme with oracle access to a discrete Gaussian distribution, we also discuss, based on KL divergence, how to set the parameters of our polar sampler. In cryptographic applications where the number of queries q to the Gaussian sampler is limited (e.g., \(q\le 2^{64}\) in the NIST signature specifications), Rényi divergence can yield considerable savings, as shown in previous work [5, 32]. We apply Rényi divergence to further improve the parameter selection of the polar sampler.
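For reference, the order-\(a\) Rényi divergence used in such query-limited arguments has the standard closed form below; this is a generic sketch of the definition, not the paper's parameter-selection procedure:

```python
import math

def renyi_divergence(p, q, a):
    # Renyi divergence of order a > 1 (natural log):
    #   R_a(P || Q) = 1/(a-1) * log( sum_i p_i^a / q_i^(a-1) )
    # p and q are probability vectors over the same support
    total = sum(pi ** a / qi ** (a - 1) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (a - 1)
```

Because \(R_a\) penalizes relative (rather than absolute) deviations, a distribution that is pointwise within a small relative factor of the target keeps \(R_a\) tiny even when the number of queries q is as large as \(2^{64}\), which is the source of the savings over KL-based parameter selection.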
-
Information-theoretic optimality. Asymptotically, the multilevel polar sampler achieves the entropy bound of the discrete Gaussian distribution. This implies that it requires the minimum number of random bits needed to produce the desired distribution.
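The entropy bound in question can be evaluated numerically; the sketch below computes the Shannon entropy of the tailcut discrete Gaussian. As a sanity check (our own numeric observation, not a claim from the paper), for moderately large \(s\) this is very close to the continuous approximation \(\log_2(s) + \tfrac{1}{2}\log_2 e \approx \log_2(s) + 0.721\) bits:

```python
import math

def dgauss_entropy_bits(s, tail):
    # Shannon entropy (in bits) of the tailcut discrete Gaussian D_{Z,s}
    # with weights exp(-pi * x^2 / s^2) on [-tail, tail]
    w = [math.exp(-math.pi * x * x / (s * s)) for x in range(-tail, tail + 1)]
    z = sum(w)
    p = [wi / z for wi in w]
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)
```

This per-sample entropy is the asymptotic randomness cost that the multilevel polar sampler approaches.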
-
Quasi-linear complexity. The proposed Gaussian sampling approach enjoys low complexity. The design of a polar sampler is done at the offline stage: given a target distribution, it is carried out once and for all. The online stage of a polar sampler computes certain posterior probabilities, which can be implemented with \(O(N\log N)\) complexity. We also give experimental and asymptotic comparisons between our DGS approach and other existing samplers, including Knuth–Yao sampling, binary sampling and CDT sampling. The prominent advantage of the multilevel polar sampler is its entropy consumption, which measures the cost of randomness. The overall running time depends on both SC decoding (computing likelihood ratios, LRs) and Bernoulli sampling. Compared with binary sampling [11], the polar sampler has higher computational complexity, but it asymptotically and effectively reduces the entropy consumption. We also show experimentally that the multilevel polar sampler is faster than Knuth–Yao sampling.
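The \(O(N\log N)\) count comes from the successive-cancellation LR recursion: \(\log_2 N\) stages, each combining N values via the two standard SC update rules, sketched below (the usual "f"/"g" combinations from SC decoding, not the paper's exact implementation):

```python
def lr_upper(l1, l2):
    # "f" combination: LR of the upper branch from the two child LRs
    # L = (L1 * L2 + 1) / (L1 + L2)
    return (l1 * l2 + 1.0) / (l1 + l2)

def lr_lower(l1, l2, u):
    # "g" combination once the upper-branch bit u has been decided:
    # L = L2 * L1^(1 - 2u), i.e., multiply or divide by L1
    return l2 * (l1 if u == 0 else 1.0 / l1)
```

Each of the \(\log_2 N\) recursion stages applies these rules N times in total, giving the \(N\log N\) LR calculations mentioned in Sect. 6.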
-
Constant-time implementation. The proposed discrete Gaussian sampler is constant-time in the sense that its running time is independent of the output values. This makes our sampler attractive for resisting timing side-channel attacks. From the perspective of coding theory, the polar sampler is constant-time because polar codes have a fixed code length, which compares favorably with variable-length source coding techniques (e.g., Huffman coding).
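A common building block in such implementations is branchless selection, so that which value is produced does not influence the control flow. A minimal sketch of the pattern (illustrative only — Python integers are not actually constant-time, and this helper is our own, not from the paper):

```python
def ct_select(bit, a, b):
    # branchless select on nonnegative ints: returns a if bit == 1 else b
    # (bit = 1 gives an all-ones mask, bit = 0 gives a zero mask)
    mask = -bit
    return (a & mask) | (b & ~mask)
```

In a real constant-time sampler the same idea is expressed with fixed-width machine words so that the selection costs the same regardless of which operand is returned.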
1.2 Roadmap
2 Bernoulli sampling using polar codes
2.1 Notation
2.2 Source polarization
2.3 From source coding to Bernoulli sampling
2.3.1 Interfaces
2.3.2 Key operations
2.3.3 Subroutines
2.4 Closeness analysis of polar sampler
3 Gaussian sampling over the integers using polar sampler
4 Closeness analysis of discrete Gaussian sampling
4.1 The approximation error model
4.2 Approximation error from tailcut
4.3 Approximation error from polar sampling
5 Security analysis and parameter selection
5.1 Security analysis with KL divergence
5.2 Security analysis of tailcut and precision with Rényi divergence
6 Complexity and comparison
6.1 A constant-time algorithm
Operations | Positions in Algorithm 1 | \(\mathcal {H}_{X|Y},\mathcal {L}_{X|Y}\) | Bernoulli biases (precomputed table \(\textbf{c}\)) | Output samples
---|---|---|---|---
Table lookup | Line 2 | \(\times \) | \(\times \) | \(\times \)
Recursively calculating LRs | Line 4 | \(\times \) | \(\checkmark \) | \(\times \)
Division-free alternative of LR calculations | Line 4 | \(\times \) | \(\times \) | \(\times \)
Probabilistic/deterministic sampling | Lines 6, 9, 12 | \(\checkmark \) | \(\times \) | \(\times \)
\(x^{1:N}=u^{1:N}G_N\) | Line 14 | \(\times \) | \(\times \) | \(\times \)
-
Otherwise, the running time of the LR calculations may depend on \({{\textbf {c}}}\). Recall from Fig. 3 that the intermediate LRs are deduced from the rightmost column of LRs, which are in turn derived from \({{\textbf {c}}}[y^{1:N}]\). A single LR calculation may take more or less time depending on the specific values of the two LRs from the previous round of recursion. However, rather than observing a single LR calculation, an adversary would have to infer \({{\textbf {c}}}\) from the total running time of \(N\log N\) LR calculations, which is impractical. Furthermore, the shuffled circuit makes timing-based cryptanalysis even harder. As for the sensitive outputs, they are unrelated to the time taken to complete the LR calculations: only the N LR results in register \(\text {LRReg}[:][n]\) (i.e., the leftmost column of LRs in Fig. 3) obtained at the end of SC decoding, together with the subsequent deterministic/probabilistic sampling in lines 6, 9 and 12 of Algorithm 1, determine the output samples; the time needed to compute the intermediate LRs does not.
6.2 Time complexity
\(2^r\) | \(s\) | Knuth–Yao, constant-time (samples/s) | Knuth–Yao, non-constant-time (samples/s) | GaussianSampling\((\cdot )\), \(N=2^{13}\) (samples/s) | \(N=2^{14}\) (samples/s) | \(N=2^{15}\) (samples/s)
---|---|---|---|---|---|---
\(2^6\) | \(3\sqrt{2\pi }\) | 2.809E5/s | 3.876E5/s | \(\beta =0.487\), 1.333E6/s | \(\beta =0.4535\), 1.283E6/s | \(\beta =0.4244\), 1.168E6/s
\(2^7\) | \(8\sqrt{2\pi }\) | 1.172E5/s | 2.212E5/s | \(\beta =0.4876\), 1.194E6/s | \(\beta =0.454\), 1.097E6/s | \(\beta =0.425\), 1.010E6/s
\(2^9\) | \(32\sqrt{2\pi }\) | 3.255E4/s | 6.017E4/s | \(\beta =0.488\), 0.960E6/s | \(\beta =0.4544\), 0.861E6/s | \(\beta =0.4252\), 0.792E6/s
\(2^{12}\) | \(256\sqrt{2\pi }\) | 4.464E3/s | 6.760E3/s | \(\beta =0.4885\), 0.680E6/s | \(\beta =0.455\), 0.621E6/s | \(\beta =0.4257\), 0.572E6/s
6.3 Memory cost
6.4 Asymptotic comparison
Schemes | Computational complexity/sample | Entropy consumption/sample | Storage
---|---|---|---
Multilevel polar sampler | \(O(t \times s \times \log N)\) floating point\(^\textrm{a}\) | \(O(\lambda \times N^{-\frac{1}{\mu }}+H/N)\rightarrow H(X)\) | \(O(\lambda \times t \times s)\)
Binary sampling [11] | \(O(\log (s))\) table lookup | \(\approx 6+3 \times \log (s)\) | \(O(\lambda \times \log (2.4 \times t \times s^2))\)
Constant-time CDT [16] | \(O(\log (t \times s))\) table lookup | \(\lambda \) | \(O(\lambda \times t \times s)\)
Variant of Knuth–Yao [19] | \(O(\log (t \times s))\) Boolean functions | \(\lambda \) | \(O(\log (t \times s))\) Boolean functions