The main goal of the aggregator is to estimate the frequency of the items without compromising the privacy of the users. Therefore, we adopt the squared error (SE) as the metric to evaluate the estimation. Without loss of generality, we assume there are
l attributes
\({\mathbf {a}}_1, {\mathbf {a}}_2, \ldots , {\mathbf {a}}_l\) and that the number of items for each attribute
\({\mathbf {a}}_i\) is
\(k_i\), that is,
\(|{\mathbf {a}}_i| = k_i\). The total number of items
\(d = k_1 + k_2 + \cdots + k_l\). We allocate budgets
\(\{\epsilon _1, \epsilon _2, \ldots , \epsilon _l\}\) to the set of attributes
\(\{{\mathbf {a}}_1, {\mathbf {a}}_2, \ldots , {\mathbf {a}}_l\}\), respectively, and
\(\epsilon _1 + \epsilon _2 + \cdots + \epsilon _l = \frac{\epsilon }{2}\). Each user
\(u_i\) publishes a length-
d bit vector
\({\mathbf {h}}_i'\) obtained by perturbing the original bit vector
\({\mathbf {h}}_i\). The true histogram is
\({\mathbf {H}} = \sum _{i=1}^{n}{{\mathbf {h}}_i}\), and the sanitized histogram is
\({\mathbf {H}}' = \sum _{i=1}^{n}{{\mathbf {h}}_i'}\). Let
\({\mathbf {H}}'' = \{{H}_{11}'', \ldots , {H}_{1k_1}'', {H}_{21}'', \ldots , {H}_{2k_2}'', \ldots , {H}_{lk_l}''\}\) denote the unbiased estimation of
\({\mathbf {H}}\); for each attribute, we have
\({{H}_{ij}''}p_i + (n - {H}_{ij}'')(1-p_i) = {H}_{ij}', i = 1, \ldots , l, j = 1, \ldots , k_i\), where
\(p_i = \frac{\exp (\epsilon _i)}{\exp (\epsilon _i) +1}\). Thus, we have:
$$\begin{aligned} {H}_{ij}'' = \frac{{H}_{ij}'(\exp (\epsilon _i) + 1) - n}{\exp (\epsilon _i) - 1} \end{aligned}$$
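For illustration, this correction can be sketched in a few lines of Python (a minimal sketch; the function name `unbiased_counts` and the example numbers are ours, not part of the protocol):

```python
import math

def unbiased_counts(H_prime, eps_i, n):
    """Correct the perturbed counts of one attribute a_i.

    H_prime : list of perturbed counts H'_ij for the k_i items of attribute a_i
    eps_i   : privacy budget epsilon_i allocated to this attribute
    n       : number of users
    Returns the unbiased estimates H''_ij = (H'_ij (e^eps_i + 1) - n) / (e^eps_i - 1).
    """
    e = math.exp(eps_i)
    return [(h * (e + 1) - n) / (e - 1) for h in H_prime]

# Hypothetical example: 3 items, 1000 users, budget 0.5 for this attribute
print(unbiased_counts([620, 540, 480], 0.5, 1000))
```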
The SE is given as follows:
$$\begin{aligned} \begin{aligned} {{\mathbf {S}}}{{\mathbf {E}}}(\epsilon , l, d)&= E\left[ \sum _{i = 1}^{l}\sum _{j = 1}^{k_i}(H_{ij}''-H_{ij})^2\right] = \sum _{i = 1}^{l}\sum _{j = 1}^{k_i}E[(H_{ij}'' - H_{ij})^2]\\&= \sum _{i = 1}^{l}\sum _{j = 1}^{k_i}Var[H_{ij}''] = \sum _{i=1}^{l}\sum _{j = 1}^{k_i}\left( \frac{\exp (\epsilon _i) + 1}{\exp (\epsilon _i) - 1}\right) ^2Var[H_{ij}']\\&= \sum _{i = 1}^{l}\frac{nk_i\cdot \exp (\epsilon _i)}{(\exp (\epsilon _i)-1)^2} \end{aligned} \end{aligned}$$
where
\(H_{ij}'\) is a sum of \(n\) independent Bernoulli random variables, each with variance \(p_i(1-p_i)\), so the variance of
\(H_{ij}'\) equals
\(np_i(1-p_i)\). Our goal is given as:
$$\begin{aligned} \begin{aligned} L(\epsilon ) = \min _{\epsilon _1 + \cdots + \epsilon _l = \frac{\epsilon }{2}}\sum _{i = 1}^{l}\frac{nk_i\cdot \exp (\epsilon _i)}{(\exp (\epsilon _i)-1)^2} \end{aligned} \end{aligned}$$
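For concreteness, the objective being minimized can be evaluated numerically as in the following sketch (the function name `squared_error` and the example allocation are illustrative assumptions, not part of the method):

```python
import math

def squared_error(eps_alloc, k, n):
    """SE = sum_i n * k_i * exp(eps_i) / (exp(eps_i) - 1)^2
    for a candidate allocation eps_alloc = [eps_1, ..., eps_l],
    item counts k = [k_1, ..., k_l], and n users."""
    return sum(n * k_i * math.exp(e) / (math.exp(e) - 1) ** 2
               for e, k_i in zip(eps_alloc, k))

# Hypothetical example: two attributes with 4 and 8 items, 1000 users,
# total budget epsilon = 1.0 split evenly (eps_1 + eps_2 = epsilon / 2)
print(squared_error([0.25, 0.25], [4, 8], 1000))
```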
To solve this constrained optimization problem, we employ the Lagrange multiplier (LM) method to convert the constrained problem into an unconstrained one:
$$\begin{aligned} \begin{aligned} L(\epsilon , \lambda )&= \sum _{i = 1}^{l}\frac{nk_i\cdot \exp (\epsilon _i)}{(\exp (\epsilon _i)-1)^2} + \lambda \left( \epsilon _1 + \epsilon _2 + \cdots + \epsilon _l - \frac{\epsilon }{2}\right) \end{aligned} \end{aligned}$$
(5)
where
\(\epsilon _i \ge 0, i = 1, \ldots , l\). The task now is to obtain the minimum value of
\(L(\epsilon , \lambda )\). Since the second-order partial derivative
\(\frac{\partial ^{2}L(\epsilon , \lambda )}{\partial ^2{\epsilon _i}} = \frac{nk_i\exp (3\epsilon _i) + 4nk_i\exp (2\epsilon _i) + nk_i\exp (\epsilon _i)}{(\exp (\epsilon _i) - 1)^4} > 0, i = 1, 2, \ldots , l\),
\(L(\epsilon , \lambda )\) is strictly convex in each variable
\(\epsilon _i\); hence, a minimum of
\(L(\epsilon , \lambda )\) must exist. For simplicity, let
\(x_i = \exp (\epsilon _i)\); then, the equation
\(L(\epsilon , \lambda )\) becomes:
$$\begin{aligned} \begin{aligned} L(x, \lambda )&= \sum _{i = 1}^{l}\frac{nk_ix_i}{(x_i-1)^2} + \lambda \left( x_1x_2\ldots x_l - \exp \left( \frac{\epsilon }{2}\right) \right) \end{aligned} \end{aligned}$$
(6)
where
\(x_i > 1, i = 1, 2, \ldots , l\). Its optimal solution is obtained by solving the following equations:
$$\begin{aligned} \left\{ \begin{array}{ll} \frac{\partial L(x, \lambda )}{\partial x_i} = -\frac{nk_i(x_i + 1)}{(x_i - 1)^3} + \lambda \frac{x_1x_2\ldots x_l}{x_i} = 0\\ \frac{\partial L(x, \lambda )}{\partial \lambda } = x_1x_2\ldots x_l - \exp \left( \frac{\epsilon }{2}\right) = 0 \end{array} \right. \end{aligned}$$
(7)
where
\(i = 1, 2, \ldots , l\). We carry out a simple transformation of the equation to obtain:
$$\begin{aligned} \left\{ \begin{array}{ll} \lambda \exp \left( \frac{\epsilon }{2}\right) (x_i - 1)^3 - nk_i(x_i + 1)x_i = 0\\ x_1x_2\ldots x_l = \exp \left( \frac{\epsilon }{2}\right) \end{array} \right. \end{aligned}$$
(8)
where
\(i = 1, 2, \ldots , l\). The first equation in Eq. (8) is a univariate cubic equation in \(x_i\); among the various methods for solving such equations, we employ the Cardano formula (CF). The cubic equation in Eq. (8) can be rewritten as:
$$\begin{aligned} \begin{aligned} \lambda \exp \left( \frac{\epsilon }{2}\right) x_i^3-\left( 3\lambda \exp \left( \frac{\epsilon }{2}\right) +nk_i\right) x_i^2+\left( 3\lambda \exp \left( \frac{\epsilon }{2}\right) -nk_i\right) x_i-\lambda \exp \left( \frac{\epsilon }{2}\right) = 0 \end{aligned} \end{aligned}$$
(9)
We let
$$\begin{aligned} a=\lambda \exp \left( \frac{\epsilon }{2}\right) ,b=-(3a+nk_i),c=3a-nk_i,d=-a \end{aligned}$$
Then, Eq. (
9) can be expressed as:
$$\begin{aligned} ax_i^3 + bx_i^2+cx_i+d = 0 \end{aligned}$$
(10)
To find the root of the equation, we let
\(x_i = y_i-\frac{b}{3a}\). Eq. (
10) can then be changed to:
$$\begin{aligned} y_i^3 + \left( \frac{c}{a}-\frac{b^2}{3a^2}\right) y_i + \left( \frac{d}{a} + \frac{2b^3}{27a^3} - \frac{bc}{3a^2}\right) = 0 \end{aligned}$$
(11)
We let
\(p = \frac{c}{a}-\frac{b^2}{3a^2}\) and
\(q = \frac{d}{a} + \frac{2b^3}{27a^3} - \frac{bc}{3a^2}\); thus, Eq. (
11) can be expressed as:
$$\begin{aligned} y_i^3 + py_i + q = 0 \end{aligned}$$
(12)
By using the CF method, we can obtain the roots of Eq. (
12) as follows:
$$\begin{aligned} \begin{aligned} y_{i1}&= \root 3 \of {-\frac{q}{2}+\sqrt{\left( \frac{q}{2}\right) ^2+\left( \frac{p}{3}\right) ^3}} + \root 3 \of {-\frac{q}{2}-\sqrt{\left( \frac{q}{2}\right) ^2+\left( \frac{p}{3}\right) ^3}}\\ y_{i2}&= \omega \root 3 \of {-\frac{q}{2}+\sqrt{\left( \frac{q}{2}\right) ^2+\left( \frac{p}{3}\right) ^3}} + \omega ^2\root 3 \of {-\frac{q}{2}-\sqrt{\left( \frac{q}{2}\right) ^2+\left( \frac{p}{3}\right) ^3}}\\ y_{i3}&= \omega ^2\root 3 \of {-\frac{q}{2}+\sqrt{\left( \frac{q}{2}\right) ^2+\left( \frac{p}{3}\right) ^3}} + \omega \root 3 \of {-\frac{q}{2}-\sqrt{\left( \frac{q}{2}\right) ^2+\left( \frac{p}{3}\right) ^3}} \end{aligned} \end{aligned}$$
(13)
where
\(\omega = \frac{-1+\sqrt{3}i}{2}\). Thus, the roots of Eq. (
10) are obtained by solving
\(x_{ij} = y_{ij}-\frac{b}{3a}, j= 1,2,3\). We take only the real root
\(x_{i1}\) as the final solution.
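As a concrete illustration, the real root \(x_{i1}\) of the cubic in Eq. (9) can be computed with the CF as sketched below. This is a minimal sketch under the assumption that the discriminant \((\frac{q}{2})^2 + (\frac{p}{3})^3\) is positive, so that Eq. (13) yields a single real root; the function and variable names are ours.

```python
import math

def _cbrt(v):
    """Real cube root preserving sign."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def cardano_real_root(lam, eps, n, k_i):
    """Real root x_{i1} of a*x^3 + b*x^2 + c*x + d = 0 with the
    coefficients of Eqs. (9)-(10): a = lam*exp(eps/2), b = -(3a + n*k_i),
    c = 3a - n*k_i, d = -a."""
    a = lam * math.exp(eps / 2.0)
    b = -(3.0 * a + n * k_i)
    c = 3.0 * a - n * k_i
    d = -a
    # Depressed cubic y^3 + p*y + q = 0 obtained by x = y - b/(3a) (Eqs. 11-12)
    p = c / a - b ** 2 / (3.0 * a ** 2)
    q = d / a + 2.0 * b ** 3 / (27.0 * a ** 3) - b * c / (3.0 * a ** 2)
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3   # assumed positive here
    s = math.sqrt(disc)
    y1 = _cbrt(-q / 2.0 + s) + _cbrt(-q / 2.0 - s)   # y_{i1} in Eq. (13)
    return y1 - b / (3.0 * a)                        # x_{i1} = y_{i1} - b/(3a)
```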
The
l solutions obtained in this way,
\(\{x_1, x_2,\ldots , x_l\}\) (each a function of \(\lambda\)), are substituted into the constraint
\(f(\lambda ) = x_1x_2\ldots x_l - \exp (\frac{\epsilon }{2}) = 0\), which yields a higher-order equation in
\(\lambda\). We employ the Newton-Raphson (NR) method to solve this univariate equation of high degree. The NR method first chooses an initial approximation
\(\lambda _0\). At each iteration, the next value
\(\lambda _{k+1}\) is computed from the current value \(\lambda _k\) as:
$$\begin{aligned} \begin{aligned} \lambda _{k+1} = \lambda _k - \frac{f(\lambda _k)}{f'(\lambda _k)} \end{aligned} \end{aligned}$$
(14)
The NR method produces a sequence
\(\{\lambda _1, \lambda _2, \ldots \}\) that, for a suitable initial value, converges to the root of the function
\(f(\lambda )\). After obtaining the converged value
\(\lambda ^*\), we can compute the values of
\(\{x_1, x_2, \ldots , x_l\}\). The privacy budget of each attribute is then obtained as
\(\epsilon _i = \log {x_i}\),
\(i =1, \ldots , l\).
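Putting the steps together, a minimal sketch of the NR iteration is given below. It reuses the hypothetical `cardano_real_root` helper sketched above and approximates \(f'(\lambda)\) by a central finite difference, which is an implementation choice not prescribed in the text.

```python
import math

def allocate_budgets(eps, k, n, lam0=1.0, tol=1e-8, max_iter=100):
    """Solve f(lambda) = x_1 * ... * x_l - exp(eps/2) = 0 by Newton-Raphson
    and return the per-attribute budgets eps_i = log(x_i)."""
    target = math.exp(eps / 2.0)

    def f(lam):
        xs = [cardano_real_root(lam, eps, n, k_i) for k_i in k]
        return math.prod(xs) - target

    lam = lam0
    for _ in range(max_iter):
        h = 1e-6 * max(abs(lam), 1.0)
        deriv = (f(lam + h) - f(lam - h)) / (2.0 * h)   # finite-difference f'(lambda)
        step = f(lam) / deriv
        lam -= step                                      # update of Eq. (14)
        if abs(step) < tol:
            break
    xs = [cardano_real_root(lam, eps, n, k_i) for k_i in k]
    return [math.log(x) for x in xs]                     # eps_i = log(x_i)
```

In practice, one would also verify that every returned \(x_i\) exceeds 1, so that each \(\epsilon_i = \log x_i\) is positive.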
To analyze the optimal allocation
\(\{\epsilon _1, \epsilon _2, \ldots , \epsilon _l\}\), we can draw the following conclusions: