In this section, we introduce the details of our proposed privacy-preserving service recommendation approach, i.e., RecLSH*. Concretely, in Section 3.1, we briefly introduce the rationale of the LSH technique used in our approach; in Section 3.2, we first improve the traditional LSH technique to make it applicable to service quality data with a wide range, and then search for the neighbors of a target user based on the improved LSH technique; finally, in Section 3.3, we make service recommendations based on the target user's neighbors derived in Section 3.2.
3.2 Improved LSH-based neighbor search over service quality with a wide range
As discussed in Section 1, the traditional LSH technique cannot guarantee high accuracy when the private service quality data vary over a wide range. The reason is that LSH is essentially a probability-based neighbor search technique, and the probability of a successful neighbor search decreases when the service quality data used for recommendation fluctuate significantly. In view of this analysis, we first transform the service quality with a wide range into a corresponding service quality with a small range through linear mappings; afterwards, we use the transformed service quality to search for similar neighbors based on LSH. Next, we introduce the concrete processes of service quality transformation and similar neighbor search.
To facilitate the subsequent discussion, we consider only one service quality dimension with a wide range, denoted by \( q \), instead of the multiple dimensions found in many application domains [18–27]. For a web service \( ws \), its real quality over \( q \) is denoted by \( ws.q \); the minimal and maximal quality values of \( q \) over all candidate services are denoted by \( \mathrm{MIN}_q \) and \( \mathrm{MAX}_q \), respectively (thus the wide range \( [\mathrm{MIN}_q, \mathrm{MAX}_q] \) can be obtained through offline statistics), while \( ws \)'s quality over \( q \) after transformation is denoted by \( T(ws.q) \). Next, we introduce how to transform the real service quality \( ws.q \in [\mathrm{MIN}_q, \mathrm{MAX}_q] \) into its corresponding value \( T(ws.q) \in [0, 1] \); here, we select the small range [0, 1] because of its popularity in data normalization applications.
Concretely, if \( q \) is a positive quality dimension (the larger the better, e.g., throughput), the normalization can be performed by Eq. (1), where \( ws.q \in [\mathrm{MIN}_q, \mathrm{MAX}_q] \) holds; if \( q \) is a negative quality dimension (the smaller the better, e.g., response time), the normalization can be performed by Eq. (2). Thus, through the transformation equations (1) and (2), we obtain \( T(ws.q) \), whose value falls into the range [0, 1]. Then, according to the \( T(ws.q) \) values of all candidate services invoked by all users in the past, we can search for similar neighbors of a target user (denoted by \( u_{target} \)) based on the LSH technique in an efficient and privacy-preserving way. Next, we introduce the concrete process of LSH-based neighbor search.
$$ T(ws.q)=\begin{cases}\dfrac{ws.q-\mathrm{MIN}_q}{\mathrm{MAX}_q-\mathrm{MIN}_q} & \text{if } \mathrm{MAX}_q\ne \mathrm{MIN}_q\\[4pt] 1 & \text{if } \mathrm{MAX}_q=\mathrm{MIN}_q\end{cases} $$
(1)
$$ T(ws.q)=\begin{cases}\dfrac{\mathrm{MAX}_q-ws.q}{\mathrm{MAX}_q-\mathrm{MIN}_q} & \text{if } \mathrm{MAX}_q\ne \mathrm{MIN}_q\\[4pt] 1 & \text{if } \mathrm{MAX}_q=\mathrm{MIN}_q\end{cases} $$
(2)
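For concreteness, a minimal Python sketch of the transformation in Eqs. (1) and (2) is given below; the function name `transform_quality` and the `positive` flag are our own illustrative choices, not part of the original formulation.

```python
def transform_quality(ws_q, min_q, max_q, positive=True):
    """Map a raw quality value ws.q from [MIN_q, MAX_q] into [0, 1].

    positive=True applies Eq. (1) (larger is better, e.g., throughput);
    positive=False applies Eq. (2) (smaller is better, e.g., response time).
    """
    if max_q == min_q:
        # Degenerate range: all candidate services share the same quality.
        return 1.0
    if positive:
        return (ws_q - min_q) / (max_q - min_q)  # Eq. (1)
    return (max_q - ws_q) / (max_q - min_q)      # Eq. (2)
```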
Suppose there are \( n \) candidate services \( \{ws_1, \dots, ws_n\} \) and \( m \) users \( \{u_1, \dots, u_m\} \). Then, for each user \( u_i \) (\( 1 \le i \le m \)), the quality of his/her invoked services can be denoted by the \( n \)-dimensional vector \( \overrightarrow{u_i} = (T(ws_1.q), \dots, T(ws_n.q)) \). Specifically, \( T(ws_j.q) = 0 \) (\( 1 \le j \le n \)) if user \( u_i \) has never invoked service \( ws_j \) before. Afterwards, we can transform the sensitive vector \( \overrightarrow{u_i} \) into a corresponding hash value \( h(u_i) \) that carries little private information, based on the LSH function \( h(\cdot) \) in Eq. (3). Here, \( \overrightarrow{v} \) is an \( n \)-dimensional vector \( (v_1, \dots, v_n) \), where each \( v_j \) is a random value in the range \( [-1, 1] \); the symbol "∘" denotes the dot product of two vectors.
$$ h(u_i)=\begin{cases}1 & \text{if } \overrightarrow{u_i}\circ \overrightarrow{v}>0\\ 0 & \text{if } \overrightarrow{u_i}\circ \overrightarrow{v}\le 0\end{cases} $$
(3)
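As a minimal sketch of Eq. (3), one hash bit can be computed as follows; `random_projection` is an illustrative helper of ours for drawing the random vector \( \overrightarrow{v} \).

```python
import random

def random_projection(n):
    """Draw an n-dimensional random vector with entries in [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

def lsh_hash(user_vec, v):
    """Eq. (3): one LSH bit from the sign of the dot product between the
    user's transformed quality vector and the random vector v."""
    dot = sum(a * b for a, b in zip(user_vec, v))
    return 1 if dot > 0 else 0
```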
We repeat the above hash process (for user \( u_i \)) \( r \) times based on \( r \) randomly selected hash functions \( \{h_1(\cdot), \dots, h_r(\cdot)\} \). Afterwards, an \( r \)-dimensional vector \( H(u_i) = (h_1(u_i), \dots, h_r(u_i)) \) is obtained, which can be regarded as the index of user \( u_i \) according to traditional LSH theory. Then all the users and their respective indices form a hash table (denoted by Hash_Tb). With the derived hash table, we can perform a quick neighbor search for the target user \( u_{target} \): concretely, if the indices of \( u_{target} \) and \( u_i \) in Hash_Tb are equal, i.e., \( H(u_i) = H(u_{target}) \), then we conclude that \( u_i \) is a neighbor of \( u_{target} \).
However, LSH is a probability-based neighbor search technique, so one hash table is often not enough to find all the similar neighbors of a target user. In other words, "false-negative" search results (i.e., similar neighbors of the target user are overlooked by mistake) are possible. Therefore, to reduce the false-negative probability, multiple hash tables are often needed. Concretely, we repeat the abovementioned hash-table building process \( L \) times to obtain \( L \) hash tables \( \{Hash\_Tb_1, \dots, Hash\_Tb_L\} \). Then we relax the neighbor search condition of the last paragraph as follows: if \( H(u_i) = H(u_{target}) \) holds in any hash table \( Hash\_Tb_x \) (\( 1 \le x \le L \)), then \( u_i \) is regarded as a neighbor of \( u_{target} \). This way, we can obtain more neighbors (denoted by the set \( Neighbor\_set \)) of the target user, as illustrated by the sketch below.
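Continuing the helpers above, one possible realization of the \( L \)-table index and the relaxed neighbor search is sketched below; all names (`build_hash_tables`, `find_neighbors`) are illustrative.

```python
def build_hash_tables(user_vectors, n, r, L):
    """Build L hash tables, each indexing users by an r-bit LSH signature.

    user_vectors maps a user id to its transformed quality vector."""
    tables = []
    for _ in range(L):
        projections = [random_projection(n) for _ in range(r)]
        table = {}
        for uid, vec in user_vectors.items():
            sig = tuple(lsh_hash(vec, v) for v in projections)
            table.setdefault(sig, set()).add(uid)
        tables.append((projections, table))
    return tables

def find_neighbors(target_vec, tables):
    """Return every user whose signature matches the target's signature
    in at least one hash table (the relaxed neighbor condition)."""
    neighbor_set = set()
    for projections, table in tables:
        sig = tuple(lsh_hash(target_vec, v) for v in projections)
        neighbor_set |= table.get(sig, set())
    return neighbor_set
```

Note that, in the setting of this paper, only the signatures \( H(u_i) \) and \( H(u_{target}) \) need to be exchanged rather than the raw quality vectors; the sketch computes the target's signatures locally merely for brevity.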
Here, we utilize the user index \( H(u_i) \) (\( 1 \le i \le m \)), instead of the transformed service quality vector \( \overrightarrow{u_i} \), for the similar neighbor search, for two reasons. First, \( H(u_i) \) contains less private user information than \( \overrightarrow{u_i} \). Second, the \( L \) hash tables can be built offline; therefore, the neighbor search based on \( H(u_i) \) and \( H(u_{target}) \) can be finished quickly, which significantly accelerates the subsequent recommendation process. Thus, with the help of user indices, we can search for the neighbors of a target user in a privacy-preserving and efficient way. Next, we introduce how to predict the missing service quality and make appropriate service recommendations based on the derived neighbors in \( Neighbor\_set \).
3.3 Service recommendation
For each service \( ws_j \) (\( 1 \le j \le n \)) never invoked by \( u_{target} \), its missing service quality (after quality transformation) over dimension \( q \), i.e., \( T(ws_j.q) \), can be predicted by Eq. (4). Here, the set \( Neighbor\_set_j \) includes all the neighbors of \( u_{target} \) who have invoked service \( ws_j \) before, and \( T_u(ws_j.q) \) denotes the quality over dimension \( q \) of service \( ws_j \) as invoked by user \( u \).
$$ T(ws_j.q)=\frac{1}{\mid Neighbor\_set_j\mid }\sum \limits_{u\in Neighbor\_set_j}{T}_u(ws_j.q) $$
(4)
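A minimal sketch of Eq. (4), assuming the transformed quality values observed by the neighbors in \( Neighbor\_set_j \) have already been collected into a list:

```python
def predict_transformed_quality(neighbor_values):
    """Eq. (4): average of T_u(ws_j.q) over Neighbor_set_j.

    neighbor_values holds one transformed quality value per neighbor who
    has invoked ws_j; returns None if no neighbor has invoked it."""
    if not neighbor_values:
        return None
    return sum(neighbor_values) / len(neighbor_values)
```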
With the predicted service quality \( T(ws_j.q) \) after quality transformation, we can reversely calculate the predicted service quality before quality transformation, i.e., \( ws_j.q \), by Eq. (5) (for positive quality dimensions) and Eq. (6) (for negative quality dimensions); these two equations are deduced from Eqs. (1) and (2), respectively. Then, for each service never invoked by \( u_{target} \), its missing service quality can be predicted by (5) and (6). Finally, we select the service with the optimal predicted quality and recommend it to \( u_{target} \), as sketched below.
$$ ws_j.q=T(ws_j.q)\cdot (\mathrm{MAX}_q-\mathrm{MIN}_q)+\mathrm{MIN}_q $$
(5)
$$ ws_j.q=\mathrm{MAX}_q-T(ws_j.q)\cdot (\mathrm{MAX}_q-\mathrm{MIN}_q) $$
(6)
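The sketch below illustrates the inverse transformation of Eqs. (5) and (6) together with the final selection step; `recommend` and its `predictions` argument are illustrative names of ours.

```python
def inverse_transform(t_value, min_q, max_q, positive=True):
    """Map a predicted transformed quality back into [MIN_q, MAX_q]."""
    if positive:
        return t_value * (max_q - min_q) + min_q  # Eq. (5)
    return max_q - t_value * (max_q - min_q)      # Eq. (6)

def recommend(predictions, positive=True):
    """Select the service with the optimal predicted raw quality: the
    maximum for a positive dimension, the minimum for a negative one.

    predictions maps a service id to its predicted ws_j.q."""
    best = max if positive else min
    return best(predictions, key=predictions.get)
```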