4.1 Step 1: create less sensitive user indices and find neighbors of target user u* based on user indices
Historical service invocation records of n services {ws1, …, wsn} by m users {u1, …, um} can be represented by the matrix in (1), where ri,j is a Boolean value indicating whether ui has invoked wsj in the past. Thus, each row vector (ri,1, …, ri,n) denotes the historical service invocation records of user ui. As a service community often contains a large number of web services, i.e., n is large, the vector (ri,1, …, ri,n) of user ui is often high-dimensional and hence incurs much computational time when it takes part in the subsequent service recommendation process. Therefore, to reduce the time cost, the Simhash technique is employed to convert the high-dimensional vector (ri,1, …, ri,n) of ui into a low-dimensional vector (Ri,1, …, Ri,p), where p = \( \left\lceil {\log}_2n\right\rceil \) holds.
$$ \left[\begin{array}{ccc}{r}_{1,1}& \dots & {r}_{1,n}\\ {}\vdots & \ddots & \vdots \\ {}{r}_{m,1}& \dots & {r}_{m,n}\end{array}\right] $$
(1)
Next, we introduce the concrete conversion process. Each of the n services {ws1, …, wsn} is encoded as a binary code of p bits. For example, ws1 = (0, 0, …, 0, 0, 1), ws2 = (0, 0, …, 0, 1, 0), ws3 = (0, 0, …, 0, 1, 1), and so on. Assume that ui has invoked n1 services (n1 ≤ n); then we pick these n1 services and their binary codes to form an n1 × p matrix of 0s and 1s. For example, if there are 30 candidate web services in total (here, p = \( \left\lceil {\log}_2 30\right\rceil \) = 5) and ui has invoked ws1 and ws3, then we can derive the 2 × 5 0/1 matrix in (2). Next, we substitute "−1" for each element "0" in (2), obtaining the 2 × 5 matrix in (3), in which each entry is either −1 or 1.
ui: \( \left[\begin{array}{ccccc}0& 0& 0& 0& 1\\ {}0& 0& 0& 1& 1\end{array}\right] \) (2)
ui: \( \left[\begin{array}{ccccc}-1& -1& -1& -1& 1\\ {}-1& -1& -1& 1& 1\end{array}\right] \) (3)
For the −1/1 matrix in (3), we calculate the sum of each column and obtain a 5-dimensional vector H(ui) = (−2, −2, −2, 0, 2). Afterward, in vector H(ui), we replace each non-positive entry with "0" and each positive entry with "1". Then, we obtain a new 5-dimensional 0/1 vector (0, 0, 0, 0, 1), which can be regarded as the index of user ui, denoted by h(ui). Index h(ui) has two advantages: first, h(ui) is less sensitive, as it contains little or even no private information of user ui; second, h(ui) is a low-dimensional vector (Ri,1, …, Ri,p) compared to the original high-dimensional vector (ri,1, …, ri,n) of user ui.
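The index construction above can be sketched in Python. This is an illustrative sketch, not the paper's implementation; the function name is ours, and we assume each service's numeric id doubles as its p-bit binary code, matching the encoding ws1 = (0, …, 0, 1), ws2 = (0, …, 1, 0) from the example.

```python
import math

def simhash_index(invoked_ids, n):
    """Compute the fine-grained vector H(u) and the Simhash index h(u)
    for a user who invoked the services with the given 1-based ids,
    out of n candidate services (illustrative sketch)."""
    p = math.ceil(math.log2(n))  # length of the binary service codes
    H = [0] * p
    for sid in invoked_ids:
        # p-bit binary code of the service, e.g. ws1 -> 00001 when p = 5
        bits = [(sid >> (p - 1 - k)) & 1 for k in range(p)]
        for k, b in enumerate(bits):
            H[k] += 1 if b == 1 else -1  # 0 bits contribute -1
    h = [1 if v > 0 else 0 for v in H]  # positive -> 1, non-positive -> 0
    return H, h
```

With n = 30 services and invoked services ws1 and ws3, this reproduces H(ui) = (−2, −2, −2, 0, 2) and h(ui) = (0, 0, 0, 0, 1) from the running example.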
Next, with the user indices h(ui) (1 ≤ i ≤ m), we can look for similar users (i.e., neighbors) of the target user u*. Concretely, if h(ui) = h(u*) holds, then ui is deemed a qualified neighbor of u* with high probability, according to Simhash theory.
4.2 Step 2: improved neighbor search for target user u* based on multi-probe Simhash
In Step 1, neighbors of the target user u* can be discovered and returned for recommendation decision-making based on the Simhash technique. However, the neighbor search condition in Step 1, i.e., h(ui) = h(u*), does not always work well, as it may be too loose or too tight in certain situations. Concretely, if the condition h(ui) = h(u*) is too loose, then too many neighbors of u* are returned, which may reduce the recommendation accuracy to some extent; conversely, if the condition is too tight, then few (or even no) neighbors of u* are returned, which may undermine the recommendation feasibility. In other words, the traditional Simhash technique needs to be improved or modified to avoid returning too many or too few (even zero) neighbors of u*.
Next, we extend the traditional Simhash technique into multi-probe Simhash. Concretely, if the neighbor search condition h(ui) = h(u*) is too loose, then we tighten it; conversely, if it is too tight, then we relax it to some extent.
4.2.1 Case 1: search condition relaxation
The neighbor search condition h(ui) = h(u*) introduced in Step 1 may be too tight in certain situations and thereby return too few (or even no) neighbors of the target user u*. In this situation, we need to relax the overly tight condition h(ui) = h(u*) so that the number of returned neighbors of u* can exceed the pre-defined threshold P.
Next, we elaborate on the concrete condition relaxation process. Suppose h(ui) = (Ri,1, …, Ri,p) and h(u*) = (R*,1, …, R*,p); then h(ui) ⊕ h(u*) can be defined as in (4). Thus, the overly tight neighbor search condition h(ui) = h(u*) can be rewritten as h(ui) ⊕ h(u*) = 0. Therefore, we can relax the condition to h(ui) ⊕ h(u*) ≤ d, where d ∈ {1, 2, …, p} is chosen according to the number of neighbors of u* returned under the current condition. Finally, the returned neighbors of u* are put into the set Neig_Set.
h(ui) ⊕ h(u*) = (Ri,1 ⊕ R*,1) + (Ri,2 ⊕ R*,2) + … + (Ri,p ⊕ R*,p) (4)
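The progressive relaxation based on (4) amounts to widening the allowed Hamming distance until enough neighbors are found. A minimal sketch (the function names and the dictionary layout mapping user ids to index vectors are our assumptions):

```python
def hamming(h1, h2):
    # Equation (4): sum of the bitwise XORs of the two index vectors
    return sum(a ^ b for a, b in zip(h1, h2))

def relaxed_neighbors(target_h, user_indices, threshold_P):
    """Relax h(ui) == h(u*) step by step: accept users at Hamming
    distance 0, then <= 1, <= 2, ..., until at least P neighbors
    are found (or all p bits may differ)."""
    p = len(target_h)
    neig = []
    for d in range(p + 1):
        neig = [uid for uid, h in user_indices.items()
                if hamming(h, target_h) <= d]
        if len(neig) >= threshold_P:
            break
    return neig
```

For instance, with threshold P = 2, a user at distance 0 alone does not satisfy the threshold, so the condition is relaxed to distance ≤ 1 before the neighbor set is returned.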
4.2.2 Case 2: search condition tightening
In Step 1, the user index h(ui) is an extreme simplification (i.e., a coarse-grained expression) of the historical service invocation records of user ui; e.g., h(ui) = (0, 0, 0, 0, 1) holds in the example of Step 1. However, the coarse-grained h(ui) may lead to a too-loose search condition (i.e., h(ui) = h(u*)) for the neighbors of the target user u*. Considering this drawback, we use a relatively fine-grained index of ui, i.e., H(ui) (in the example of Step 1, H(ui) = (−2, −2, −2, 0, 2) holds), to replace the coarse-grained h(ui), so as to tighten the search condition and produce fewer neighbors of the target user u*.
Concretely, if H(ui) = H(u*) holds, we can conclude that ui and u* are similar users, because H(ui) = H(u*) is a tighter neighbor search condition than the original condition h(ui) = h(u*). Therefore, through H(ui) = H(u*), we can expect to obtain fewer but more similar neighbors of u*. However, if the condition H(ui) = H(u*) is itself too tight, then an appropriate relaxation is necessary. Concretely, we no longer require H(ui) = H(u*) (i.e., H(ui) ⊕ H(u*) = 0) but only require that H(ui) ⊕ H(u*) be close to 0. In this way, we can relax the neighbor search condition when H(ui) = H(u*) is too tight. The concrete relaxation degree, denoted by the allowed value of H(ui) ⊕ H(u*), depends on the pre-defined threshold P on the number of u*'s neighbors. Finally, the returned neighbors of u* are put into the set Neig_Set.
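The tightened search can be sketched as follows. Since the entries of H(u) are integers rather than bits, we interpret "H(ui) ⊕ H(u*) close to 0" as a small elementwise distance (here, the L1 distance); this interpretation, along with the function names, is our assumption.

```python
def l1_dist(H1, H2):
    # elementwise distance between two fine-grained H vectors; we read
    # the paper's "H(ui) XOR H(u*) close to 0" as this distance being small
    return sum(abs(a - b) for a, b in zip(H1, H2))

def tightened_neighbors(target_H, user_H, threshold_P, max_dist):
    """Start from the tight condition H(ui) == H(u*) (distance 0) and
    relax the allowed distance until at least P neighbors are found
    or max_dist is reached."""
    neig = []
    for d in range(max_dist + 1):
        neig = [uid for uid, H in user_H.items()
                if l1_dist(H, target_H) <= d]
        if len(neig) >= threshold_P:
            break
    return neig
```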
4.3 Step 3: recommend new services to target user u* through returned neighbors in Neig_Set
For each user ui in Neig_Set, if he or she has invoked a candidate service wsj (1 ≤ j ≤ n) in the past, i.e., ri,j = 1, then ui is put into a new set Neig_Set*; furthermore, wsj's historical quality value observed by ui (denoted by qi,j) can be used to predict the missing quality value of wsj for the target user u* (denoted by q*,j), based on the prediction equation in (5), where |Neig_Set*| is the size of set Neig_Set*.
q*,j = \( \frac{1}{\mid \mathrm{Neig}\_{\mathrm{Set}}^{\ast}\mid}\ast \sum \limits_{u_i\in \mathrm{Neig}\_{\mathrm{Set}}^{\ast }}{q}_{i,j} \) (5)
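Equation (5) is a plain average of the neighbors' historical quality values for wsj. A minimal sketch, where the data layout (a set of invoked services per user and a quality lookup keyed by user and service) is our assumption:

```python
def predict_quality(neig_set, invoked, quality, ws):
    """Predict q*,j for candidate service ws as the average historical
    quality observed by the neighbors in neig_set who invoked ws (Eq. (5))."""
    # Neig_Set*: neighbors that actually invoked ws in the past
    neig_star = [u for u in neig_set if ws in invoked.get(u, set())]
    if not neig_star:
        return None  # no neighbor invoked ws, so q*,j cannot be predicted
    return sum(quality[(u, ws)] for u in neig_star) / len(neig_star)
```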
Thus, for each candidate service wsj (1 ≤ j ≤ n) that has never been executed by the target user u*, its missing quality value for u*, i.e., q*,j, can be predicted by (5). Finally, we select the candidate service with the optimal predicted value q*,j and recommend it to u*. This concludes our proposed recommendation method RecMPS.