1 Introduction
- We reinterpret similarity as a conditional probability.
- We introduce the Bayesian credible interval lower bound to incorporate uncertainties.
- We decouple popularity from personal taste by using Bayesian inference.
2 Related results
3 Background
4 Our new algorithm in the probabilistic interpretation of nearest neighbors
- We interpret similarity as a conditional probability and propose using Bayesian credible intervals to select neighbors.
- We present a plausible explanation for the neighbor ensemble prediction formula of the classic KNN method.
- We explain the presence of popularity bias in the predictions and develop a formula that mitigates its effects by using Bayesian inference to obtain an estimate of popularity-free personal taste. Our Bayesian approach is reminiscent of Latent Dirichlet Allocation, albeit with a simpler distribution (binomial instead of multinomial) but a more complex inference that involves an operation between the random variables expressing item popularity and user personal taste.
- In Sect. 4.1, we replace the naive formulation of the observed conditional probability of a new interaction given a past one with a Bayesian inference using Beta priors.
- In the same section, we introduce Bayesian credible intervals to handle the uncertainty that arises when too few interactions are available in the data.
- In Sect. 4.3, we also involve negative events (items not consumed by the user) in our formulation.
- The decoupled components of the interaction are handled separately in the next two subsections. In Sect. 4.5, we introduce a variant of the expected or maximum a posteriori estimates to handle the taste component and mathematically prove its properties.
- In Sect. 4.6, we complete our formula by adding the popularity component as a Bernoulli variable.
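The central step of Sect. 4.1, replacing the observed conditional probability with the lower bound of a Beta-posterior credible interval, can be sketched as follows. This is only an illustration: the function name is ours, and `b1`, `b2` stand in for the Beta prior hyperparameters that the experiments tune, while `q` is the confidence value of Eq. (8).

```python
from scipy.stats import beta

def similarity_lower_bound(co_count, j_count, b1=1.0, b2=1.0, q=0.05):
    """Lower bound of the Beta-posterior credible interval for
    P(interaction with i | interaction with j).

    co_count: number of users who consumed both i and j
    j_count:  number of users who consumed j
    b1, b2:   Beta prior hyperparameters (placeholders for the tuned values)
    q:        confidence value of the credible interval (Eq. (8))
    """
    # Posterior is Beta(b1 + successes, b2 + failures); take its q-quantile
    return beta.ppf(q, b1 + co_count, b2 + j_count - co_count)
```

With the same observed ratio, more evidence tightens the interval and raises the lower bound, so item pairs with few co-occurrences are automatically penalized.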
| Symbol | Description |
|---|---|
| \(u\) | User |
| \(i, j\) | Items |
| \(t_i\) | Positive feedback count on item \(i\) |
| \(z_{u,i}\) | Observed interactions between users and items |
| \(Z_{u,i}\) | The Bernoulli variables that we assume the \(z_{u,i}\) are sampled from |
| \(X_{u,i}\), \(Y_{u,i}\) | The user-specific and global Bernoulli variable components of \(Z_{u,i}\), respectively |
| \(x_{u,i}\), \(y_{u,i}\) | Parameters of \(X_{u,i}\) and \(Y_{u,i}\) |
| \(n(i)\) | Set of \(k\) selected neighbors of item \(i\) |
| \(n^+_u(i)\) | Subset of \(n(i)\) for which \(z_{u,j}=1\) |
| \(Z_{u,j} \mathrel{\twoheadrightarrow} Z_{u,i}\) | Assumed causal event in which consuming \(j\) signals an independent reason for \(u\) to consume \(i\) |
| \(\lvert U\rvert, \lvert I\rvert\) | Number of users and items |
| \(q\) | Confidence value for the credible interval in Eq. (8) |
| \(c\) | Popularity scaling coefficient in Eq. (22) |
| \(\alpha\) | Weight of popularity after recombining with the popularity-free decoupled estimate in Eq. (41) |
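The table above decomposes each interaction variable \(Z_{u,i}\) into a user-specific taste component \(X_{u,i}\) and a global popularity component \(Y_{u,i}\). One natural composition, assumed here purely for illustration (the paper defines the exact operation in Sect. 4), is a noisy-OR: the interaction fires if either component fires.

```python
import random

def sample_z(x_ui, y_i, rng):
    """Noisy-OR composition (our assumption for illustration): the
    interaction Z fires if either the taste component X or the popularity
    component Y fires, so P(Z = 1) = 1 - (1 - x)(1 - y)."""
    x_fires = rng.random() < x_ui   # user-specific taste X_{u,i}
    y_fires = rng.random() < y_i    # global popularity Y_{u,i}
    return x_fires or y_fires

# Empirical check of the closed form
rng = random.Random(42)
x, y = 0.2, 0.3
draws = [sample_z(x, y, rng) for _ in range(100_000)]
empirical = sum(draws) / len(draws)   # ~ 1 - (1 - 0.2) * (1 - 0.3) = 0.44
```

Under this reading, a popular item can be consumed even by users with no personal taste for it, which is exactly the bias the decoupling aims to remove.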
4.1 Conditional probability as similarity
4.2 Ensemble prediction
4.3 Predicting from negative events
4.4 Inverse frequency scaling
4.5 Estimating \(P(X_{u,i} \mid Z_{u,j}=1)\)
4.6 Reintroducing popularity \(Y_i\)
4.7 Summary
5 Experiments
5.1 Datasets and evaluation
| Dataset | Users | Items | Interact. | Density | Valid./test size |
|---|---|---|---|---|---|
| A. Books | 56257 | 50154 | 1418076 | 0.00050 | 0.05%/0.05% |
| A. Video | 3113 | 5860 | 22184 | 0.00122 | 0.17%/0.19% |
| Hetrec | 2044 | 5351 | 71680 | 0.00336 | 0.16%/0.20% |
| ML 1M | 6013 | 3231 | 225473 | 0.00962 | 0.08%/0.20% |
| Epinions | 40163 | 139738 | 664823 | 0.00012 | 0.05%/0.06% |
- Amazon Instant Video [62]: ratings from Amazon.com. Each rating was converted into an interaction regardless of its value. Users with fewer than 5 ratings were removed. The training-test and then the training-validation sets are split randomly on a per-user basis with the given percentages (20% and 20%).
- MovieLens 1M [63]: a widely used dataset of movie ratings. The implicit transformation was done the same way as above. The training-test and then the training-validation sets are split randomly on a per-user basis with the given percentages (20% and 10%).
- HetRec [64]: a dataset released by the Second International Workshop on Information Heterogeneity and Fusion in Recommender Systems. The implicit transformation was done the same way as above. The training-test and then the training-validation sets are split randomly on a per-user basis with the given percentages (20% and 20%).
- Epinions [65]: collected from the Epinions.com website, an online service for users to share product feedback. The implicit transformation was done the same way as above. The validation and test sets were each created by randomly leaving one interaction out of the training set for each user.
- Amazon Books [66]: book ratings from Amazon.com spanning May 1996–July 2014. To create an implicit dataset, we filtered for ratings of 5 and extracted the 10-core [67] of the remaining graph, which still leaves one of the sparsest datasets among those used. The training, test, and validation sets were created by randomly assigning every record to a set with the given probabilities (0.9, 0.05, and 0.05).
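The per-user random splitting used for several of the datasets above can be sketched as follows. This is an illustrative reconstruction (function and argument names are ours): first a fraction of each user's interactions is held out for testing, then a fraction of the remainder for validation.

```python
import random

def per_user_split(interactions, test_frac=0.2, val_frac=0.2, seed=0):
    """Per-user random split in the style described for Amazon Instant
    Video and HetRec. `interactions` maps each user to a list of item ids;
    returns (train, val, test) dicts with the same keys."""
    rng = random.Random(seed)
    train, val, test = {}, {}, {}
    for user, items in interactions.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_frac)
        n_val = int((len(shuffled) - n_test) * val_frac)
        test[user] = shuffled[:n_test]
        val[user] = shuffled[n_test:n_test + n_val]
        train[user] = shuffled[n_test + n_val:]
    return train, val, test
```

Splitting per user (rather than globally) guarantees that every user keeps some training history, which matters for neighborhood methods that predict from that history.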
5.2 Baselines
- Item KNN: the method described in Sect. 3; it uses the presence of other items in the user's interaction history to predict the likelihood of interacting with an item. Several similarity measures can be used for selecting similar items [15], as well as the shrink and the TF-IDF or BM25 weighting heuristics [23].
- iALS: matrix factorization for implicit feedback datasets [72].
- \(\text{EASE}^R\): a shallow autoencoder-like model with a closed-form solution for the training objective [75].
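The Item KNN baseline's cosine similarity with the shrink heuristic mentioned above can be sketched as follows. This is an assumption-laden sketch on a binary interaction matrix, not the exact baseline implementation of [15, 23].

```python
import numpy as np

def cosine_shrink_similarity(R, shrink=10.0):
    """Item-item cosine similarity over a binary |U| x |I| interaction
    matrix R, with the shrink heuristic: a constant added to the
    denominator pulls the similarity of rarely co-consumed item pairs
    toward zero."""
    co = R.T @ R                        # co-occurrence counts per item pair
    norms = np.sqrt(np.diag(co))        # = sqrt(t_i) for binary feedback
    sim = co / (np.outer(norms, norms) + shrink)
    np.fill_diagonal(sim, 0.0)          # an item is not its own neighbor
    return sim
```

With `shrink=0` this reduces to plain cosine similarity; larger values damp similarities backed by little evidence, a heuristic counterpart of the credible-interval lower bound proposed in Sect. 4.1.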
5.3 Inverse frequency scaling
5.4 The ensemble formula
| Dataset | NDCG@50 "or" | NDCG@50 "sum" | NDCG@50 ratio | Recall@50 "or" | Recall@50 "sum" | Recall@50 ratio |
|---|---|---|---|---|---|---|
| Epinions | 0.040 | 0.040 | 1.004 | 0.109 | 0.109 | 1.002 |
| Amazon Books | 0.159 | 0.156 | 1.019 | 0.333 | 0.329 | 1.010 |
| Amazon Video | 0.191 | 0.190 | 1.003 | 0.391 | 0.391 | 1.000 |
| MovieLens 1M | 0.270 | 0.271 | 0.997 | 0.439 | 0.440 | 0.997 |
| Hetrec | 0.220 | 0.219 | 1.005 | 0.354 | 0.356 | 0.996 |
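One reading of the two ensemble variants compared above (our interpretation of the column names; the paper derives the exact formula in Sect. 4.2) is that "or" combines per-neighbor probabilities as a noisy-OR, while "sum" is the classic additive KNN score:

```python
def ensemble_or(probs):
    """Noisy-OR combination: probability that at least one neighbor
    independently 'causes' the interaction."""
    prod = 1.0
    for p in probs:
        prod *= 1.0 - p
    return 1.0 - prod

def ensemble_sum(probs):
    """Classic additive KNN score (a sum, not a probability)."""
    return sum(probs)
```

For the small per-neighbor probabilities typical of sparse implicit data, \(1-\prod_j(1-p_j)\approx\sum_j p_j\), which is consistent with the ratios close to 1.0 in the table.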
5.5 Negative predictors
5.6 Neighborhood size and popularity bias
| Method | Amazon Books N@50 | Amazon Books R@50 | Amazon Video N@50 | Amazon Video R@50 | Epinions N@50 | Epinions R@50 | Hetrec N@50 | Hetrec R@50 | MovieLens 1M N@50 | MovieLens 1M R@50 |
|---|---|---|---|---|---|---|---|---|---|---|
| Item KNN cosine | 0.1785 | 0.3565 | 0.1822 | 0.3845 | 0.0394 | 0.1044 | 0.2255 | 0.3701 | 0.2724 | 0.4464 |
| Item KNN dice | 0.1785 | 0.3528 | 0.1912 | 0.3818 | 0.0389 | 0.1046 | 0.2128 | 0.3600 | 0.2569 | 0.4209 |
| Item KNN jaccard | 0.1791 | 0.3539 | 0.1906 | 0.3819 | 0.0388 | 0.1045 | 0.2133 | 0.3499 | 0.2567 | 0.4189 |
| Item KNN asymmetric | 0.1809 | 0.3476 | 0.1966 | 0.3987 | 0.0427 | 0.1136 | 0.2245 | 0.3749 | 0.2705 | 0.4436 |
| Item KNN tversky | 0.1820 | 0.3526 | 0.1915 | 0.3808 | 0.0397 | 0.1068 | 0.2156 | 0.3495 | 0.2611 | 0.4253 |
| Our variant | 0.1842 | 0.3620 | 0.2022 | 0.3813 | 0.0451 | 0.1231 | 0.2368 | 0.3886 | 0.2812 | 0.4564 |
| iALS | 0.1287 | 0.2944 | 0.1857 | 0.3766 | 0.0422 | 0.1145 | 0.2360 | 0.3990 | 0.2672 | 0.4444 |
| P3\(\alpha\) | 0.1812 | 0.3670 | 0.1929 | 0.3979 | 0.0410 | 0.1105 | 0.2235 | 0.3863 | 0.2747 | 0.4564 |
| RP3\(\beta\) | 0.1864 | 0.3605 | 0.1835 | 0.3887 | 0.0398 | 0.1051 | 0.2255 | 0.3878 | 0.2828 | 0.4658 |
| \(\text{EASE}^R\) | 0.1515 | 0.3192 | 0.1815 | 0.3785 | NaN | NaN | 0.2115 | 0.3511 | 0.2787 | 0.4517 |
| SLIMElasticNet | 0.1935 | 0.3812 | 0.2008 | 0.3812 | 0.0444 | 0.1160 | 0.2399 | 0.3886 | 0.2927 | 0.4654 |
| MultVAE | 0.1097 | 0.2474 | 0.1512 | 0.3468 | 0.0403 | 0.1159 | 0.2110 | 0.3773 | 0.2806 | 0.4724 |
| Method | Amazon Books N@20 | Amazon Books R@20 | Amazon Video N@20 | Amazon Video R@20 | Epinions N@20 | Epinions R@20 | Hetrec N@20 | Hetrec R@20 | MovieLens 1M N@20 | MovieLens 1M R@20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Item KNN cosine | 0.1571 | 0.2680 | 0.1617 | 0.2879 | 0.0323 | 0.0688 | 0.1858 | 0.2481 | 0.2187 | 0.2929 |
| Item KNN dice | 0.1586 | 0.2714 | 0.1738 | 0.3011 | 0.0318 | 0.0690 | 0.1725 | 0.2365 | 0.2051 | 0.2724 |
| Item KNN jaccard | 0.1591 | 0.2721 | 0.1732 | 0.3006 | 0.0318 | 0.0690 | 0.1763 | 0.2379 | 0.2056 | 0.2733 |
| Item KNN asymmetric | 0.1632 | 0.2765 | 0.1782 | 0.3130 | 0.0351 | 0.0753 | 0.1827 | 0.2478 | 0.2168 | 0.2888 |
| Item KNN tversky | 0.1635 | 0.2780 | 0.1741 | 0.3001 | 0.0325 | 0.0702 | 0.1777 | 0.2373 | 0.2082 | 0.2759 |
| Our variant | 0.1635 | 0.2772 | 0.1843 | 0.2972 | 0.0368 | 0.0812 | 0.1951 | 0.2610 | 0.2260 | 0.3012 |
| iALS | 0.1054 | 0.1954 | 0.1680 | 0.2932 | 0.0342 | 0.0743 | 0.1905 | 0.2620 | 0.2112 | 0.2847 |
| P3\(\alpha\) | 0.1586 | 0.2731 | 0.1744 | 0.3112 | 0.0336 | 0.0731 | 0.1792 | 0.2511 | 0.2177 | 0.2940 |
| RP3\(\beta\) | 0.1672 | 0.2823 | 0.1637 | 0.2956 | 0.0327 | 0.0689 | 0.1803 | 0.2495 | 0.2247 | 0.3006 |
| \(\text{EASE}^R\) | 0.1300 | 0.2289 | 0.1625 | 0.2896 | NaN | NaN | 0.1719 | 0.2291 | 0.2228 | 0.2918 |
| SLIMElasticNet | 0.1713 | 0.2899 | 0.1845 | 0.3051 | 0.0369 | 0.0782 | 0.1983 | 0.2639 | 0.2374 | 0.3091 |
| MultVAE | 0.0909 | 0.1672 | 0.1306 | 0.2498 | 0.0318 | 0.0729 | 0.1685 | 0.2449 | 0.2230 | 0.3087 |
5.7 Predictive performance
| Method | Amazon Books Gini | Amazon Books Entropy | Amazon Video Gini | Amazon Video Entropy | Epinions Gini | Epinions Entropy | Hetrec Gini | Hetrec Entropy | MovieLens 1M Gini | MovieLens 1M Entropy |
|---|---|---|---|---|---|---|---|---|---|---|
| Item KNN cosine | 0.4346 | 14.6999 | 0.1233 | 9.3946 | 0.1209 | 12.4611 | 0.0139 | 7.3247 | 0.0538 | 8.0575 |
| Item KNN dice | 0.5078 | 14.7845 | 0.1785 | 9.9468 | 0.0479 | 10.9559 | 0.0165 | 7.4539 | 0.0498 | 7.8765 |
| Item KNN jaccard | 0.5000 | 14.7828 | 0.1776 | 9.9330 | 0.0478 | 10.9496 | 0.0142 | 7.2364 | 0.0498 | 7.8494 |
| Item KNN asymmetric | 0.4686 | 14.5611 | 0.1524 | 9.7078 | 0.0566 | 11.5312 | 0.0158 | 7.4184 | 0.0614 | 8.0926 |
| Item KNN tversky | 0.4761 | 14.6544 | 0.1789 | 9.9533 | 0.0530 | 11.1388 | 0.0160 | 7.3258 | 0.0530 | 8.1142 |
| Our variant | 0.4431 | 14.6924 | 0.1451 | 9.3348 | 0.0382 | 10.3138 | 0.0139 | 7.5387 | 0.0616 | 8.3081 |
| iALS | 0.1472 | 13.1727 | 0.1299 | 9.9664 | 0.0118 | 10.8564 | 0.0208 | 8.1445 | 0.0538 | 8.1307 |
| P3\(\alpha\) | 0.3073 | 14.1947 | 0.1491 | 9.6472 | 0.0585 | 11.6387 | 0.0147 | 7.5004 | 0.0507 | 8.0022 |
| RP3\(\beta\) | 0.4658 | 14.8078 | 0.1345 | 9.4838 | 0.1446 | 12.0390 | 0.0235 | 7.8296 | 0.0668 | 8.2576 |
| \(\text{EASE}^R\) | 0.1577 | 13.0396 | 0.1686 | 9.7731 | NaN | NaN | 0.0086 | 6.7291 | 0.0529 | 8.1099 |
| SLIMElasticNet | 0.2938 | 14.0509 | 0.1893 | 10.2678 | 0.0344 | 11.3497 | 0.0178 | 7.8192 | 0.0638 | 8.3519 |
| MultVAE | 0.3637 | 14.3242 | 0.2189 | 10.3845 | 0.0222 | 10.7044 | 0.0250 | 8.2452 | 0.0941 | 8.8399 |
5.8 Recommendation diversity
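The Gini and entropy figures reported in the table above measure how evenly recommendation exposure is spread over the item catalog. The sketch below uses the textbook definitions; the paper's exact normalization and orientation of the Gini measure are not reproduced here, so only the entropy should be read as directly comparable.

```python
import math

def gini_coefficient(counts):
    """Standard Gini coefficient of item exposure counts:
    0 = perfectly even exposure, values near 1 = exposure
    concentrated on a few items."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    weighted = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs))
    return weighted / (n * total)

def shannon_entropy(counts):
    """Shannon entropy (bits) of the exposure distribution; higher
    means recommendations are spread over more items."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```

Here `counts[i]` would be the number of times item `i` appears across all users' top-N recommendation lists.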
5.9 Hyperparameters
| Dataset | \(k\) | \(b_1\) | \(b_2\) | \(c\) | \(\alpha\) | \(q\) |
|---|---|---|---|---|---|---|
| Amazon Books | 10 | 11 | 46 | 2.63 | 0.00 | 0.00100 |
| Amazon Video | 100 | 11 | 21 | 2.68 | 0.00 | 0.00886 |
| Epinions | 757 | 14 | 242 | 4.81 | 0.17 | 0.09501 |
| Hetrec | 85 | 45 | 4 | 1.20 | 0.21 | 0.00016 |
| MovieLens 1M | 100 | 7 | 11 | 0.95 | 0.11 | 0.00070 |