2019 | OriginalPaper | Chapter

Collaborative Thompson Sampling

Authors : Zhenyu Zhu, Liusheng Huang, Hongli Xu

Published in: Collaborative Computing: Networking, Applications and Worksharing

Publisher: Springer International Publishing

Abstract

Thompson sampling is one of the most effective strategies for balancing the exploration-exploitation trade-off. It has been applied in a variety of domains with remarkable success. Thompson sampling makes decisions in a noisy but stationary environment by accumulating uncertain information over time to improve prediction accuracy. In highly dynamic domains, however, the environment undergoes frequent and unpredictable changes, and decisions in such an environment should rely on current information. Standard Thompson sampling may therefore perform poorly in these domains. Here we present a collaborative Thompson sampling algorithm that applies the exploration-exploitation strategy to highly dynamic settings. The algorithm takes collaborative effects into account by dynamically clustering users into groups; the feedback of all users in the same group helps to estimate the expected reward in the current context and so to find the optimal choice. Incorporating collaborative effects into Thompson sampling allows the algorithm to capture real-time changes in the environment and to adjust its decision-making strategy accordingly. We compare our algorithm with standard Thompson sampling algorithms on two real-world datasets. Our algorithm shows accelerated convergence and improved prediction performance in collaborative environments. We also provide a regret analysis of our algorithm on a non-contextual model.
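
The idea of pooling feedback within a user group can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes Bernoulli rewards with Beta(1, 1) priors, a non-contextual setting, and a fixed user-to-group mapping, whereas the chapter clusters users dynamically. The names `BetaBernoulliTS` and `run_pooled` are hypothetical and chosen only for this illustration.

```python
import random


class BetaBernoulliTS:
    """Standard Thompson sampling for Bernoulli rewards with Beta(1, 1) priors."""

    def __init__(self, n_arms, seed=None):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample per arm from its Beta posterior and play the largest.
        samples = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Binary feedback updates the posterior counts of the played arm.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def run_pooled(true_means, groups, n_rounds, seed=0):
    """Simulate users who share one posterior per group.

    `groups` maps user id -> group id. The mapping is fixed here
    (an assumption of this sketch; the chapter clusters dynamically).
    """
    env_rng = random.Random(seed)
    group_ids = sorted(set(groups.values()))
    samplers = {
        g: BetaBernoulliTS(len(true_means), seed=seed + i + 1)
        for i, g in enumerate(group_ids)
    }
    users = sorted(groups)
    total_reward = 0
    for t in range(n_rounds):
        user = users[t % len(users)]        # users arrive round-robin
        sampler = samplers[groups[user]]    # feedback is pooled per group
        arm = sampler.select_arm()
        reward = 1 if env_rng.random() < true_means[arm] else 0
        sampler.update(arm, reward)
        total_reward += reward
    return total_reward, samplers
```

Because every user in a group reads and updates the same posterior, each user benefits from the group's combined observations, which is one plausible reason pooling could speed up convergence when group members behave similarly.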

Title: Collaborative Thompson Sampling
Authors: Zhenyu Zhu, Liusheng Huang, Hongli Xu
Publisher: Springer International Publishing
Book: Collaborative Computing: Networking, Applications and Worksharing
Print ISBN: 978-3-030-12980-4

Electronic ISBN: 978-3-030-12981-1

Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-12981-1_2
