DOI: 10.1145/3219819.3219918
research-article

Deep Reinforcement Learning for Sponsored Search Real-time Bidding

Published: 19 July 2018

ABSTRACT

Bidding optimization is one of the most critical problems in online advertising. Sponsored search (SS) auctions, due to the randomness of user query behavior and the nature of the platform, usually adopt keyword-level bidding strategies. In contrast, display advertising (DA), a relatively simpler auction scenario, has taken advantage of real-time bidding (RTB) to boost performance for advertisers. In this paper, we consider the RTB problem in sponsored search auctions, named SS-RTB. SS-RTB has a much more complex dynamic environment, due to stochastic user query behavior and more complex bidding policies based on the multiple keywords of an ad. Most previous methods for DA cannot be applied. We propose a reinforcement learning (RL) solution for handling the complex dynamic environment. Although some RL methods have been proposed for online advertising, they all fail to address the "environment changing" problem: the state transition probabilities vary from one day to the next. Motivated by the observation that the auction sequences of two days share similar transition patterns at a proper aggregation level, we formulate a robust MDP model at the hour-aggregation level of the auction data and propose a control-by-model framework for SS-RTB. Rather than generating bid prices directly, we decide on a bidding model for the impressions of each hour and perform real-time bidding accordingly. We also extend the method to handle the multi-agent problem. We deployed the SS-RTB system in the e-commerce search auction platform of Alibaba. Offline evaluation and an online A/B test demonstrate the effectiveness of our method.
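The two ideas named in the abstract, hour-level aggregation and control-by-model, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Impression` fields, the linear form `alpha * pcvr` of the bidding model, the candidate parameter grid, and the `q_value` function standing in for a learned value estimate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Impression:
    hour: int       # hour of day, 0-23
    pcvr: float     # predicted conversion rate (hypothetical feature)
    cost: float     # amount charged for this impression
    clicked: bool


def aggregate_by_hour(impressions):
    """Collapse impression-level logs into one summary record per hour."""
    totals = defaultdict(lambda: {"impressions": 0, "cost": 0.0, "clicks": 0})
    for imp in impressions:
        t = totals[imp.hour]
        t["impressions"] += 1
        t["cost"] += imp.cost
        t["clicks"] += int(imp.clicked)
    return dict(totals)


def bid(alpha, pcvr):
    """Assumed linear bidding model: the hourly action is alpha, not a price."""
    return alpha * pcvr


def choose_alpha(hour_state, candidate_alphas, q_value):
    """Greedy hourly decision: pick the parameter with the highest estimated value."""
    return max(candidate_alphas, key=lambda a: q_value(hour_state, a))
```

In this sketch the agent acts once per hour by choosing `alpha`, and every impression in that hour is then priced by `bid(alpha, pcvr)` in real time, which mirrors the abstract's "decide on a bidding model for each hour" at a toy scale.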


Published in

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '18 paper acceptance rate: 107 of 983 submissions, 11%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
