ABSTRACT
Bidding optimization is one of the most critical problems in online advertising. Sponsored search (SS) auctions, because of the randomness of user query behavior and the nature of the platform, usually adopt keyword-level bidding strategies. In contrast, display advertising (DA), a relatively simpler auction scenario, has taken advantage of real-time bidding (RTB) to boost performance for advertisers. In this paper, we consider the RTB problem in sponsored search auctions, which we name SS-RTB. SS-RTB has a much more complex dynamic environment, owing to stochastic user query behavior and to more complex bidding policies based on the multiple keywords of an ad. Most previous methods for DA cannot be applied. We propose a reinforcement learning (RL) solution for handling this complex dynamic environment. Although some RL methods have been proposed for online advertising, they all fail to address the "environment changing" problem: the state transition probabilities vary from one day to the next. Motivated by the observation that the auction sequences of two days share similar transition patterns at a proper aggregation level, we formulate a robust MDP model at the hour-aggregation level of the auction data and propose a control-by-model framework for SS-RTB. Rather than generating bid prices directly, we decide on a bidding model for the impressions of each hour and perform real-time bidding accordingly. We also extend the method to handle the multi-agent problem. We deployed the SS-RTB system in the e-commerce search auction platform of Alibaba. Offline evaluation and an online A/B test demonstrate the effectiveness of our method.
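The control-by-model idea can be illustrated with a minimal Python sketch. The abstract does not specify the form of the bidding model, the hour-level state, or the auction rules, so everything below is an assumption made for illustration: the linear model `bid = alpha * predicted_value`, the names `LinearBidModel` and `run_day`, the use of (hour, remaining budget) as the state, and the toy auction in which a bid wins when it meets a given competing price and pays that price. The hour-level `policy` callable stands in for a learned RL policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Toy impression: (predicted value of the impression, competing price).
# Both fields are assumptions for this sketch, not taken from the paper.
Impression = Tuple[float, float]


@dataclass(frozen=True)
class LinearBidModel:
    """Hypothetical per-hour bidding model: bid = alpha * predicted_value."""
    alpha: float

    def bid(self, predicted_value: float) -> float:
        return self.alpha * predicted_value


def run_day(
    policy: Callable[[int, float], float],
    hours: Sequence[Sequence[Impression]],
    budget: float,
) -> Tuple[List[List[float]], float]:
    """Choose one bidding model per hour, then bid per impression with it.

    policy(hour, remaining_budget) returns alpha for that hour; it is a
    stand-in for the hour-level decision described in the abstract.
    """
    remaining = budget
    bids_per_hour: List[List[float]] = []
    for hour, impressions in enumerate(hours):
        # Hour-level decision: pick the model, not individual bid prices.
        model = LinearBidModel(alpha=policy(hour, remaining))
        bids: List[float] = []
        for value, price in impressions:
            # Impression-level step: the chosen model produces the bid,
            # capped by the budget that is still available.
            b = min(model.bid(value), remaining)
            bids.append(b)
            if b >= price:  # toy rule: win and pay the competing price
                remaining -= price
        bids_per_hour.append(bids)
    return bids_per_hour, remaining


if __name__ == "__main__":
    # A hand-written policy: bid more aggressively while budget is plentiful.
    toy_policy = lambda hour, remaining: 2.0 if remaining > 5 else 1.0
    day = [[(1.0, 1.5), (2.0, 5.0)], [(3.0, 2.0)]]
    print(run_day(toy_policy, day, budget=10.0))
```

The point of the sketch is the separation of time scales: the policy acts once per hour on aggregated state, while per-impression bids are produced by the model that the policy selected.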
Index Terms
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding