ABSTRACT
Search engines represent one of the most popular web services, visited by more than 85% of internet users on a daily basis. Advertisers are interested in making use of this vast business potential, as the very clear intent signal communicated through the issued query allows effective targeting of users. This idea is embodied in the sponsored search model, where each advertiser maintains a list of keywords they deem indicative of an increased user response rate with regard to their business. According to this targeting model, when a query is issued all advertisers with a matching keyword are entered into an auction according to the amount they bid for the query, and the winner gets to show their ad. One of the main challenges is that a query may not match many keywords, resulting in lower auction value, lower ad quality, and lost revenue for advertisers and publishers. A possible solution, called query rewriting, is to expand a query into a set of related queries and use them to increase the number of matched ads. To this end, we propose a rewriting method based on a novel query embedding algorithm, which jointly models query content as well as its context within a search session. As a result, queries with similar content and context are mapped into vectors that lie close together in the embedding space, which allows expansion of a query via a simple K-nearest neighbor search in the projected space. The method was trained on more than 12 billion sessions, one of the largest corpora reported thus far, and evaluated on both a public TREC data set and an in-house sponsored search data set. The results show that the proposed approach significantly outperformed the existing state of the art, strongly indicating its benefits and monetization potential.
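The expansion step described in the abstract, retrieving the K nearest neighbors of a query in the embedding space, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the names `TOY_EMBEDDINGS` and `expand_query`, the use of cosine similarity as the closeness measure, and the brute-force scan are all assumptions made for illustration. The abstract specifies neither the similarity measure nor the search procedure, and real embeddings would be learned from search sessions rather than written by hand.

```python
import math

# Hypothetical, hand-made query vectors used only for illustration.
# In the paper's setting these would be learned from search sessions.
TOY_EMBEDDINGS = {
    "cheap flights": [0.9, 0.1, 0.0],
    "airline tickets": [0.8, 0.2, 0.1],
    "flight deals": [0.85, 0.05, 0.1],
    "running shoes": [0.0, 0.9, 0.3],
    "marathon training": [0.1, 0.7, 0.6],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query, embeddings, k=2):
    """Return the k queries most similar to `query`, most similar first.

    Assumption: closeness is measured by cosine similarity and found
    by a brute-force scan over every other query in `embeddings`.
    """
    if query not in embeddings:
        return []  # no vector for this query, so nothing to expand with
    target = embeddings[query]
    scored = [
        (cosine_similarity(target, vector), other)
        for other, vector in embeddings.items()
        if other != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


rewrites = expand_query("cheap flights", TOY_EMBEDDINGS, k=2)
print(rewrites)
```

In this toy setup the two travel-related queries are returned as rewrites of "cheap flights", so ads bidding on either of those keywords could also enter the auction. A brute-force scan is only workable for small vocabularies; at the scale the abstract mentions, an approximate nearest-neighbor index would presumably be needed.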
Index Terms
- Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search