Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search

Authors:
Mihajlo Grbovic, Yahoo Labs, Sunnyvale, CA, USA
Nemanja Djuric, Yahoo Labs, Sunnyvale, CA, USA
Vladan Radosavljevic, Yahoo Labs, Sunnyvale, CA, USA
Fabrizio Silvestri, Yahoo Labs, London, England, UK
Narayan Bhamidipati, Yahoo Labs, Sunnyvale, CA, USA

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, August 2015, Pages 383–392. https://doi.org/10.1145/2766462.2767709

Published: 09 August 2015

ABSTRACT

Search engines are among the most popular web services, visited by more than 85% of internet users daily. Advertisers are interested in making use of this vast business potential, as the very clear intent signal communicated through the issued query allows effective targeting of users. This idea is embodied in the sponsored search model, where each advertiser maintains a list of keywords they deem indicative of an increased user response rate with regard to their business. According to this targeting model, when a query is issued, all advertisers with a matching keyword are entered into an auction according to the amount they bid for the query, and the winner gets to show their ad. One of the main challenges is that a query may not match many keywords, resulting in lower auction value, lower ad quality, and lost revenue for advertisers and publishers. A possible solution, called query rewriting, is to expand a query into a set of related queries and use them to increase the number of matched ads. To this end, we propose a rewriting method based on a novel query embedding algorithm, which jointly models query content as well as its context within a search session. As a result, queries with similar content and context are mapped into vectors close in the embedding space, which allows expansion of a query via simple K-nearest neighbor search in the projected space. The method was trained on more than 12 billion sessions, one of the largest corpora reported thus far, and evaluated on both a public TREC data set and an in-house sponsored search data set. The results show that the proposed approach significantly outperformed the existing state of the art, strongly indicating its benefits and monetization potential.
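
The expansion step described in the abstract, a K-nearest neighbor search over query vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors, the example queries, the function names `cosine` and `rewrite_query`, and the choice of cosine similarity as the distance measure are all assumptions made for illustration. The abstract only states that nearby vectors are retrieved.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rewrite_query(query, embeddings, k=2):
    """Return the k queries whose vectors are most similar to `query`."""
    q_vec = embeddings[query]
    scored = [
        (cosine(q_vec, vec), other)
        for other, vec in embeddings.items()
        if other != query  # a query is not its own rewrite
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]

# Hypothetical embeddings; real ones would be learned from search sessions.
toy_embeddings = {
    "cheap flights": [0.9, 0.1, 0.0],
    "low cost airfare": [0.8, 0.2, 0.1],
    "airline tickets": [0.7, 0.3, 0.0],
    "pizza delivery": [0.0, 0.1, 0.9],
}

rewrites = rewrite_query("cheap flights", toy_embeddings, k=2)
print(rewrites)  # ['low cost airfare', 'airline tickets']
```

In this toy setting the rewrites of "cheap flights" would then be matched against advertiser keywords in addition to the original query. A brute-force scan like this one is only practical for small vocabularies; at the scale mentioned in the abstract an approximate nearest-neighbor index would likely be needed.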

Index Terms

Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search
1. Information systems
  1. Information retrieval

Published in
SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2015
1198 pages
ISBN:9781450336215
DOI:10.1145/2766462
General Chair:
Ricardo Baeza-Yates, Yahoo Labs, USA
Program Chairs:
Mounia Lalmas, Yahoo Labs, UK
Alistair Moffat, University of Melbourne, Australia
Berthier Ribeiro-Neto, Google, Brazil, and UFMG, Brazil
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
algorithms
information retrieval
query rewriting
Qualifiers
- research-article
Acceptance Rates
SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%