DOI: 10.1145/2766462.2767709
research-article

Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search

Published: 09 August 2015

ABSTRACT

Search engines are among the most popular web services, visited by more than 85% of internet users on a daily basis. Advertisers are interested in making use of this vast business potential, as the clear intent signal communicated through the issued query allows effective targeting of users. This idea is embodied in the sponsored search model, where each advertiser maintains a list of keywords they deem indicative of an increased user response rate with regard to their business. Under this targeting model, when a query is issued, all advertisers with a matching keyword are entered into an auction according to the amount they bid for the query, and the winner gets to show their ad. One of the main challenges is that a query may not match many keywords, resulting in lower auction value, lower ad quality, and lost revenue for advertisers and publishers. A possible solution, called query rewriting, is to expand a query into a set of related queries and use them to increase the number of matched ads. To this end, we propose a rewriting method based on a novel query embedding algorithm, which jointly models query content as well as its context within a search session. As a result, queries with similar content and context are mapped to vectors that are close in the embedding space, which allows expansion of a query via a simple K-nearest neighbor search in the projected space. The method was trained on more than 12 billion sessions, one of the largest corpora reported thus far, and evaluated on both a public TREC data set and an in-house sponsored search data set. The results show that the proposed approach significantly outperformed the existing state of the art, strongly indicating its benefits and monetization potential.
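The expansion step described in the abstract, a K-nearest neighbor search over query vectors, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy three-dimensional vectors, the query strings, the function names `cosine` and `expand_query`, and the choice of cosine similarity as the closeness measure (the abstract does not name a metric). It is not the authors' implementation, and it does not show how the embeddings are learned.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_query(query, embeddings, k=2):
    """Return the k queries whose vectors are closest to `query`.

    `embeddings` maps query strings to vectors. Brute-force search is
    used for clarity; a large query vocabulary would need an index.
    """
    q_vec = embeddings[query]
    scored = [
        (cosine(q_vec, vec), other)
        for other, vec in embeddings.items()
        if other != query
    ]
    scored.sort(reverse=True)  # most similar first
    return [other for _, other in scored[:k]]

# Hypothetical toy embeddings, hand-picked for illustration only.
TOY_EMBEDDINGS = {
    "cheap flights": [0.9, 0.1, 0.0],
    "low cost airfare": [0.8, 0.2, 0.1],
    "airline tickets": [0.7, 0.3, 0.0],
    "pizza delivery": [0.0, 0.1, 0.9],
}

rewrites = expand_query("cheap flights", TOY_EMBEDDINGS, k=2)
print(rewrites)
```

In a sponsored search setting, the returned rewrites would then be matched against advertiser keywords alongside the original query, which is how expansion can increase the number of ads entering the auction.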

    Published in

      SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
      August 2015
      1198 pages
      ISBN:9781450336215
      DOI:10.1145/2766462

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article

      Acceptance Rates

      SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
