Published in: Discover Computing 3/2011

01-06-2011 | Web Mining for Search

The sum of its parts: reducing sparsity in click estimation with query segments

Authors: Dustin Hillard, Eren Manavoglu, Hema Raghavan, Chris Leggetter, Erick Cantú-Paz, Rukmini Iyer


Abstract

The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.
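To make the idea of aggregating click history over partial queries concrete, the following is a minimal sketch and not the authors' model. It assumes a hypothetical log of (query, ad, impressions, clicks) tuples, uses whitespace tokenization as a stand-in for the CRF phrase segmenter mentioned above, and averages additively smoothed per-segment click-through rates. The names `aggregate_segment_stats`, `segment_ctr`, `prior_ctr`, and `prior_weight` are illustrative assumptions, as are the toy numbers.

```python
from collections import defaultdict


def unigrams(query):
    """Stand-in segmenter: lowercase whitespace tokens (the paper uses a CRF for phrases)."""
    return query.lower().split()


def aggregate_segment_stats(log, segment):
    """Sum impressions and clicks for every (segment, ad) pair seen in the log."""
    stats = defaultdict(lambda: [0, 0])  # (segment, ad) -> [impressions, clicks]
    for query, ad, impressions, clicks in log:
        for seg in set(segment(query)):  # count each segment once per query
            entry = stats[(seg, ad)]
            entry[0] += impressions
            entry[1] += clicks
    return stats


def segment_ctr(stats, query, ad, segment, prior_ctr=0.02, prior_weight=10.0):
    """Average the smoothed CTR of each query segment for the given ad.

    prior_ctr and prior_weight are assumed smoothing parameters; unseen
    segments fall back to prior_ctr.
    """
    segs = set(segment(query))
    if not segs:
        return prior_ctr
    total = 0.0
    for seg in segs:
        impressions, clicks = stats.get((seg, ad), (0, 0))
        total += (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)
    return total / len(segs)


# Hypothetical toy log: (query, ad, impressions, clicks)
log = [
    ("cheap flights paris", "ad_travel", 100, 10),
    ("paris hotels", "ad_travel", 50, 2),
    ("cheap laptops", "ad_travel", 40, 0),
]
stats = aggregate_segment_stats(log, unigrams)

# The full query below never appears in the log, but two of its words do.
estimate = segment_ctr(stats, "cheap paris tours", "ad_travel", unigrams)
```

The point of the sketch is only that a query with no history of its own can still borrow evidence from its parts; how such segment-level statistics are combined with other features in a click or relevance model is not specified in the abstract.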


Metadata
Title
The sum of its parts: reducing sparsity in click estimation with query segments
Authors
Dustin Hillard
Eren Manavoglu
Hema Raghavan
Chris Leggetter
Erick Cantú-Paz
Rukmini Iyer
Publication date
01-06-2011
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 3/2011
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-010-9152-6
