ABSTRACT
Contextual advertising supports much of the Web's ecosystem today. Both user experience and revenue (shared by the site publisher and the ad network) depend on how relevant the displayed ads are to the page content. As in other document retrieval systems, relevance is determined by scoring the match between individual ads (the documents) and the content of the page where the ads are shown (the query). In this paper we show that this match can be improved significantly by augmenting the ad-page scoring function with extra parameters from a logistic regression model on the words in the pages and ads. A key property of the proposed model is that it can be mapped to standard cosine similarity matching, which makes it suitable for efficient and scalable implementation over inverted indexes. The model parameters are learnt from logs of ad impressions and clicks, with shrinkage estimators used to combat sparsity. To train on a very large corpus of several gigabytes of data, we parallelize our fitting algorithm in a Hadoop framework [10]. An experimental evaluation on a holdout set of impression and click events from a large-scale real-world ad placement engine shows improved click prediction. At a recall of 10% of the clicks in our test data, our best model achieves a 25% lift in precision relative to a traditional information retrieval model based on cosine similarity.
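As a rough illustration of how a word-level logistic model might be folded into dot-product matching, the sketch below rescales each tf-idf vector by per-word weights and then takes an ordinary sparse dot product. The factored form of the weights, the function names (`rescale`, `dot`, `click_logit`, `click_probability`), and all numeric values are assumptions made for this example; the abstract does not specify the model's exact form, so this is not the paper's formulation.

```python
import math

def rescale(vec, weights):
    """Multiply each tf-idf entry by a per-word weight (default 1.0).

    The weights stand in for learned parameters; they are hypothetical here.
    """
    return {word: value * weights.get(word, 1.0) for word, value in vec.items()}

def dot(a, b):
    """Sparse dot product over the words the two vectors share."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[word] for word, value in a.items() if word in b)

def click_logit(page, ad, page_weights, ad_weights, bias):
    """Logit of a click: bias plus the dot product of the rescaled vectors.

    Because the weights are folded into the vectors, the score is still a
    plain dot product, which is the operation an inverted index evaluates.
    """
    return bias + dot(rescale(page, page_weights), rescale(ad, ad_weights))

def click_probability(logit):
    """Logistic link: map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# Toy, made-up tf-idf vectors and weights (assumptions for illustration only).
page = {"camera": 0.8, "lens": 0.6}
ad = {"camera": 0.6, "tripod": 0.8}
page_weights = {"camera": 2.0}
ad_weights = {"camera": 1.5}

baseline = click_logit(page, ad, {}, {}, bias=-3.0)
boosted = click_logit(page, ad, page_weights, ad_weights, bias=-3.0)
print(click_probability(baseline), click_probability(boosted))
```

With all weights left at 1.0 the score reduces to the unweighted dot product of the two tf-idf vectors, which matches cosine similarity when the vectors are unit-normalized. In this sketch, a weight above 1.0 on a shared word raises the predicted click probability for that page-ad pair.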
References
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM, 1999.
- A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In CIKM '03: Proc. of the twelfth intl. conf. on Information and knowledge management, pages 426--434, New York, NY, 2003. ACM.
- A. Z. Broder, M. Fontoura, V. Josifovski, and L. Riedel. A semantic approach to contextual advertising. In SIGIR, pages 559--566, 2007.
- P. Chatterjee, D. L. Hoffman, and T. P. Novak. Modeling the clickstream: Implications for web-based advertising efforts. Marketing Science, 22(4):520--541, 2003.
- C. Lin, R. C. Weng, and S. S. Keerthi. Trust region newton methods for large-scale logistic regression. In International Conference on machine learning, 2007.
- C. R. Rao. Linear Statistical Inference and its Applications. Wiley-Interscience, 2002.
- D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503--528, 1989.
- S. Derksen and H. J. Keselman. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 45:265--282, 1992.
- Online ad spending to total $19.5 billion in 2007. eMarketer, February 2007. Available from http://www.emarketer.com/Article.aspx?id=1004635.
- A. Foundation. Apache hadoop project. In lucene.apache.org/hadoop.
- G. King and L. Zeng. Logistic regression in rare events data. Political Analysis, 9:137--162, 2001.
- J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In Sixth Symposium on Operating System Design and Implementation, pages 137--150, 2004.
- A. Lacerda, M. Cristo, M. A. G., W. Fan, N. Ziviani, and B. Ribeiro-Neto. Learning to advertise. In SIGIR '06: Proc. of the 29th annual intl. ACM SIGIR conf., pages 549--556, New York, NY, 2006. ACM.
- P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, 1989.
- M. J. Silvapulle. On the existence of maximum likelihood estimates for the binomial response models. Journal of the Royal Statistical Society, Series B, 43:310--313, 1981.
- P. Komarek and A. W. Moore. Making logistic regression a core data mining tool with tr-irls. In International Conference on Data Mining, pages 685--688, 2005.
- M. Regelson and D. Fain. Predicting click-through rate using keyword clusters. In Proc. of the Second Workshop on Sponsored Search Auctions, 2006.
- B. Ribeiro-Neto, M. Cristo, P. B. Golgher, and E. S. de Moura. Impedance coupling in content-targeted advertising. In SIGIR '05: Proc. of the 28th annual intl. ACM SIGIR conf., pages 496--503, New York, NY, 2005. ACM.
- M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In WWW, pages 521--530, 2007.
- S. D. Pietra, V. D. Pietra, and J. Lafferty. Inducing features of random fields. IEEE PAMI, 19:380--393, 1997.
- C. Wang, P. Zhang, R. Choi, and M. D. Eredita. Understanding consumers attitude toward advertising. In Eighth Americas conf. on Information System, pages 1143--1148, 2002.
- W. Yih, J. Goodman, and V. R. Carvalho. Finding advertising keywords on web pages. In WWW '06: Proc. of the 15th intl. conf. on World Wide Web, pages 213--222, New York, NY, 2006. ACM.
Index Terms
- Contextual advertising by combining relevance with click feedback