skip to main content
10.1145/1557019.1557048acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Large-scale behavioral targeting

Published:28 June 2009Publication History

ABSTRACT

Behavioral targeting (BT) leverages historical user behavior to select the ads most relevant to users to display. The state-of-the-art of BT derives a linear Poisson regression model from fine-grained user behavioral data and predicts click-through rate (CTR) from user history. We designed and implemented a highly scalable and efficient solution to BT using Hadoop MapReduce framework. With our parallel algorithm and the resulting system, we can build above 450 BT-category models from the entire Yahoo's user base within one day, the scale that one can not even imagine with prior systems. Moreover, our approach has yielded 20% CTR lift over the existing production system by leveraging the well-grounded probabilistic model fitted from a much larger training dataset.

Specifically, our major contributions include: (1) A MapReduce statistical learning algorithm and implementation that achieve optimal data parallelism, task parallelism, and load balance in spite of the typically skewed distribution of domain data. (2) An in-place feature vector generation algorithm with linear time complexity O(n) regardless of the granularity of sliding target window. (3) An in-memory caching scheme that significantly reduces the number of disk IOs to make large-scale learning practical. (4) Highly efficient data structures and sparse representations of models and data to enable fast model updates. We believe that our work makes significant contributions to solving large-scale machine learning problems of industrial relevance in general. Finally, we report comprehensive experimental results, using industrial proprietary codebase and datasets.

Skip Supplemental Material Section

Supplemental Material

p209-chen.mp4

mp4

144.2 MB

References

  1. http://hadoop.apache.org/.Google ScholarGoogle Scholar
  2. S. Agarwal, P. Renaker, and A. Smith. Determining ad targeting information and/or ad creative information using past search queries. U.S. Patent 10/813,925, filed: Mar 31, 2004.Google ScholarGoogle Scholar
  3. A. C. Cameron and P. K. Trivedi. Regression Analysis of Count Data. Cambridge University Press, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  4. J. Canny. GaP: a factor model for discrete data. ACM Conference on Information Retrieval (SIGIR 2004), pages 122--129, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Canny, S. Zhong, S. Gaffney, C. Brower, P. Berkhin, and G. H. John. Granular data for behavioral targeting. U.S. Patent Application 20090006363.Google ScholarGoogle Scholar
  6. E. Chang. Scalable collaborative filtering algorithms for mining social networks. In The NIPS 2008 Workshop on "Beyond Search: Computational Intelligence for the Web, 2008.Google ScholarGoogle Scholar
  7. Y. Chen, D. Pavlov, P. Berkhin, and J. Canny. Large-scale behavioral targeting for advertising over a network. U.S. Patent Application 12/351,749, filed: Jan 09, 2009.Google ScholarGoogle Scholar
  8. C. Y. Chung, J. M. Koran, L.-J. Lin, and H. Yin. Model for generating user profiles in a behavioral targeting system. U.S. Patent 11/394,374, filed: Mar 29, 2006.Google ScholarGoogle Scholar
  9. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. E. Gibbs, W. G. Poole, Jr., and P. K. Stockmeyer. A comparison of several bandwidth and profile reduction algorithms. ACM Transactions on Mathematical Software (TOMS), 2(3):322--330, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems (NIPS), 13:556--562, 2000.Google ScholarGoogle Scholar
  12. D. A. Spielman and S.-H. Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51(3), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Large-scale behavioral targeting

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
      June 2009
      1426 pages
      ISBN:9781605584959
      DOI:10.1145/1557019

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 June 2009

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader