DOI: 10.1145/1242572.1242594

Demographic prediction based on user's browsing behavior

Published: 08 May 2007

ABSTRACT

Demographic information plays an important role in personalized web applications. However, personal data such as age and gender is usually not easy to obtain. In this paper, we make a first attempt to predict users' gender and age from their Web browsing behavior, treating Webpage view information as a hidden variable that propagates demographic information between users. Our approach has three main steps. First, learning from Webpage click-through data, Webpages are associated with the age and gender tendency of users whose demographics are known, through a discriminative model. Second, the age and gender of users whose demographics are unknown are predicted from the demographic information of the Webpages they visit, through a Bayesian framework. Third, because Webpages visited by similar users tend to share a similar demographic tendency, and users with similar demographics tend to visit similar Webpages, a smoothing component is employed to overcome the data sparseness of the web click-through log. Experiments on a real web click-through log demonstrate the effectiveness of the proposed approach: the proposed algorithm achieves improvements of up to 30.4% on gender prediction and 50.3% on age prediction in terms of macro F1, compared to baseline algorithms.
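The first two steps and the macro F1 metric can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the discriminative model or the exact Bayesian formulation, so a Laplace-smoothed frequency estimate stands in for the former and a naive-Bayes-style combination (pages assumed conditionally independent given the class) for the latter. The smoothing component of step three is omitted. All function names, the `alpha` parameter, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def page_tendencies(clicks, known_labels, classes, alpha=1.0):
    """Step 1 (stand-in): estimate P(class | page) from labeled users' clicks.

    Assumption: Laplace-smoothed counts replace the paper's discriminative
    model, which the abstract does not describe.
    """
    counts = defaultdict(lambda: {c: 0 for c in classes})
    for user, page in clicks:
        if user in known_labels:
            counts[page][known_labels[user]] += 1
    tendencies = {}
    for page, by_class in counts.items():
        total = sum(by_class.values()) + alpha * len(classes)
        tendencies[page] = {c: (by_class[c] + alpha) / total for c in classes}
    return tendencies


def predict_user(pages, tendencies, prior):
    """Step 2 (stand-in): combine page tendencies for one unlabeled user.

    Assumption: pages are conditionally independent given the class, so
    P(c | pages) is proportional to P(c) * prod(P(c | page) / P(c)).
    Scores are accumulated in log space.
    """
    scores = {c: math.log(p) for c, p in prior.items()}
    for page in pages:
        if page not in tendencies:
            continue  # a page never seen with a labeled user adds no evidence
        for c in scores:
            scores[c] += math.log(tendencies[page][c]) - math.log(prior[c])
    return max(scores, key=scores.get)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy click log of (user, page) pairs; u5 has no known label.
clicks = [
    ("u1", "sports"), ("u2", "sports"),
    ("u3", "fashion"), ("u4", "fashion"),
    ("u5", "sports"), ("u5", "news"),
]
known = {"u1": "male", "u2": "male", "u3": "female", "u4": "female"}
classes = ["male", "female"]
prior = {"male": 0.5, "female": 0.5}

tendencies = page_tendencies(clicks, known, classes)
print(predict_user(["sports", "news"], tendencies, prior))  # prints: male
```

Macro F1 averages the per-class scores without weighting by class size, so a rare age group counts as much as a common one.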


      • Published in

        WWW '07: Proceedings of the 16th international conference on World Wide Web
        May 2007
        1382 pages
        ISBN: 9781595936547
        DOI: 10.1145/1242572

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
