DOI: 10.1145/2488388.2488401

Inferring the demographics of search users: social data meets search queries

Published: 13 May 2013

ABSTRACT

Knowing users' views and demographic traits offers great potential for personalizing web search results or related services such as query suggestion and query completion. Such signals, however, are often available for only a small fraction of search users, namely those who log in with their social network account and allow its use for personalization of search results. In this paper, we offer a solution to this problem by showing how user demographic traits such as age and gender, and even political and religious views, can be efficiently and accurately inferred from their search query histories. This is accomplished in three steps. We first train predictive models on the publicly available myPersonality dataset, which contains users' Facebook Likes and their demographic information. We then match Facebook Likes with search queries using Open Directory Project categories. Finally, we apply the model trained on Facebook Likes to large-scale query logs of a commercial search engine while explicitly taking into account the difference between the trait distributions in the two datasets. We find that the accuracies of classifying age and gender, expressed as the area under the ROC curve (AUC), are 77% and 84%, respectively, for predictions based on Facebook Likes, and degrade only to 74% and 80% when based on search queries. On a US state-by-state basis, we find a Pearson correlation of 0.72 for political views between the predicted scores and Gallup data, and 0.54 for affiliation with Judaism between predicted scores and data from the US Religious Landscape Survey. We conclude that it is indeed feasible to infer important demographic data of users from their query history based on labelled Likes data, and we believe that this approach could provide valuable information for personalization and monetization even in the absence of demographic data.
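The abstract does not say how the difference in trait distributions between the two datasets is taken into account. As an illustration only, the sketch below shows one standard way to handle such a difference: re-weighting a classifier's positive-class probability when the share of the positive class differs between the training data and the data the model is applied to. The function name `adjust_for_prior_shift` and its parameters are hypothetical, and the technique is an assumption, not necessarily the method used in the paper.

```python
def adjust_for_prior_shift(p_source, source_prior, target_prior):
    """Re-weight a positive-class probability for a different class prior.

    p_source     -- probability of the positive class from a model trained
                    on the source data (assumed here: the Likes dataset)
    source_prior -- share of the positive class in the source data
    target_prior -- assumed share of the positive class in the target data
                    (assumed here: the query-log users)

    This is the standard Bayes-rule correction for prior shift; it assumes
    the class-conditional feature distributions are the same in both datasets.
    """
    if not 0.0 < source_prior < 1.0:
        raise ValueError("source_prior must lie strictly between 0 and 1")
    if not 0.0 <= target_prior <= 1.0:
        raise ValueError("target_prior must lie between 0 and 1")

    # Scale each class's probability by the ratio of target to source prior.
    pos = p_source * target_prior / source_prior
    neg = (1.0 - p_source) * (1.0 - target_prior) / (1.0 - source_prior)
    total = pos + neg
    if total == 0.0:
        raise ValueError("adjusted probability is undefined for these inputs")
    return pos / total


if __name__ == "__main__":
    # Hypothetical numbers: a balanced source sample, and a target population
    # in which the positive class is assumed to make up 20% of users.
    print(adjust_for_prior_shift(0.5, source_prior=0.5, target_prior=0.2))
```

When the two priors are equal, the function returns the input probability unchanged, so the correction only matters when the populations actually differ.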


Published in

WWW '13: Proceedings of the 22nd international conference on World Wide Web
May 2013
1628 pages
ISBN: 9781450320351
DOI: 10.1145/2488388

Copyright © 2013 is held by the International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

WWW '13 paper acceptance rate: 125 of 831 submissions, 15%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
