
2017 | Original Paper | Book Chapter

Probabilistic Inference of Twitter Users’ Age Based on What They Follow

Written by: Benjamin Paul Chamberlain, Clive Humby, Marc Peter Deisenroth

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

Twitter provides an open and rich source of data for studying human behaviour at scale and is widely used in social and network sciences. However, a major criticism of Twitter data is that demographic information is largely absent. Enhancing Twitter data with user ages would advance our ability to study social network structures, information flows and the spread of contagions. Approaches toward age detection of Twitter users typically focus on specific properties of tweets, e.g., linguistic features, which are language dependent. In this paper, we devise a language-independent methodology for determining the age of Twitter users from data that is native to the Twitter ecosystem. The key idea is to use a Bayesian framework to generalise ground-truth age information from a few Twitter users to the entire network based on what/whom they follow. Our approach scales to inferring the age of 700 million Twitter accounts with high accuracy.
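The key idea in the abstract, generalising age labels from a few users to others through what they follow, can be illustrated with a small naive-Bayes-style sketch. This is not the authors' implementation: the age bins, account handles, smoothing constant `alpha`, and the function names `fit` and `posterior` are all hypothetical, and the sketch assumes that follow decisions are conditionally independent given age. Following the reasoning in footnote 4, only accounts that a user does follow (\(X_i = 1\)) contribute to the score.

```python
import math
from collections import defaultdict

def fit(labelled_users, age_bins, alpha=1.0):
    """Estimate log-priors and follow likelihoods from labelled users.

    labelled_users: list of (age_bin, set_of_followed_accounts) pairs.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    n_users = {a: 0 for a in age_bins}
    n_follow = {a: defaultdict(int) for a in age_bins}
    for age, follows in labelled_users:
        n_users[age] += 1
        for acct in follows:
            n_follow[age][acct] += 1
    total = sum(n_users.values())
    log_prior = {
        a: math.log((n_users[a] + alpha) / (total + alpha * len(age_bins)))
        for a in age_bins
    }

    def log_lik(age, acct):
        # Smoothed Bernoulli estimate of P(X_i = 1 | age)
        return math.log((n_follow[age][acct] + alpha) / (n_users[age] + 2 * alpha))

    return log_prior, log_lik

def posterior(follows, age_bins, log_prior, log_lik):
    """Posterior over age bins, using only the accounts that are followed."""
    scores = {
        a: log_prior[a] + sum(log_lik(a, acct) for acct in follows)
        for a in age_bins
    }
    m = max(scores.values())  # subtract the max for numerical stability
    unnorm = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}

# Toy labelled data with made-up handles and age bins.
age_bins = ["under_25", "over_45"]
labelled = [
    ("under_25", {"pop_star", "game_studio"}),
    ("under_25", {"pop_star"}),
    ("over_45", {"news_radio", "garden_show"}),
    ("over_45", {"news_radio"}),
]
log_prior, log_lik = fit(labelled, age_bins)
post = posterior({"pop_star"}, age_bins, log_prior, log_lik)
print(post)
```

In this toy setting, a user who follows only `pop_star` receives most of the posterior mass in the `under_25` bin. Because the score sums only over followed accounts, the cost per user grows with the number of accounts followed rather than with the size of the whole graph.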


Footnotes
1
We use capitalisation to indicate the Twitter-specific usage of this word.
 
3
This value was used because the US is the country with the most Twitter users.
 
4
We only consider cases where \(X_i = 1\) because the Twitter graph is sparse: the full Twitter graph has \(7\times 10^8\) nodes and \(5\times 10^{10}\) edges, which implies a density of \(1.6\times 10^{-7}\), i.e., the default is to follow nobody. Hence, not following an account does not carry enough information to justify the additional computational cost.
 
5
Nguyen et al. (2013) used additional LinkedIn data for labelling.
 
6
Without the inclusion of grandparents and retirees in the training set, the predictive performance would rapidly drop off for ages greater than 35.
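
The sparsity argument in footnote 4 can be checked with a few lines of arithmetic. The sketch below assumes a directed graph without self-loops and uses the rounded node and edge counts quoted in the footnote; the variable names are illustrative. With these rounded inputs the density comes out at roughly \(1.0\times 10^{-7}\), the same order of magnitude as the quoted \(1.6\times 10^{-7}\), although the rounded counts do not reproduce that figure exactly.

```python
# Rounded counts quoted in footnote 4.
nodes = 7e8
edges = 5e10

# In a directed graph without self-loops, each node could follow
# every other node, giving nodes * (nodes - 1) possible edges.
possible_edges = nodes * (nodes - 1)
density = edges / possible_edges

# Average number of accounts followed per node.
mean_follows = edges / nodes

print(f"density      ~ {density:.2e}")
print(f"mean follows ~ {mean_follows:.0f} out of {nodes:.0e} accounts")
```

An average of roughly 70 follows out of hundreds of millions of possible accounts is why an observed follow is far more informative than an absent one.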
 
References
Al Zamal, F., Liu, W., Ruths, D.: Homophily and latent attribute inference: inferring latent attributes of Twitter users from neighbors. In: ICWSM (2012)
Burger, J.D., Henderson, J., Kim, G., Zarrella, G.: Discriminating gender on Twitter. In: EMNLP (2011)
Cheng, Z., Caverlee, J., Lee, K.: You are where you tweet: a content-based approach to geo-locating Twitter users. In: CIKM (2010)
Culotta, A., Nirmal, R.K., Cutler, J.: Predicting the demographics of Twitter users from website traffic data. In: AAAI (2015)
Fang, Q., Sang, J., Xu, C., Hossain, M.S.: Relational user attribute inference in social media. IEEE Trans. Multimedia 17(7), 1031–1044 (2015)
Grainger, T., Potter, T.: Solr in Action. Manning Publications Co., Cherry Hill (2014)
Gupta, P., Goel, A., Lin, J., Sharma, A., Wang, D., Zadeh, R.: WTF: the who to follow service at Twitter. In: WWW (2013)
Kosinski, M., Stillwell, D., Graepel, T.: Private traits and attributes are predictable from digital records of human behavior. PNAS 110(15), 5802–5805 (2013)
Liu, W., Ruths, D.: What's in a name? Using first names as features for gender inference in Twitter. In: AAAI Spring Symposium on Analyzing Microtext (2013)
McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27(1), 415–444 (2001)
Mislove, A., Lehmann, S., Ahn, Y.Y.: Understanding the demographics of Twitter users. In: ICWSM (2011)
Nguyen, D., Gravel, R., Trieschnigg, D., Meder, T.: How old do you think I am? A study of language and age in Twitter. In: ICWSM (2013)
Nguyen, D., Noah, A., Smith, A., Rose, C.P.: Author age prediction from text using linear regression. In: LaTeCH (2011)
Oktay, H., Firat, A., Ertem, Z.: Demographic breakdown of Twitter users: an analysis based on names. In: BIGDATA (2014)
Pennacchiotti, M., Popescu, A.M.: A machine learning approach to Twitter user classification. In: ICWSM (2011)
Rao, D., Yarowsky, D., Shreevats, A., Gupta, M.: Classifying latent user attributes in Twitter. In: SMUC (2010)
Schler, J., Koppel, M., Argamon, S., Pennebaker, J.W.: Effects of age and gender on blogging. In: AAAI-CAAW (2006)
Wysocki, R., Zabierowski, W.: Twisted framework on game server example. In: CADSM (2011)
Metadata
Title
Probabilistic Inference of Twitter Users’ Age Based on What They Follow
Written by
Benjamin Paul Chamberlain
Clive Humby
Marc Peter Deisenroth
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71273-4_16