Skip to main content

2015 | OriginalPaper | Buchkapitel

Automatic Estimation of Web Bloggers’ Age Using Regression Models

verfasst von : Vasiliki Simaki, Christina Aravantinou, Iosif Mporas, Vasileios Megalooikonomou

Erschienen in: Speech and Computer

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this article, we address the problem of automatic age estimation of web users based on their posts. Most studies on age identification treat the issue as a classification problem. Instead of following an age category classification approach, we investigate the appropriateness of several regression algorithms on the task of age estimation. We evaluate a number of well-known and widely used machine learning algorithms for numerical estimation, in order to examine their appropriateness on this task. We used a set of 42 text features. The experimental results showed that the Bagging algorithm with RepTree base learner offered the best performance, achieving estimation of web users’ age with mean absolute error equal to 5.44, while the root mean squared error is approximately 7.14.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Labov, W.: Sociolinguistic Patterns (No. 4). University of Pennsylvania Press, Philadelphia (1972) Labov, W.: Sociolinguistic Patterns (No. 4). University of Pennsylvania Press, Philadelphia (1972)
2.
Zurück zum Zitat Trudgill, P.: The social differentiation of English in Norwich, vol. 13. CUP Archive, Cambridge (1974) Trudgill, P.: The social differentiation of English in Norwich, vol. 13. CUP Archive, Cambridge (1974)
3.
Zurück zum Zitat Eckert, P.: Age as a sociolinguistic variable. In: Coulmas, F. (ed.) The Handbook of Sociolinguistics. Blackwell, Oxford (1997) Eckert, P.: Age as a sociolinguistic variable. In: Coulmas, F. (ed.) The Handbook of Sociolinguistics. Blackwell, Oxford (1997)
4.
Zurück zum Zitat Labov, W.: Principles of linguistic change, cognitive and cultural factors, vol. 3. John Wiley & Sons, New York (2011) Labov, W.: Principles of linguistic change, cognitive and cultural factors, vol. 3. John Wiley & Sons, New York (2011)
5.
Zurück zum Zitat Schler, J., Koppel, M., Argamon, S., Pennebaker, J.W.: Effects of age and gender on blogging. In: AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, vol. 6, pp. 199–205 (2006) Schler, J., Koppel, M., Argamon, S., Pennebaker, J.W.: Effects of age and gender on blogging. In: AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, vol. 6, pp. 199–205 (2006)
6.
Zurück zum Zitat Argamon, S., Koppel, M., Pennebaker, J.W., Schler, J.: Mining the blogosphere: age, gender and the varieties of self-expression. First Monday, 12(9) (2007) Argamon, S., Koppel, M., Pennebaker, J.W., Schler, J.: Mining the blogosphere: age, gender and the varieties of self-expression. First Monday, 12(9) (2007)
7.
Zurück zum Zitat Goswami, S., Sarkar, S., Rustagi, M.: Stylometric analysis of bloggers’ age and gender. In: Third International AAAI Conference on Weblogs and Social Media (2009) Goswami, S., Sarkar, S., Rustagi, M.: Stylometric analysis of bloggers’ age and gender. In: Third International AAAI Conference on Weblogs and Social Media (2009)
8.
Zurück zum Zitat Tam, J., Martell, C.H.: Age detection in chat. In: IEEE International Conference on Semantic Computing, ICSC 2009, pp. 33–39. IEEE (2009) Tam, J., Martell, C.H.: Age detection in chat. In: IEEE International Conference on Semantic Computing, ICSC 2009, pp. 33–39. IEEE (2009)
9.
Zurück zum Zitat Peersman, C., Daelemans, W., Van Vaerenbergh, L.: Predicting age and gender in online social networks. In: Proceedings of the 3rd international workshop on Search and Mining User-Generated Contents, pp. 37–44. ACM (2011) Peersman, C., Daelemans, W., Van Vaerenbergh, L.: Predicting age and gender in online social networks. In: Proceedings of the 3rd international workshop on Search and Mining User-Generated Contents, pp. 37–44. ACM (2011)
10.
Zurück zum Zitat Rosenthal, S., McKeown, K.: Age prediction in blogs: a study of style, content, and online behavior in pre-and post-social media generations. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 763–772. ACL (2011) Rosenthal, S., McKeown, K.: Age prediction in blogs: a study of style, content, and online behavior in pre-and post-social media generations. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 763–772. ACL (2011)
11.
Zurück zum Zitat Nguyen, D., Smith, N.A., Ros, C.P.: Author age prediction from text using linear regression. In: Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 115–123. ACL (2011) Nguyen, D., Smith, N.A., Ros, C.P.: Author age prediction from text using linear regression. In: Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 115–123. ACL (2011)
12.
Zurück zum Zitat Rangel, F., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview of the author profiling task at PAN 2013. Notebook Papers of CLEF (2013) Rangel, F., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview of the author profiling task at PAN 2013. Notebook Papers of CLEF (2013)
13.
Zurück zum Zitat Flekova, L., Gurevych, I.: Can we hide in the web? Large scale simultaneous age and gender author profiling in social media. In: CLEF 2012 Labs and Work-shop. Notebook Papers (2013) Flekova, L., Gurevych, I.: Can we hide in the web? Large scale simultaneous age and gender author profiling in social media. In: CLEF 2012 Labs and Work-shop. Notebook Papers (2013)
14.
Zurück zum Zitat Rangel, F., Rosso, P.: Use of language and author profiling: identification of gender and age. Natural Language Processing and Cognitive Science, 177 (2013) Rangel, F., Rosso, P.: Use of language and author profiling: identification of gender and age. Natural Language Processing and Cognitive Science, 177 (2013)
15.
Zurück zum Zitat Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Dziurzynski, L., Ramones, S.M., Agrawal, M., Ungar, L.H.: Personality, gender, and age in the language of social media: the open-vocabulary approach. PloS one 8(9), e73791 (2013)CrossRef Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Dziurzynski, L., Ramones, S.M., Agrawal, M., Ungar, L.H.: Personality, gender, and age in the language of social media: the open-vocabulary approach. PloS one 8(9), e73791 (2013)CrossRef
16.
Zurück zum Zitat Nguyen, D., Gravel, R., Trieschnigg, D., Meder, T.: “How old do you think i am?”; A study of language and age in twitter. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media. AAAI Press (2013) Nguyen, D., Gravel, R., Trieschnigg, D., Meder, T.: “How old do you think i am?”; A study of language and age in twitter. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media. AAAI Press (2013)
17.
Zurück zum Zitat Verhoeven, B., Daelemans, W.: CLiPSStylometry Investigation (CSI) corpus: a Dutch corpus for the detection of age, gender, personality, sentiment and deception in text. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (2014) Verhoeven, B., Daelemans, W.: CLiPSStylometry Investigation (CSI) corpus: a Dutch corpus for the detection of age, gender, personality, sentiment and deception in text. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (2014)
18.
Zurück zum Zitat Chester, D.L.: Why two hidden layers are better than one. In: Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 265–268 (1990) Chester, D.L.: Why two hidden layers are better than one. In: Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 265–268 (1990)
19.
Zurück zum Zitat Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Elsevier, Morgan-Kaufman Series of Data Management Systems, San Francisco (2005) Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Elsevier, Morgan-Kaufman Series of Data Management Systems, San Francisco (2005)
Metadaten
Titel
Automatic Estimation of Web Bloggers’ Age Using Regression Models
verfasst von
Vasiliki Simaki
Christina Aravantinou
Iosif Mporas
Vasileios Megalooikonomou
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_14