2015 | OriginalPaper | Chapter

Automatic Estimation of Web Bloggers’ Age Using Regression Models

Authors: Vasiliki Simaki, Christina Aravantinou, Iosif Mporas, Vasileios Megalooikonomou

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

In this article, we address the problem of automatically estimating the age of web users from their posts. Most studies on age identification treat the task as a classification problem. Instead of classifying users into age categories, we investigate how well several regression algorithms perform on age estimation. We evaluate a number of well-known, widely used machine learning algorithms for numerical estimation, using a set of 42 text features. The experimental results show that the Bagging algorithm with a RepTree base learner performed best, estimating web users’ age with a mean absolute error of 5.44 and a root mean squared error of approximately 7.14.
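The two error measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of mean absolute error (MAE) and root mean squared error (RMSE). The function names and the sample ages are hypothetical, invented for illustration, and are not the authors' code or data.

```python
import math


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def root_mean_squared_error(true_ages, predicted_ages):
    """Square root of the average squared difference between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((t - p) ** 2 for t, p in zip(true_ages, predicted_ages))
    return math.sqrt(total / len(true_ages))


# Hypothetical ages for five users (illustration only, not data from the paper).
true_ages = [17, 24, 25, 33, 41]
predicted_ages = [20, 24, 29, 30, 36]

mae = mean_absolute_error(true_ages, predicted_ages)
rmse = root_mean_squared_error(true_ages, predicted_ages)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

On this toy input the sketch gives an MAE of 3.00 and an RMSE of about 3.44. Because RMSE squares each error before averaging, it weights large misestimates more heavily and is never smaller than MAE, which is consistent with the reported RMSE (7.14) exceeding the reported MAE (5.44).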

Metadata
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_14
