2015 | OriginalPaper | Book chapter

Gender Classification of Web Authors Using Feature Selection and Language Models

Authors: Christina Aravantinou, Vasiliki Simaki, Iosif Mporas, Vasileios Megalooikonomou

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

In this article, we address the problem of automatic gender classification of web blog authors. Specifically, we employ eight widely used machine learning algorithms to study how effectively feature selection improves the accuracy of gender classification. Feature ranking is performed over a set of statistical, part-of-speech tagging and language model features. In the experiments, we employed classification models based on decision trees, support vector machines and lazy-learning algorithms. The experimental evaluation, performed on blog author gender classification data, demonstrated the importance of language model features for this task and showed that feature selection significantly improves the accuracy of gender classification, regardless of the type of machine learning algorithm used.
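
The general idea of ranking features and then classifying on the top-ranked subset can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the ranking criterion or the exact classifiers, so the class-separation score, the 1-nearest-neighbour classifier (standing in for a lazy learner), the function names and the toy feature values are all hypothetical and are not the authors' data or method.

```python
from statistics import mean, pstdev

def rank_features(X, y):
    """Return feature indices sorted by a simple class-separation score.

    Assumed score (not taken from the chapter): absolute difference of the
    two class means divided by the summed class standard deviations.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b)
        if spread == 0:
            spread = 1.0  # avoid division by zero for constant features
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def project(X, keep):
    """Keep only the feature columns listed in `keep`."""
    return [[row[j] for j in keep] for row in X]

def predict_1nn(train_X, train_y, x):
    """Lazy learner: return the label of the closest training example."""
    dists = [sum((p - q) ** 2 for p, q in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def loo_accuracy(X, y):
    """Leave-one-out accuracy of the 1-NN classifier."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += predict_1nn(rest_X, rest_y, X[i]) == y[i]
    return hits / len(X)

# Hypothetical toy data, one row per author:
# [language-model score, large-scale noisy feature, uninformative ratio]
X = [
    [0.10, 5.0, 1.00], [0.20, 1.0, 1.20], [0.15, 9.0, 0.90], [0.25, 3.0, 1.10],
    [0.90, 5.2, 1.10], [0.80, 1.1, 0.95], [0.85, 9.1, 1.15], [0.95, 3.1, 1.00],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # two gender classes, arbitrarily coded 0 and 1

# For brevity the ranking uses all rows; a real evaluation would rank
# features inside each training fold to avoid information leakage.
ranking = rank_features(X, y)
acc_all = loo_accuracy(X, y)
acc_top = loo_accuracy(project(X, ranking[:1]))  if False else loo_accuracy(project(X, ranking[:1]), y)

print("ranking:", ranking)
print("accuracy with all features:", acc_all)
print("accuracy with top-ranked feature:", acc_top)
```

On this contrived data the noisy second feature dominates the distance computation, so the nearest neighbour over all features picks the wrong class every time, while the single top-ranked feature separates the classes cleanly. The toy numbers say nothing about the magnitudes reported in the chapter; they only show the mechanism by which discarding low-ranked features can help a distance-based classifier.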

Metadata
Title
Gender Classification of Web Authors Using Feature Selection and Language Models
Authors
Christina Aravantinou
Vasiliki Simaki
Iosif Mporas
Vasileios Megalooikonomou
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_28