
2018 | OriginalPaper | Book chapter

Bidirectional LSTM for Author Gender Identification

Authors: Bassem Bsir, Mounir Zrigui

Published in: Computational Collective Intelligence

Publisher: Springer International Publishing


Abstract

Author profiling consists of inferring an author's gender, age, native language, dialect, or personality by examining his or her written text. It is a very active research area because of its usefulness in criminal investigation, marketing, and business.
In this paper, we address the problem of gender identification by applying the Long Short-Term Memory (LSTM) neural network architecture, a type of recurrent network that uses an appropriate gradient-based learning algorithm to overcome the vanishing-gradient problem. Experimental results show that our model outperformed traditional machine learning methods on gender identification.
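The abstract does not give the network's dimensions, input representation, or training setup. So the following is only a rough, pure-Python sketch of the general idea named in the title: a bidirectional LSTM reads a sequence of word vectors in both directions, and the concatenated final states are mapped to a binary gender probability. Everything beyond "bidirectional LSTM" is an assumption rather than the authors' method. This includes the now-standard LSTM cell with a forget gate, the random initial weights, the logistic output layer, and the hypothetical helper names `LSTMCell` and `bilstm_gender_prob`. Training (backpropagation through time) is omitted. The additive cell-state update `c = f * c_prev + i * g` is the part usually credited with easing the vanishing-gradient problem.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Hypothetical minimal LSTM cell (with forget gate) on Python lists."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias vector per gate:
        # input (i), forget (f), output (o), candidate (g).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}
        self.b["f"] = [1.0] * hidden_size  # assumption: forget gate starts mostly open

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # concatenate current input and previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
               for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Additive cell-state update: c_t = f * c_{t-1} + i * g
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def run(self, xs):
        # Process a whole sequence and return the final hidden state.
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            h, c = self.step(x, h, c)
        return h


def bilstm_gender_prob(xs, fwd, bwd, w_out, b_out):
    """Hypothetical classifier head: probability that the label is 1."""
    h_f = fwd.run(xs)        # reads tokens left to right
    h_b = bwd.run(xs[::-1])  # reads tokens right to left
    features = h_f + h_b     # concatenate both final states
    logit = sum(w * v for w, v in zip(w_out, features)) + b_out
    return sigmoid(logit)


rng = random.Random(0)
emb_dim, hidden = 4, 3
fwd = LSTMCell(emb_dim, hidden, rng)
bwd = LSTMCell(emb_dim, hidden, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)]
# Stand-in for five word embeddings from one document (random, illustrative only).
tokens = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]
print(f"P(label=1) = {bilstm_gender_prob(tokens, fwd, bwd, w_out, 0.0):.3f}")
```

With untrained weights the printed probability is meaningless. Which gender corresponds to label 1 is also an arbitrary, illustrative choice. In practice the two directions would be trained jointly with a cross-entropy loss, typically using a framework's built-in bidirectional LSTM layer rather than hand-written cells.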


Metadata
Title
Bidirectional LSTM for Author Gender Identification
Authors
Bassem Bsir
Mounir Zrigui
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-98443-8_36