2019 | OriginalPaper | Chapter

Gender Prediction Based on Chinese Name

Authors: Jizheng Jia, Qiyang Zhao

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing

Abstract

Much work has been done on gender prediction for English names using probabilistic models or traditional machine learning methods. Unlike English and other alphabetic languages, Chinese is written with logosyllabic characters. Previous approaches work quite well for Indo-European languages in general and English in particular; however, their performance deteriorates on Asian languages such as Chinese, Japanese, and Korean. In our work, we focus on Simplified Chinese characters and present a novel approach that incorporates phonetic information (Pinyin) to enhance Chinese word embeddings trained with the BERT model. We compare our method with several previous methods, namely Naive Bayes, GBDT, and random forest with fastText word embeddings as features. Quantitative and qualitative experiments demonstrate the superiority of our model, which achieves 93.45% test accuracy. In addition, we have released to the community two large-scale gender-labeled datasets used in this study: one with over one million first names and the other with over six million full names.
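To make the kind of baseline mentioned in the abstract concrete, the following is a minimal sketch of a character-level Naive Bayes classifier for given names. It is an illustration only and not the authors' implementation: it uses raw character counts with add-one smoothing instead of fastText embeddings, it does not use Pinyin or BERT, and the function names (`train`, `predict`) and the tiny training list are hypothetical, made up for this example rather than taken from the released datasets.

```python
import math
from collections import Counter, defaultdict


def train(names, labels):
    """Count characters per label.

    Returns (label counts, per-label character counts, vocabulary).
    """
    label_counts = Counter(labels)
    char_counts = defaultdict(Counter)
    vocab = set()
    for name, label in zip(names, labels):
        for ch in name:
            char_counts[label][ch] += 1
            vocab.add(ch)
    return label_counts, char_counts, vocab


def predict(model, name):
    """Return the label with the highest smoothed log-probability."""
    label_counts, char_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # Log prior of the label.
        score = math.log(n / total)
        # Add-one smoothing; the extra +1 reserves mass for unseen characters.
        denom = sum(char_counts[label].values()) + len(vocab) + 1
        for ch in name:
            score += math.log((char_counts[label][ch] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration (not from the paper).
TOY_NAMES = ["伟", "强", "军", "芳", "丽", "娟"]
TOY_LABELS = ["M", "M", "M", "F", "F", "F"]

model = train(TOY_NAMES, TOY_LABELS)
print(predict(model, "伟强"))
print(predict(model, "丽娟"))
```

A model of this kind treats each character independently and ignores pronunciation, which is the sort of limitation the Pinyin-enhanced embeddings described above are meant to address.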
Metadata
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-32236-6_62
