Published in: Neural Computing and Applications 8/2021

24.07.2020 | Original Article

Extensive study on the underlying gender bias in contextualized word embeddings

Authors: Christine Basta, Marta R. Costa-jussà, Noe Casas


Abstract

Gender bias affects many natural language processing applications. While we are still far from debiasing methods that fully solve the problem, progress is being made in analyzing the impact of this bias on current algorithms. This paper provides an extensive study of the underlying gender bias in popular contextualized word embeddings. We analyze several evaluation measures applied across English data domains and across the layers of the contextualized word embeddings, and we adapt and extend the study to Spanish. Our study points out the advantages and limitations of the various evaluation measures used and aims to standardize the evaluation of gender bias in contextualized word embeddings.
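To make the kind of analysis described above concrete, here is a minimal, illustrative sketch of one common family of bias measures: projecting word representations onto a gender direction derived from a pronoun pair. The toy vectors and the `gender_bias` helper are hypothetical stand-ins (in practice the vectors would be contextualized embeddings extracted per occurrence from a model such as BERT, layer by layer); this is not the paper's exact evaluation procedure.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for contextualized embeddings. In a real study, each
# word occurrence in a sentence yields its own vector, extracted from
# a chosen layer of the model.
emb = {
    "he":       np.array([1.0, 0.1, 0.0]),
    "she":      np.array([-1.0, 0.1, 0.0]),
    "engineer": np.array([0.6, 0.8, 0.1]),
    "nurse":    np.array([-0.5, 0.9, 0.1]),
}

# A simple gender direction: the normalized difference of a
# definitionally gendered pronoun pair.
gender_dir = emb["he"] - emb["she"]
gender_dir /= np.linalg.norm(gender_dir)

def gender_bias(word):
    """Cosine of a word's embedding with the gender direction.
    Positive values lean toward 'he', negative toward 'she'."""
    return cosine(emb[word], gender_dir)

for w in ("engineer", "nurse"):
    print(w, round(gender_bias(w), 3))
```

With contextualized models, this projection can be computed separately at each layer, which is one way to study how bias varies with depth, as the paper's layer-wise analysis does.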


Metadata
Title
Extensive study on the underlying gender bias in contextualized word embeddings
Authors
Christine Basta
Marta R. Costa-jussà
Noe Casas
Publication date
24.07.2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05211-z
