
2019 | OriginalPaper | Chapter

Cross-lingual Neural Vector Conceptualization

Authors: Lisa Raithel, Robert Schwarzenberg

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing


Abstract

Recently, Neural Vector Conceptualization (NVC) was proposed as a means to interpret samples from a word vector space. For NVC, a neural model activates the higher-order concepts it recognizes in a word vector instance. To this end, the model first needs to be trained with a sufficiently large instance-to-concept ground truth, which exists for only a few languages. In this work, we tackle this lack of resources with word vector space alignment techniques: we train the NVC model on a high-resource language and test it with vectors from an aligned word vector space of another language, without retraining or fine-tuning. A quantitative and qualitative analysis shows that the NVC model indeed activates meaningful concepts for unseen vectors from the aligned vector space. NVC thus becomes available for low-resource languages for which no appropriate concept ground truth exists.
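The train-on-one-language, test-on-aligned-vectors idea from the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data is synthetic, a single sigmoid layer stands in for the NVC neural model, and orthogonal Procrustes stands in for the alignment technique, which the abstract does not name. The function and variable names (`train_concept_model`, `procrustes_align`, `hi_vecs`, `lo_vecs`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_words, n_concepts = 16, 400, 5

# Synthetic high-resource space with a multi-label instance-to-concept
# ground truth (made-up data, not the resources used in the paper).
hi_vecs = rng.normal(size=(n_words, dim))
concept_dirs = rng.normal(size=(dim, n_concepts))
labels = (hi_vecs @ concept_dirs > 0).astype(float)

# Synthetic low-resource space: the same words under an unknown rotation
# plus a little noise. Its labels are never used for training.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
lo_vecs = hi_vecs @ rotation + 0.01 * rng.normal(size=(n_words, dim))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_concept_model(x, y, lr=0.5, steps=500):
    """One sigmoid output per concept, fit by plain gradient descent."""
    w = np.zeros((x.shape[1], y.shape[1]))
    for _ in range(steps):
        w -= lr * x.T @ (sigmoid(x @ w) - y) / len(x)
    return w


def procrustes_align(src, tgt):
    """Orthogonal matrix W minimizing ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def accuracy(w, x, y):
    return float(np.mean((sigmoid(x @ w) > 0.5) == (y > 0.5)))


# Train only on high-resource vectors (words 0..299).
w = train_concept_model(hi_vecs[:300], labels[:300])

# Learn the alignment from a seed dictionary (words 0..199).
mapping = procrustes_align(lo_vecs[:200], hi_vecs[:200])

# Evaluate on held-out low-resource words (300..399), no retraining.
aligned_acc = accuracy(w, lo_vecs[300:] @ mapping, labels[300:])
unaligned_acc = accuracy(w, lo_vecs[300:], labels[300:])
print(f"aligned: {aligned_acc:.2f}  unaligned: {unaligned_acc:.2f}")
```

In this toy setting the model is only useful on the second language after its vectors are mapped into the training space. Whether that transfers to real embeddings depends on the quality of the alignment, which is what the paper's quantitative and qualitative analysis addresses.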


Metadata
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-32236-6_59
