2018 | OriginalPaper | Chapter

Learning Bilingual Lexicon for Low-Resource Language Pairs

Authors: ShaoLin Zhu, Xiao Li, YaTing Yang, Lei Wang, ChengGang Mi

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing

Abstract

Learning a bilingual lexicon from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for inducing a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon; we use Canonical Correlation Analysis (CCA) to construct a shared latent space that explains how the two monolingual embedding spaces are linked. Experimental results show that a bilingual lexicon of considerable precision and size can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.
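
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: fit CCA on the embeddings of the seed word pairs, project both vocabularies into the shared space, and pair each source word with its nearest target word. It is not the authors' implementation. The function names (fit_cca, induce_lexicon), the regularization term reg, and the use of cosine nearest-neighbour retrieval are all assumptions made for illustration.

```python
import numpy as np

def _inv_sqrt(C):
    # Inverse matrix square root of a symmetric positive-definite matrix.
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def fit_cca(X, Y, k, reg=1e-6):
    """Fit CCA on row-aligned seed embeddings X (n, dx) and Y (n, dy).

    Hypothetical helper: returns the seed means and projection matrices
    A (dx, k) and B (dy, k) that map each language into a shared space.
    `reg` is an assumed ridge term that keeps the covariances invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the
    # canonical directions; singular values are the correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return mx, my, A, B

def induce_lexicon(src_vecs, tgt_vecs, model):
    """For each source word, return the index of the closest target word
    (cosine similarity) after projecting both into the shared space."""
    mx, my, A, B = model
    P = (src_vecs - mx) @ A
    Q = (tgt_vecs - my) @ B
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    return (P @ Q.T).argmax(axis=1)
```

In use, X and Y would hold the monolingual embeddings of the seed lexicon entries (row i of X translates to row i of Y), while src_vecs and tgt_vecs would hold the full vocabularies. The number of shared dimensions k must not exceed the smaller of the two embedding dimensions.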

Metadata
Title
Learning Bilingual Lexicon for Low-Resource Language Pairs
Authors
ShaoLin Zhu
Xiao Li
YaTing Yang
Lei Wang
ChengGang Mi
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-73618-1_66
