Open Access

A Survey of Multilingual Neural Machine Translation

Published: 28 September 2020

Abstract

We present a survey of multilingual neural machine translation (MNMT), which has gained much traction in recent years. MNMT has proven useful for improving translation quality through translation knowledge transfer (transfer learning). MNMT is more promising and interesting than its statistical machine translation counterpart, because end-to-end modeling and distributed representations open new avenues for research on machine translation. Many approaches have been proposed to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and hence deserve further exploration. In this article, we present an in-depth survey of the existing literature on MNMT. We first categorize various approaches based on their central use-case and then further categorize them based on resource scenarios, underlying modeling principles, core issues, and challenges. Wherever possible, we address the strengths and weaknesses of several techniques by comparing them with each other. We also discuss future directions for MNMT. This article is aimed at both beginners and experts in NMT. We hope it will serve as a starting point as well as a source of new ideas for researchers and engineers interested in MNMT.

  96. Chaitanya Malaviya, Graham Neubig, and Patrick Littell. 2017. Learning language representations for typology prediction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2529--2535. DOI:https://doi.org/10.18653/v1/D17-1268Google ScholarGoogle ScholarCross RefCross Ref
  97. Giulia Mattoni, Pat Nagle, Carlos Collantes, and Dimitar Shterionov. 2017. Zero-shot translation for Indian languages with sparse data. In Proceedings of Machine Translation Summit XVI, Vol. 2: Users and Translators Track. 1--10.Google ScholarGoogle Scholar
  98. Evgeny Matusov, Nicola Ueffing, and Hermann Ney. 2006. Computing consensus translation for multiple machine translation systems using enhanced hypothesis alignment. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. 33--40. Retrieved from https://www.aclweb.org/anthology/E06-1005.Google ScholarGoogle Scholar
  99. Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. WIT3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation. 261--268.
  100. Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. CoRR abs/1309.4168 (2013).
  101. Rudra Murthy, Anoop Kunchukuttan, and Pushpak Bhattacharyya. 2019. Addressing word-order divergence in multilingual neural machine translation for extremely low resource languages. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 3868--3873. Retrieved from https://www.aclweb.org/anthology/N19-1387.
  102. Toshiaki Nakazawa, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Isao Goto, Hideya Mino, Katsuhito Sudoh, and Sadao Kurohashi. 2018. Overview of the 5th workshop on Asian translation. In Proceedings of the 5th Workshop on Asian Translation (WAT’18). 1--41.
  103. Preslav Nakov and Hwee Tou Ng. 2009. Improved statistical machine translation for resource-poor languages using related resource-rich languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1358--1367. Retrieved from https://www.aclweb.org/anthology/D09-1141.
  104. Graham Neubig. 2017. Neural machine translation and sequence-to-sequence models: A tutorial. CoRR abs/1703.01619 (2017).
  105. Graham Neubig and Junjie Hu. 2018. Rapid adaptation of neural machine translation to new languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 875--880. Retrieved from http://aclweb.org/anthology/D18-1103.
  106. Toan Q. Nguyen and David Chiang. 2017. Transfer learning across low-resource, related languages for neural machine translation. In Proceedings of the 8th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Asian Federation of Natural Language Processing, 296--301. Retrieved from http://aclweb.org/anthology/I17-2050.
  107. Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, and Satoshi Nakamura. 2018. Multi-source neural machine translation with missing data. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. Association for Computational Linguistics, 92--99. Retrieved from http://aclweb.org/anthology/W18-2711.
  108. Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, and Satoshi Nakamura. 2018. Multi-source neural machine translation with data augmentation. In Proceedings of the 15th International Workshop on Spoken Language Translation (IWSLT’18). 48--53. Retrieved from https://arxiv.org/abs/1810.06826.
  109. Eric Nyberg, Teruko Mitamura, and Jaime Carbonell. 1997. The KANT machine translation system: From R&D to initial deployment. In Proceedings of the LISA Workshop on Integrating Advanced Translation Technology. 1--7.
  110. Franz Josef Och and Hermann Ney. 2001. Statistical multi-source translation. In Proceedings of the Machine Translation Summit, Vol. 8. 253--258.
  111. Robert Östling and Jörg Tiedemann. 2017. Continuous multilinguality with language vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, 644--649. Retrieved from https://www.aclweb.org/anthology/E17-2102.
  112. Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 10 (Oct. 2010), 1345--1359. DOI: https://doi.org/10.1109/TKDE.2009.191
  113. Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, and Alexander Waibel. 2019. Improving zero-shot translation with language-independent constraints. In Proceedings of the 4th Conference on Machine Translation (Volume 1: Research Papers). Association for Computational Linguistics, 13--23. DOI: https://doi.org/10.18653/v1/W19-5202
  114. Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT? In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 4996--5001. DOI: https://doi.org/10.18653/v1/P19-1493
  115. Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, and Tom Mitchell. 2018. Contextual parameter generation for universal neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 425--435. Retrieved from http://aclweb.org/anthology/D18-1039.
  116. Matt Post, Chris Callison-Burch, and Miles Osborne. 2012. Constructing parallel corpora for six Indian languages via crowdsourcing. In Proceedings of the 7th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 401--409.
  117. Raj Noel Dabre Prasanna. 2018. Exploiting Multilingualism and Transfer Learning for Low Resource Machine Translation. Ph.D. Dissertation. Kyoto University. Retrieved from http://hdl.handle.net/2433/232411.
  118. Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. 2017. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. In Proceedings of the 30th Conference on Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 6076--6085. Retrieved from http://papers.nips.cc/paper/7188-svcca-singular-vector-canonical-correlation-analysis-for-deep-learning-dynamics-and-interpretability.pdf.
  119. Prajit Ramachandran, Peter Liu, and Quoc Le. 2017. Unsupervised pretraining for sequence to sequence learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 383--391. DOI: https://doi.org/10.18653/v1/D17-1039
  120. Ananthakrishnan Ramanathan, Jayprasad Hegde, Ritesh Shah, Pushpak Bhattacharyya, and M. Sasikumar. 2008. Simple syntactic and morphological processing can help English-Hindi statistical machine translation. In Proceedings of the International Joint Conference on Natural Language Processing.
  121. Matīss Rikters, Mārcis Pinnis, and Rihards Krišlauks. 2018. Training and adapting multilingual NMT for less-resourced and morphologically rich languages. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18). European Language Resources Association (ELRA), 3766--3773.
  122. Sebastian Ruder. 2016. An overview of gradient descent optimization algorithms. CoRR abs/1609.04747 (2016).
  123. Devendra Sachan and Graham Neubig. 2018. Parameter sharing methods for multilingual self-attentional translation models. In Proceedings of the 3rd Conference on Machine Translation: Research Papers. Association for Computational Linguistics, 261--271. Retrieved from http://aclweb.org/anthology/W18-6327.
  124. Amrita Saha, Mitesh M. Khapra, Sarath Chandar, Janarthanan Rajendran, and Kyunghyun Cho. 2016. A correlational encoder decoder architecture for pivot based sequence generation. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING’16). The COLING 2016 Organizing Committee, 109--118. Retrieved from https://www.aclweb.org/anthology/C16-1011.
  125. Peter H. Schönemann. 1966. A generalized solution of the orthogonal procrustes problem. Psychometrika 31, 1 (1966), 1--10.
  126. Josh Schroeder, Trevor Cohn, and Philipp Koehn. 2009. Word lattices for multi-source translation. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL’09). Association for Computational Linguistics, 719--727. Retrieved from https://www.aclweb.org/anthology/E09-1082.
  127. Mike Schuster and Kaisuke Nakajima. 2012. Japanese and Korean voice search. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP’12). IEEE, 5149--5152. Retrieved from http://dblp.uni-trier.de/db/conf/icassp/icassp2012.html#SchusterN12.
  128. Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, and Francisco Guzmán. 2019. WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. CoRR abs/1907.05791 (2019).
  129. Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2019. Multilingual unsupervised NMT using shared encoder and language-specific decoders. In Proceedings of the 57th Meeting of the Association for Computational Linguistics.
  130. Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 86--96. Retrieved from http://www.aclweb.org/anthology/P16-1009.
  131. Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1715--1725. Retrieved from http://www.aclweb.org/anthology/P16-1162.
  132. Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, and Thomas Hofmann. 2018. Zero-shot dual machine translation. CoRR abs/1805.10338 (2018).
  133. Petr Sgall and Jarmila Panevová. 1987. Machine translation, linguistics, and interlingua. In Proceedings of the 3rd Conference on European Chapter of the Association for Computational Linguistics (EACL’87). Association for Computational Linguistics, 99--103. DOI: https://doi.org/10.3115/976858.976876
  134. Itamar Shatz. 2016. Native language influence during second language acquisition: A large-scale learner corpus analysis. In Proceedings of the Pacific Second Language Research Forum (PacSLRF’16). 175--180.
  135. Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, and Karthik Raman. 2020. Evaluating the cross-lingual effectiveness of massively multilingual neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’20).
  136. Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, and C. V. Jawahar. 2020. A multilingual parallel corpora collection effort for Indian languages. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, 3743--3751. Retrieved from https://www.aclweb.org/anthology/2020.lrec-1.462.
  137. Anders Søgaard, Sebastian Ruder, and Ivan Vulić. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of the 56th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 778--788. DOI: https://doi.org/10.18653/v1/P18-1072
  138. Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, and Min Zhang. 2019. Code-switching for enhancing NMT with pre-specified translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 449--459. Retrieved from https://www.aclweb.org/anthology/N19-1044.
  139. Ralf Steinberger, Mohamed Ebrahim, Alexandros Poulis, Manuel Carrasco-Benitez, Patrick Schlüter, Marek Przybyszewski, and Signe Gilbro. 2014. An overview of the European Union’s highly multilingual parallel corpora. Lang. Resour. Eval. 48, 4 (2014), 679--707.
  140. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14). The MIT Press, 3104--3112. Retrieved from http://dl.acm.org/citation.cfm?id=2969033.2969173.
  141. Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, and Tie-Yan Liu. 2019. Multilingual neural machine translation with language clustering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 963--973. DOI: https://doi.org/10.18653/v1/D19-1089
  142. Xu Tan, Yi Ren, Di He, Tao Qin, and Tie-Yan Liu. 2019. Multilingual neural machine translation with knowledge distillation. In Proceedings of the International Conference on Learning Representations (ICLR’19). Retrieved from http://arxiv.org/abs/1902.10461.
  143. Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew M. Finch, and Eiichiro Sumita. 2016. Introducing the Asian language treebank (ALT). In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), 1574--1578.
  144. Jörg Tiedemann. 2012. Character-based pivot translation for under-resourced languages and domains. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 141--151. Retrieved from https://www.aclweb.org/anthology/E12-1015.
  145. Jörg Tiedemann. 2012. Parallel data, tools, and interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), 2214--2218. Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf.
  146. Hiroshi Uchida. 1996. UNL: Universal networking language—An electronic language for communication, understanding, and collaboration. UNU/IAS/UNL Center. Retrieved from https://www.semanticscholar.org/paper/UNL%3A-Universal-Networking-Language-An-Electronic-Uchida/f281c6a61ee69e4fa0f15f3f6d03faeee7a74e10.
  147. Masao Utiyama and Hitoshi Isahara. 2007. A comparison of pivot methods for phrase-based statistical machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 484--491. Retrieved from https://www.aclweb.org/anthology/N07-1061.
  148. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 30th Conference on Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 5998--6008. Retrieved from http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
  149. Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, and Mathias Creutz. 2018. Multilingual NMT with a language-independent attention bridge. CoRR abs/1811.00498 (2018).
  150. David Vilar, Jan-Thorsten Peter, and Hermann Ney. 2007. Can we translate letters? In Proceedings of the 2nd Workshop on Statistical Machine Translation. Association for Computational Linguistics, 33--39. Retrieved from https://www.aclweb.org/anthology/W07-0705.
  151. Karthik Visweswariah, Rajakrishnan Rajkumar, Ankur Gandhe, Ananthakrishnan Ramanathan, and Jiri Navratil. 2011. A word reordering model for improved machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 486--496. Retrieved from https://www.aclweb.org/anthology/D11-1045.
  152. Rui Wang, Andrew Finch, Masao Utiyama, and Eiichiro Sumita. 2017. Sentence embedding for neural machine translation domain adaptation. In Proceedings of the 55th Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 560--566. Retrieved from http://aclweb.org/anthology/P17-2089.
  153. Rui Wang, Masao Utiyama, Lemao Liu, Kehai Chen, and Eiichiro Sumita. 2017. Instance weighting for neural machine translation domain adaptation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1482--1488. DOI: https://doi.org/10.18653/v1/D17-1155
  154. Xinyi Wang and Graham Neubig. 2019. Target conditioned sampling: Optimizing data selection for multilingual neural machine translation. In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 5823--5828. DOI: https://doi.org/10.18653/v1/P19-1583
  155. Xinyi Wang, Hieu Pham, Philip Arthur, and Graham Neubig. 2019. Multilingual neural machine translation with soft decoupled encoding. In Proceedings of the International Conference on Learning Representations (ICLR’19). Retrieved from https://arxiv.org/abs/1902.03499.
  156. Yining Wang, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing Zong. 2018. Three strategies to improve one-to-many multilingual translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2955--2960. Retrieved from http://aclweb.org/anthology/D18-1326.
  157. Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing Zong. 2019. A compact and language-sensitive multilingual translation method. In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1213--1223. DOI: https://doi.org/10.18653/v1/P19-1117
  158. Toon Witkam. 2006. History and heritage of the DLT (Distributed Language Translation) project. Utrecht, The Netherlands: Private Publication. 1--11. Retrieved from http://www.mt-archive.info/Witkam-2006.pdf.
  159. Hua Wu and Haifeng Wang. 2007. Pivot language approach for phrase-based statistical machine translation. Mach. Translat. 21, 3 (2007), 165--181.
  160. Hua Wu and Haifeng Wang. 2009. Revisiting pivot language approach for machine translation. In Proceedings of the Joint Conference of the 47th Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Association for Computational Linguistics, 154--162. Retrieved from https://www.aclweb.org/anthology/P09-1018.
  161. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016).
  162. Fei Xia and Michael McCord. 2004. Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th International Conference on Computational Linguistics (COLING’04). COLING, 508--514. Retrieved from https://www.aclweb.org/anthology/C04-1073.
  163. Chang Xu, Tao Qin, Gang Wang, and Tie-Yan Liu. 2019. Polygon-Net: A general framework for jointly boosting multiple unsupervised neural machine translation models. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI’19). International Joint Conferences on Artificial Intelligence Organization, 5320--5326. DOI: https://doi.org/10.24963/ijcai.2019/739
  164. Poorya Zaremoodi, Wray Buntine, and Gholamreza Haffari. 2018. Adaptive knowledge sharing in multi-task learning: Improving low-resource neural machine translation. In Proceedings of the 56th Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 656--661. Retrieved from http://aclweb.org/anthology/P18-2104.
  165. Yang Zhao, Jiajun Zhang, and Chengqing Zong. 2018. Exploiting pre-ordering for neural machine translation. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18). European Language Resources Association (ELRA). Retrieved from https://www.aclweb.org/anthology/L18-1143.
  166. Long Zhou, Wenpeng Hu, Jiajun Zhang, and Chengqing Zong. 2017. Neural system combination for machine translation. In Proceedings of the 55th Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 378--384. DOI: https://doi.org/10.18653/v1/P17-2060
  167. Michał Ziemski, Marcin Junczys-Dowmunt, and Bruno Pouliquen. 2016. The United Nations parallel corpus v1.0. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), 3530--3534. Retrieved from https://www.aclweb.org/anthology/L16-1561.
  168. Barret Zoph and Kevin Knight. 2016. Multi-source neural translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 30--34. DOI: https://doi.org/10.18653/v1/N16-1004
  169. Barret Zoph and Quoc V. Le. 2017. Neural architecture search with reinforcement learning. In Proceedings of the 5th International Conference on Learning Representations (ICLR’17). Retrieved from https://openreview.net/forum?id=r1Ue8Hcxg.
  170. Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. 2016. Transfer learning for low-resource neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1568--1575. DOI: https://doi.org/10.18653/v1/D16-1163


    • Published in

      ACM Computing Surveys, Volume 53, Issue 5 (September 2021), 782 pages.
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3426973

      Copyright © 2020 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      • Received: 1 July 2019
      • Revised: 1 June 2020
      • Accepted: 1 June 2020
      • Published: 28 September 2020
