Abstract
We present a survey on multilingual neural machine translation (MNMT), which has gained significant traction in recent years. MNMT has been useful in improving translation quality as a result of translation knowledge transfer (transfer learning). MNMT is more promising and interesting than its statistical machine translation counterpart, because end-to-end modeling and distributed representations open new avenues for research on machine translation. Many approaches have been proposed to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and, hence, deserve further exploration. In this article, we present an in-depth survey of existing literature on MNMT. We first categorize various approaches based on their central use-case and then further categorize them based on resource scenarios, underlying modeling principles, core issues, and challenges. Wherever possible, we address the strengths and weaknesses of several techniques by comparing them with each other. We also discuss future directions for MNMT. This article is aimed at both beginners and experts in NMT. We hope this article will serve as a starting point as well as a source of new ideas for researchers and engineers interested in MNMT.
- Željko Agić and Ivan Vulić. 2019. JW300: A wide-coverage parallel corpus for low-resource languages. In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 3204--3210. DOI:https://doi.org/10.18653/v1/P19-1310Google Scholar
- Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. Massively multilingual neural machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 3874--3884. Retrieved from https://www.aclweb.org/anthology/N19-1388.Google ScholarCross Ref
- Maruan Al-Shedivat and Ankur Parikh. 2019. Consistency by agreement in zero-shot neural machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 1184--1197. Retrieved from https://www.aclweb.org/anthology/N19-1121.Google ScholarCross Ref
- Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, and Wolfgang Macherey. 2019. The missing ingredient in zero-shot neural machine translation. CoRR abs/1903.07091 (2019).Google Scholar
- Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen, and Yonghui Wu. 2019. Massively multilingual neural machine translation in the wild: Findings and challenges. CoRR abs/1907.05019 (2019).Google Scholar
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2289--2294. DOI:https://doi.org/10.18653/v1/D16-1250Google ScholarCross Ref
- Mikel Artetxe and Holger Schwenk. 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Trans. Assoc. Comput. Ling. 7 (2019), 597--610.Google ScholarCross Ref
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15). Retrieved from http://arxiv.org/abs/1409.0473.Google Scholar
- Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse. Association for Computational Linguistics, 178--186. Retrieved from http://www.aclweb.org/anthology/W13-2322.Google Scholar
- Tamali Banerjee, Anoop Kunchukuttan, and Pushpak Bhattacharya. 2018. Multilingual Indian language translation system at WAT 2018: Many-to-one phrase-based SMT. In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation. Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/Y18-3013.Google Scholar
- Ankur Bapna and Orhan Firat. 2019. Simple, scalable adaptation for neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 1538--1548. DOI:https://doi.org/10.18653/v1/D19-1165Google ScholarCross Ref
- Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. 2016. Conditional computation in neural networks for faster models. In Proceedings of the International Conference on Learning Representations (ICLR’16) Workshop Track.Google Scholar
- Graeme Blackwood, Miguel Ballesteros, and Todd Ward. 2018. Multilingual neural machine translation with task-specific attention. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, 3112--3122. Retrieved from http://aclweb.org/anthology/C18-1263.Google Scholar
- Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shujian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, and Marco Turchi. 2017. Findings of the 2017 conference on machine translation (WMT’17). In Proceedings of the 2nd Conference on Machine Translation. Association for Computational Linguistics, 169--214. Retrieved from http://www.aclweb.org/anthology/W17-4717.Google Scholar
- Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, and Christof Monz. 2018. Findings of the 2018 conference on machine translation (WMT’18). In Proceedings of the 3rd Conference on Machine Translation: Shared Task Papers. Association for Computational Linguistics, 272--303. Retrieved from http://aclweb.org/anthology/W18-6401.Google Scholar
- Mauro Cettolo, Marcello Federico, Luisa Bentivogli, Jan Niehues, Sebastian Stüker, Katsuhito Sudoh, Koichiro Yoshino, and Christian Federmann. 2017. Overview of the IWSLT 2017 evaluation campaign. In Proceedings of the 14th International Workshop on Spoken Language Translation. 2--14.Google Scholar
- Sarath Chandar, Stanislas Lauly, Hugo Larochelle, Mitesh Khapra, Balaraman Ravindran, Vikas C. Raykar, and Amrita Saha. 2014. An autoencoder approach to learning bilingual word representations. In Proceedings of the Conference on Advances in Neural Information Processing Systems. 1853--1861.Google Scholar
- Rajen Chatterjee, M. Amin Farajian, Matteo Negri, Marco Turchi, Ankit Srivastava, and Santanu Pal. 2017. Multi-source neural automatic post-editing: FBK’s participation in the WMT 2017 APE shared task. In Proceedings of the 2nd Conference on Machine Translation. Association for Computational Linguistics. 630--638. DOI:https://doi.org/10.18653/v1/W17-4773Google Scholar
- Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, and Kathleen McKeown. 2019. The ARIEL-CMU systems for LoReHLT18. CoRR abs/1902.08899 (2019).Google Scholar
- Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, and Claire Cardie. 2019. Multi-source cross-lingual model transfer: Learning what to share. In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 3098--3112. DOI:https://doi.org/10.18653/v1/P19-1299Google ScholarCross Ref
- Yun Chen, Yang Liu, Yong Cheng, and Victor O. K. Li. 2017. A teacher-student framework for zero-resource neural machine translation. In Proceedings of the 55th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1925--1935. DOI:https://doi.org/10.18653/v1/P17-1176Google Scholar
- Yun Chen, Yang Liu, and Victor O. K. Li. 2018. Zero-resource neural machine translation with multi-agent communication game. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. AAAI Press, 5086--5093.Google Scholar
- Yong Cheng, Qian Yang, Yang Liu, Maosong Sun, and Wei Xu. 2017. Joint training for pivot-based neural machine translation. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI’17). 3974--3980. DOI:https://doi.org/10.24963/ijcai.2017/555Google ScholarDigital Library
- Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder--decoder approaches. In Proceedings of the 8th Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST’14). Association for Computational Linguistics, 103--111. DOI:https://doi.org/10.3115/v1/W14-4012Google ScholarCross Ref
- Gyu Hyeon Choi, Jong Hun Shin, and Young Kil Kim. 2018. Improving a multi-source neural machine translation model with corpus extension for low-resource languages. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18). European Language Resource Association, 900--904. Retrieved from http://aclweb.org/anthology/L18-1144.Google Scholar
- Christos Christodouloupoulos and Mark Steedman. 2015. A massively parallel corpus: The Bible in 100 languages. Lang. Resour. Eval. 49, 2 (2015), 375--395.Google ScholarDigital Library
- Chenhui Chu and Raj Dabre. 2018. Multilingual and multi-domain adaptation for neural machine translation. In Proceedings of the 24th Meeting of the Association for Natural Language Processing (NLP’18). 909--912.Google Scholar
- Chenhui Chu and Raj Dabre. 2019. Multilingual multi-domain adaptation approaches for neural machine translation. CoRR abs/1906.07978 (2019).Google Scholar
- Chenhui Chu, Raj Dabre, and Sadao Kurohashi. 2017. An empirical comparison of domain adaptation methods for neural machine translation. In Proceedings of the 55th Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 385--391. DOI:https://doi.org/10.18653/v1/P17-2061Google ScholarCross Ref
- Chenhui Chu and Rui Wang. 2018. A survey of domain adaptation for neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, 1304--1319. Retrieved from http://aclweb.org/anthology/C18-1111.Google Scholar
- Michael Collins, Philipp Koehn, and Ivona Kučerová. 2005. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Meeting of the Association for Computational Linguistics (ACL’05). Association for Computational Linguistics, 531--540. DOI:https://doi.org/10.3115/1219840.1219906Google ScholarDigital Library
- Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining. In Proceedings of the 32nd Conference on Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 7059--7069. Retrieved from http://papers.nips.cc/paper/8928-cross-lingual-language-model-pretraining.pdf.Google Scholar
- Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In Proceedings of the International Conference on Learning Representations. Retrieved from https://github.com/facebookresearch/MUSE.Google Scholar
- Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. 2475--2485. Retrieved from https://www.aclweb.org/anthology/D18-1269.Google ScholarCross Ref
- Anna Currey and Kenneth Heafield. 2019. Zero-resource neural machine translation with monolingual pivot data. In Proceedings of the 3rd Workshop on Neural Generation and Translation. Association for Computational Linguistics, 99--107. DOI:https://doi.org/10.18653/v1/D19-5610Google ScholarCross Ref
- Raj Dabre, Fabien Cromieres, and Sadao Kurohashi. 2017. Enabling multi-source neural machine translation by concatenating source sentences in multiple languages. In Proceedings of the Machine Translation Summit XVI, Vol.1: Research Track. 96--106.Google Scholar
- Raj Dabre, Atsushi Fujita, and Chenhui Chu. 2019. Exploiting multilingualism through multistage fine-tuning for low-resource neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 1410--1416. DOI:https://doi.org/10.18653/v1/D19-1146Google ScholarCross Ref
- Raj Dabre, Anoop Kunchukuttan, Atsushi Fujita, and Eiichiro Sumita. 2018. NICT’s participation in WAT 2018: Approaches using multilingualism and recurrently stacked layers. In Proceedings of the 5th Workshop on Asian Language Translation.Google Scholar
- Raj Dabre and Sadao Kurohashi. 2017. MMCR4NLP: Multilingual multiway corpora repository for natural language processing. arXiv preprint arXiv:1710.01025 (2017).Google Scholar
- Raj Dabre, Tetsuji Nakagawa, and Hideto Kazawa. 2017. An empirical study of language relatedness for transfer learning in neural machine translation. In Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation. The National University (Philippines), 282--286. Retrieved from http://aclweb.org/anthology/Y17-1038.Google Scholar
- Mattia A. Di Gangi, Roldano Cattoni, Luisa Bentivogli, Matteo Negri, and Marco Turchi. 2019. MuST-C: A multilingual speech translation corpus. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics. 2012--2017. Retrieved from https://www.aclweb.org/anthology/N19-1202.Google Scholar
- Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. In Proceedings of the 53rd Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 1723--1732. DOI:https://doi.org/10.3115/v1/P15-1166Google Scholar
- Bonnie J. Dorr. 1987. UNITRAN: An interlingua approach to machine translation. In Proceedings of the 6th Conference of the American Association of Artificial Intelligence.Google Scholar
- Kevin Duh, Graham Neubig, Katsuhito Sudoh, and Hajime Tsukada. 2013. Adaptation data selection using neural language models: Experiments in machine translation. In Proceedings of the 51st Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 678--683. Retrieved from http://www.aclweb.org/anthology/P13-2119.Google Scholar
- Carlos Escolano, Marta R. Costa-jussà, and José A. R. Fonollosa. 2019. From bilingual to multilingual neural machine translation by incremental training. In Proceedings of the 57th Meeting of the Association for Computational Linguistics.Google Scholar
- Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, and Josef van Genabith. 2017. An empirical analysis of NMT-derived interlingual embeddings and their use in parallel sentence identification. IEEE J. Select. Topics Sig. Proc. 11, 8 (Dec. 2017), 1340--1350. DOI:https://doi.org/10.1109/JSTSP.2017.2764273Google Scholar
- Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research), Doina Precup and Yee Whye Teh (Eds.), Vol. 70. 1126--1135. Retrieved from http://proceedings.mlr.press/v70/finn17a.html.Google Scholar
- Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016. Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 866--875. DOI:https://doi.org/10.18653/v1/N16-1101Google ScholarCross Ref
- Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016. Zero-resource translation with multi-lingual neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 268--277. DOI:https://doi.org/10.18653/v1/D16-1026Google ScholarCross Ref
- Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1 (2016), 2096--2030.Google ScholarDigital Library
- Ekaterina Garmash and Christof Monz. 2016. Ensemble learning for multi-source neural machine translation. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING’16). The COLING 2016 Organizing Committee, 1409--1418. Retrieved from http://aclweb.org/anthology/C16-1133.Google Scholar
- Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research), Doina Precup and Yee Whye Teh (Eds.), Vol. 70. 1243--1252. Retrieved from http://proceedings.mlr.press/v70/gehring17a.html.Google Scholar
- Adrià De Gispert and José B. Mariño. 2006. Catalan-English statistical machine translation without parallel corpus: Bridging through Spanish. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC’06). 65--68.Google Scholar
- Jiatao Gu, Hany Hassan, Jacob Devlin, and Victor O. K. Li. 2018. Universal neural machine translation for extremely low resource languages. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 344--354. DOI:https://doi.org/10.18653/v1/N18-1032Google Scholar
- Jiatao Gu, Yong Wang, Yun Chen, Victor O. K. Li, and Kyunghyun Cho. 2018. Meta-learning for low-resource neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 3622--3631. Retrieved from http://aclweb.org/anthology/D18-1398.Google ScholarCross Ref
- Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O. K. Li. 2019. Improved zero-shot neural machine translation via ignoring spurious correlations. In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1258--1268. DOI:https://doi.org/10.18653/v1/P19-1121Google Scholar
- Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, and Marc’Aurelio Ranzato. 2019. The FLORES evaluation datasets for low-resource machine translation: Nepali--English and Sinhala--English. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 6098--6111. DOI:https://doi.org/10.18653/v1/D19-1632Google ScholarCross Ref
- Thanh-Le Ha, Jan Niehues, and Alexander H. Waibel. 2016. Toward multilingual neural machine translation with universal encoder and decoder. In Proceedings of the 13th International Workshop on Spoken Language Translation. 1--7.Google Scholar
- Thanh-Le Ha, Jan Niehues, and Alexander H. Waibel. 2017. Effective strategies in zero-shot neural machine translation. In Proceedings of the 14th International Workshop on Spoken Language Translation. 105--112.Google Scholar
- Barry Haddow and Faheem Kirefu. 2020. PMIndia—A collection of parallel corpora of languages of India. arxiv 2001.09907 (2020).Google Scholar
- Junxian He, Jiatao Gu, Jiajun Shen, and Marc’Aurelio Ranzato. 2020. Revisiting self-training for neural sequence generation. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20). Retrieved from https://openreview.net/forum?id=SJgdnAVKDH.Google Scholar
- Carlos Henríquez, Marta R. Costa-jussá, Rafael E. Banchs, Lluis Formiga, and José B. Mariño. 2011. Pivot strategies as an alternative for statistical machine translation tasks involving Iberian languages. In Proceedings of the Workshop on Iberian Cross-language Natural Language Processing Tasks (ICL’11). 22--27.Google Scholar
- Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2014. Distilling the knowledge in a neural network. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS’14) Deep Learning Workshop.Google Scholar
- Chris Hokamp, John Glover, and Demian Gholipour Ghalandari. 2019. Evaluating the supervised and zero-shot performance of multi-lingual translation models. In Proceedings of the 4th Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). Association for Computational Linguistics, 209--217. DOI:https://doi.org/10.18653/v1/W19-5319Google ScholarCross Ref
- Yanping Huang, Yonglong Cheng, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, and Zhifeng Chen. 2019. GPipe: Efficient training of giant neural networks using pipeline parallelism. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS’19).Google Scholar
- Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. 2019. Learning multilingual word embeddings in latent metric space: A geometric approach. Trans. Assoc. Comput. Ling. 7 (2019), 107--120. Retrieved from https://www.aclweb.org/anthology/Q19-1007.Google Scholar
- Sébastien Jean, Orhan Firat, and Melvin Johnson. 2019. Adaptive scheduling for multi-task learning. In Proceedings of the Continual Learning Workshop at NeurIPS’18.Google Scholar
- Girish Nath Jha. 2010. The TDIL program and the Indian language Corpora Intitiative (ILCI). In Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC’10). European Languages Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2010/pdf/874_Paper.pdf.Google Scholar
- Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, and Weihua Luo. 2020. Cross-lingual pre-training based transfer for zero-shot neural machine translation. In Proceedings of the 34th AAAI Conference on Artificial Intelligence.Google ScholarCross Ref
- Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2017. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Trans. Assoc. Comput. Ling. 5 (2017), 339--351. Retrieved from http://aclweb.org/anthology/Q17-1024.Google ScholarCross Ref
- Yunsu Kim, Yingbo Gao, and Hermann Ney. 2019. Effective cross-lingual transfer of neural machine translation models without shared vocabularies. In Proceedings of the 57th Meeting of the Association for Computational Linguistics.Google ScholarCross Ref
- Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, and Hermann Ney. 2019. Pivot-based transfer learning for neural machine translation between non-English languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 866--876. DOI:https://doi.org/10.18653/v1/D19-1080Google ScholarCross Ref
- Yoon Kim and Alexander M. Rush. 2016. Sequence-level knowledge distillation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1317--1327. DOI:https://doi.org/10.18653/v1/D16-1139Google Scholar
- Eliyahu Kiperwasser and Miguel Ballesteros. 2018. Scheduled multi-task learning: From syntax to translation. Trans. Assoc. Comput. Ling. 6 (2018), 225--240. DOI:https://doi.org/10.1162/tacl_a_00017Google Scholar
- Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the International Conference on Computational Linguistics (COLING’12). The COLING 2012 Organizing Committee, 1459--1474. Retrieved from https://www.aclweb.org/anthology/C12-1089.Google Scholar
- Tom Kocmi and Ondřej Bojar. 2018. Trivial transfer learning for low-resource neural machine translation. In Proceedings of the Third Conference on Machine Translation, Volume 1: Research Papers. Association for Computational Linguistics, 244--252. Retrieved from http://www.aclweb.org/anthology/W18-6325.Google ScholarCross Ref
- Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the 10th Machine Translation Summit. AAMT, 79--86. Retrieved from http://mt-archive.info/MTS-2005-Koehn.pdf.Google Scholar
- Philipp Koehn. 2017. Neural machine translation. CoRR abs/1709.07809 (2017).Google Scholar
- Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions. Association for Computational Linguistics, 177--180. Retrieved from http://www.aclweb.org/anthology/P/P07/P07-2045.Google ScholarCross Ref
- Philipp Koehn and Rebecca Knowles. 2017. Six challenges for neural machine translation. In Proceedings of the 1st Workshop on Neural Machine Translation. Association for Computational Linguistics, 28--39. Retrieved from http://www.aclweb.org/anthology/W17-3204.Google ScholarCross Ref
- Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 127--133. Retrieved from https://www.aclweb.org/anthology/N03-1017.Google ScholarCross Ref
- Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, 66--71. DOI:https://doi.org/10.18653/v1/D18-2012Google ScholarCross Ref
- Sneha Kudugunta, Ankur Bapna, Isaac Caswell, and Orhan Firat. 2019. Investigating multilingual NMT representations at scale. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 1565--1575. DOI:https://doi.org/10.18653/v1/D19-1167Google ScholarCross Ref
- Anoop Kunchukuttan. 2020. IndoWordnet Parallel Corpus. Retrieved from https://github.com/anoopkunchukuttan/indowordnet_parallel.Google Scholar
- Anoop Kunchukuttan and Pushpak Bhattacharyya. 2016. Orthographic syllable as basic unit for SMT between related languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1912--1917. DOI:https://doi.org/10.18653/v1/D16-1196Google ScholarCross Ref
- Anoop Kunchukuttan and Pushpak Bhattacharyya. 2017. Learning variable length units for SMT between related languages via byte pair encoding. In Proceedings of the 1st Workshop on Subword and Character Level Models in NLP. Association for Computational Linguistics, 14--24. DOI:https://doi.org/10.18653/v1/W17-4102Google ScholarCross Ref
- Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, and Pushpak Bhattacharyya. 2014. Shata-Anuvadak: Tackling multiway translation of Indian languages. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14). European Language Resources Association (ELRA), 1781--1787. Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/414_Paper.pdf.Google Scholar
- Anoop Kunchukuttan, Maulik Shah, Pradyot Prakash, and Pushpak Bhattacharyya. 2017. Utilizing lexical similarity between related, low-resource languages for pivot-based SMT. In Proceedings of the 8th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Asian Federation of Natural Language Processing, 283--289. Retrieved from http://aclweb.org/anthology/I17-2048.Google Scholar
- Surafel Melaku Lakew, Mauro Cettolo, and Marcello Federico. 2018. A comparison of transformer and recurrent neural networks on multilingual neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, 641--652. Retrieved from http://aclweb.org/anthology/C18-1054.
- Surafel Melaku Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico, and Marco Turchi. 2018. Transfer learning in multilingual neural machine translation with dynamic vocabulary. In Proceedings of the 15th International Workshop on Spoken Language Translation (IWSLT’18). 54--61.
- Surafel Melaku Lakew, Quintino F. Lotito, Matteo Negri, Marco Turchi, and Marcello Federico. 2017. Improving zero-shot translation of low-resource languages. In Proceedings of the 14th International Workshop on Spoken Language Translation. 113--119.
- Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018. Unsupervised machine translation using monolingual corpora only. In Proceedings of the International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=rkYTTf-AZ.
- Jason Lee, Kyunghyun Cho, and Thomas Hofmann. 2017. Fully character-level neural machine translation without explicit segmentation. Trans. Assoc. Comput. Ling. 5 (2017), 365--378. Retrieved from http://aclweb.org/anthology/Q17-1026.
- Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan Zhang, and Jason Sun. 2018. A neural interlingua for multilingual machine translation. In Proceedings of the 3rd Conference on Machine Translation: Research Papers. Association for Computational Linguistics, 84--92. Retrieved from http://aclweb.org/anthology/W18-6309.
- Mieradilijiang Maimaiti, Yang Liu, Huanbo Luan, and Maosong Sun. 2019. Multi-round transfer learning for low-resource NMT using multiple high-resource languages. ACM Trans. Asian Low-Resour. Lang. Inf. Proc. 18, 4 (May 2019). DOI: https://doi.org/10.1145/3314945
- Chaitanya Malaviya, Graham Neubig, and Patrick Littell. 2017. Learning language representations for typology prediction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2529--2535. DOI: https://doi.org/10.18653/v1/D17-1268
- Giulia Mattoni, Pat Nagle, Carlos Collantes, and Dimitar Shterionov. 2017. Zero-shot translation for Indian languages with sparse data. In Proceedings of Machine Translation Summit XVI, Vol. 2: Users and Translators Track. 1--10.
- Evgeny Matusov, Nicola Ueffing, and Hermann Ney. 2006. Computing consensus translation for multiple machine translation systems using enhanced hypothesis alignment. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. 33--40. Retrieved from https://www.aclweb.org/anthology/E06-1005.
- Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. WIT3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation. 261--268.
- Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. CoRR abs/1309.4168 (2013).
- Rudra Murthy, Anoop Kunchukuttan, and Pushpak Bhattacharyya. 2019. Addressing word-order divergence in multilingual neural machine translation for extremely low resource languages. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 3868--3873. Retrieved from https://www.aclweb.org/anthology/N19-1387.
- Toshiaki Nakazawa, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Isao Goto, Hideya Mino, Katsuhito Sudoh, and Sadao Kurohashi. 2018. Overview of the 5th workshop on Asian translation. In Proceedings of the 5th Workshop on Asian Translation (WAT’18). 1--41.
- Preslav Nakov and Hwee Tou Ng. 2009. Improved statistical machine translation for resource-poor languages using related resource-rich languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1358--1367. Retrieved from https://www.aclweb.org/anthology/D09-1141.
- Graham Neubig. 2017. Neural machine translation and sequence-to-sequence models: A tutorial. CoRR abs/1703.01619 (2017).
- Graham Neubig and Junjie Hu. 2018. Rapid adaptation of neural machine translation to new languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 875--880. Retrieved from http://aclweb.org/anthology/D18-1103.
- Toan Q. Nguyen and David Chiang. 2017. Transfer learning across low-resource, related languages for neural machine translation. In Proceedings of the 8th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Asian Federation of Natural Language Processing, 296--301. Retrieved from http://aclweb.org/anthology/I17-2050.
- Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, and Satoshi Nakamura. 2018. Multi-source neural machine translation with missing data. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. Association for Computational Linguistics, 92--99. Retrieved from http://aclweb.org/anthology/W18-2711.
- Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, and Satoshi Nakamura. 2018. Multi-source neural machine translation with data augmentation. In Proceedings of the 15th International Workshop on Spoken Language Translation (IWSLT’18). 48--53. Retrieved from https://arxiv.org/abs/1810.06826.
- Eric Nyberg, Teruko Mitamura, and Jaime Carbonell. 1997. The KANT machine translation system: From R&D to initial deployment. In Proceedings of the LISA Workshop on Integrating Advanced Translation Technology. 1--7.
- Franz Josef Och and Hermann Ney. 2001. Statistical multi-source translation. In Proceedings of the Machine Translation Summit, Vol. 8. 253--258.
- Robert Östling and Jörg Tiedemann. 2017. Continuous multilinguality with language vectors. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, 644--649. Retrieved from https://www.aclweb.org/anthology/E17-2102.
- Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 10 (Oct. 2010), 1345--1359. DOI: https://doi.org/10.1109/TKDE.2009.191
- Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, and Alexander Waibel. 2019. Improving zero-shot translation with language-independent constraints. In Proceedings of the 4th Conference on Machine Translation (Volume 1: Research Papers). Association for Computational Linguistics, 13--23. DOI: https://doi.org/10.18653/v1/W19-5202
- Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT? In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 4996--5001. DOI: https://doi.org/10.18653/v1/P19-1493
- Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, and Tom Mitchell. 2018. Contextual parameter generation for universal neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 425--435. Retrieved from http://aclweb.org/anthology/D18-1039.
- Matt Post, Chris Callison-Burch, and Miles Osborne. 2012. Constructing parallel corpora for six Indian languages via crowdsourcing. In Proceedings of the 7th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 401--409.
- Raj Noel Dabre Prasanna. 2018. Exploiting Multilingualism and Transfer Learning for Low Resource Machine Translation. Ph.D. Dissertation. Kyoto University. Retrieved from http://hdl.handle.net/2433/232411.
- Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. 2017. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. In Proceedings of the 30th Conference on Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 6076--6085. Retrieved from http://papers.nips.cc/paper/7188-svcca-singular-vector-canonical-correlation-analysis-for-deep-learning-dynamics-and-interpretability.pdf.
- Prajit Ramachandran, Peter Liu, and Quoc Le. 2017. Unsupervised pretraining for sequence to sequence learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 383--391. DOI: https://doi.org/10.18653/v1/D17-1039
- Ananthakrishnan Ramanathan, Jayprasad Hegde, Ritesh Shah, Pushpak Bhattacharyya, and M. Sasikumar. 2008. Simple syntactic and morphological processing can help English-Hindi statistical machine translation. In Proceedings of the International Joint Conference on Natural Language Processing.
- Matīss Rikters, Mārcis Pinnis, and Rihards Krišlauks. 2018. Training and adapting multilingual NMT for less-resourced and morphologically rich languages. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18). European Language Resources Association (ELRA), 3766--3773.
- Sebastian Ruder. 2016. An overview of gradient descent optimization algorithms. CoRR abs/1609.04747 (2016).
- Devendra Sachan and Graham Neubig. 2018. Parameter sharing methods for multilingual self-attentional translation models. In Proceedings of the 3rd Conference on Machine Translation: Research Papers. Association for Computational Linguistics, 261--271. Retrieved from http://aclweb.org/anthology/W18-6327.
- Amrita Saha, Mitesh M. Khapra, Sarath Chandar, Janarthanan Rajendran, and Kyunghyun Cho. 2016. A correlational encoder decoder architecture for pivot based sequence generation. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING’16). The COLING 2016 Organizing Committee, 109--118. Retrieved from https://www.aclweb.org/anthology/C16-1011.
- Peter H. Schönemann. 1966. A generalized solution of the orthogonal procrustes problem. Psychometrika 31, 1 (1966), 1--10.
- Josh Schroeder, Trevor Cohn, and Philipp Koehn. 2009. Word lattices for multi-source translation. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL’09). Association for Computational Linguistics, 719--727. Retrieved from https://www.aclweb.org/anthology/E09-1082.
- Mike Schuster and Kaisuke Nakajima. 2012. Japanese and Korean voice search. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP’12). IEEE, 5149--5152. Retrieved from http://dblp.uni-trier.de/db/conf/icassp/icassp2012.html#SchusterN12.
- Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, and Francisco Guzmán. 2019. WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. CoRR abs/1907.05791 (2019).
- Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2019. Multilingual unsupervised NMT using shared encoder and language-specific decoders. In Proceedings of the 57th Meeting of the Association for Computational Linguistics.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 86--96. Retrieved from http://www.aclweb.org/anthology/P16-1009.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1715--1725. Retrieved from http://www.aclweb.org/anthology/P16-1162.
- Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, and Thomas Hofmann. 2018. Zero-shot dual machine translation. CoRR abs/1805.10338 (2018).
- Petr Sgall and Jarmila Panevová. 1987. Machine translation, linguistics, and interlingua. In Proceedings of the 3rd Conference of the European Chapter of the Association for Computational Linguistics (EACL’87). Association for Computational Linguistics, 99--103. DOI: https://doi.org/10.3115/976858.976876
- Itamar Shatz. 2016. Native language influence during second language acquisition: A large-scale learner corpus analysis. In Proceedings of the Pacific Second Language Research Forum (PacSLRF’16). 175--180.
- Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, and Karthik Raman. 2020. Evaluating the cross-lingual effectiveness of massively multilingual neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’20).
- Shashank Siripragrada, Jerin Philip, Vinay P. Namboodiri, and C. V. Jawahar. 2020. A multilingual parallel corpora collection effort for Indian languages. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, 3743--3751. Retrieved from https://www.aclweb.org/anthology/2020.lrec-1.462.
- Anders Søgaard, Sebastian Ruder, and Ivan Vulić. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of the 56th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 778--788. DOI: https://doi.org/10.18653/v1/P18-1072
- Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, and Min Zhang. 2019. Code-switching for enhancing NMT with pre-specified translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 449--459. Retrieved from https://www.aclweb.org/anthology/N19-1044.
- Ralf Steinberger, Mohamed Ebrahim, Alexandros Poulis, Manuel Carrasco-Benitez, Patrick Schlüter, Marek Przybyszewski, and Signe Gilbro. 2014. An overview of the European Union’s highly multilingual parallel corpora. Lang. Resour. Eval. 48, 4 (2014), 679--707.
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14). The MIT Press, 3104--3112. Retrieved from http://dl.acm.org/citation.cfm?id=2969033.2969173.
- Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, and Tie-Yan Liu. 2019. Multilingual neural machine translation with language clustering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 963--973. DOI: https://doi.org/10.18653/v1/D19-1089
- Xu Tan, Yi Ren, Di He, Tao Qin, and Tie-Yan Liu. 2019. Multilingual neural machine translation with knowledge distillation. In Proceedings of the International Conference on Learning Representations (ICLR’19). Retrieved from http://arxiv.org/abs/1902.10461.
- Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew M. Finch, and Eiichiro Sumita. 2016. Introducing the Asian language treebank (ALT). In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), 1574--1578.
- Jörg Tiedemann. 2012. Character-based pivot translation for under-resourced languages and domains. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 141--151. Retrieved from https://www.aclweb.org/anthology/E12-1015.
- Jörg Tiedemann. 2012. Parallel data, tools, and interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), 2214--2218. Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf.
- Hiroshi Uchida. 1996. UNL: Universal networking language—An electronic language for communication, understanding, and collaboration. In UNU/IAS/UNL Center. Retrieved from https://www.semanticscholar.org/paper/UNL%3A-Universal-Networking-Language-An-Electronic-Uchida/f281c6a61ee69e4fa0f15f3f6d03faeee7a74e10.
- Masao Utiyama and Hitoshi Isahara. 2007. A comparison of pivot methods for phrase-based statistical machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 484--491. Retrieved from https://www.aclweb.org/anthology/N07-1061.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 30th Conference on Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 5998--6008. Retrieved from http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
- Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, and Mathias Creutz. 2018. Multilingual NMT with a language-independent attention bridge. CoRR abs/1811.00498 (2018).
- David Vilar, Jan-Thorsten Peter, and Hermann Ney. 2007. Can we translate letters? In Proceedings of the 2nd Workshop on Statistical Machine Translation. Association for Computational Linguistics, 33--39. Retrieved from https://www.aclweb.org/anthology/W07-0705.
- Karthik Visweswariah, Rajakrishnan Rajkumar, Ankur Gandhe, Ananthakrishnan Ramanathan, and Jiri Navratil. 2011. A word reordering model for improved machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 486--496. Retrieved from https://www.aclweb.org/anthology/D11-1045.
- Rui Wang, Andrew Finch, Masao Utiyama, and Eiichiro Sumita. 2017. Sentence embedding for neural machine translation domain adaptation. In Proceedings of the 55th Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 560--566. Retrieved from http://aclweb.org/anthology/P17-2089.
- Rui Wang, Masao Utiyama, Lemao Liu, Kehai Chen, and Eiichiro Sumita. 2017. Instance weighting for neural machine translation domain adaptation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1482--1488. DOI: https://doi.org/10.18653/v1/D17-1155
- Xinyi Wang and Graham Neubig. 2019. Target conditioned sampling: Optimizing data selection for multilingual neural machine translation. In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 5823--5828. DOI: https://doi.org/10.18653/v1/P19-1583
- Xinyi Wang, Hieu Pham, Philip Arthur, and Graham Neubig. 2019. Multilingual neural machine translation with soft decoupled encoding. In Proceedings of the International Conference on Learning Representations (ICLR’19). Retrieved from https://arxiv.org/abs/1902.03499.
- Yining Wang, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing Zong. 2018. Three strategies to improve one-to-many multilingual translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2955--2960. Retrieved from http://aclweb.org/anthology/D18-1326.
- Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing Zong. 2019. A compact and language-sensitive multilingual translation method. In Proceedings of the 57th Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1213--1223. DOI: https://doi.org/10.18653/v1/P19-1117
- Toon Witkam. 2006. History and heritage of the DLT (Distributed Language Translation) project. Utrecht, The Netherlands: Private Publication. 1--11. Retrieved from http://www.mt-archive.info/Witkam-2006.pdf.
- Hua Wu and Haifeng Wang. 2007. Pivot language approach for phrase-based statistical machine translation. Mach. Translat. 21, 3 (2007), 165--181.
- Hua Wu and Haifeng Wang. 2009. Revisiting pivot language approach for machine translation. In Proceedings of the Joint Conference of the 47th Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Association for Computational Linguistics, 154--162. Retrieved from https://www.aclweb.org/anthology/P09-1018.
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016).
- Fei Xia and Michael McCord. 2004. Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th International Conference on Computational Linguistics (COLING’04). COLING, 508--514. Retrieved from https://www.aclweb.org/anthology/C04-1073.
- Chang Xu, Tao Qin, Gang Wang, and Tie-Yan Liu. 2019. Polygon-Net: A general framework for jointly boosting multiple unsupervised neural machine translation models. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI’19). International Joint Conferences on Artificial Intelligence Organization, 5320--5326. DOI: https://doi.org/10.24963/ijcai.2019/739
- Poorya Zaremoodi, Wray Buntine, and Gholamreza Haffari. 2018. Adaptive knowledge sharing in multi-task learning: Improving low-resource neural machine translation. In Proceedings of the 56th Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 656--661. Retrieved from http://aclweb.org/anthology/P18-2104.
- Yang Zhao, Jiajun Zhang, and Chengqing Zong. 2018. Exploiting pre-ordering for neural machine translation. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18). European Language Resources Association (ELRA). Retrieved from https://www.aclweb.org/anthology/L18-1143.
- Long Zhou, Wenpeng Hu, Jiajun Zhang, and Chengqing Zong. 2017. Neural system combination for machine translation. In Proceedings of the 55th Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 378--384. DOI: https://doi.org/10.18653/v1/P17-2060
- Michał Ziemski, Marcin Junczys-Dowmunt, and Bruno Pouliquen. 2016. The United Nations parallel corpus v1.0. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), 3530--3534. Retrieved from https://www.aclweb.org/anthology/L16-1561.
- Barret Zoph and Kevin Knight. 2016. Multi-source neural translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 30--34. DOI: https://doi.org/10.18653/v1/N16-1004
- Barret Zoph and Quoc V. Le. 2017. Neural architecture search with reinforcement learning. In Proceedings of the 5th International Conference on Learning Representations (ICLR’17). Retrieved from https://openreview.net/forum?id=r1Ue8Hcxg.
- Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. 2016. Transfer learning for low-resource neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1568--1575. DOI: https://doi.org/10.18653/v1/D16-1163