short-paper

Adaptation of Language Models for SMT Using Neural Networks with Topic Information

Published: 22 January 2016

Abstract

Neural network language models (LMs) have been shown to be effective in improving the performance of statistical machine translation (SMT) systems. However, state-of-the-art neural network LMs typically use only the words preceding the current position as context and neglect global topic information, which can help machine translation (MT) systems select better translation candidates from a broader perspective. In this work, we propose to enhance the state-of-the-art feedforward neural language model with topic information. Two main issues must be tackled when adding topics to neural network LMs for SMT: how to incorporate topics into the neural network, and how to obtain the target-side topic distribution before translation. We incorporate topics by appending the topic distribution to the input layer of a feedforward LM, and we adopt a multinomial logistic-regression (MLR) model to predict the target-side topic distribution from source-side information. Moreover, we propose a feedforward neural network model that learns joint source-side representations for topic prediction. LM experiments demonstrate that the topic-enhanced feedforward LM greatly reduces perplexity on the validation set, and that the MLR model equipped with the joint source representations dramatically improves the prediction of target-side topics. A final MT experiment, conducted on a large-scale Chinese--English dataset, shows that our feedforward LM with predicted topics improves translation performance over a strong baseline.
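The two components described in the abstract — a feedforward LM whose input layer is extended with a topic distribution, and an MLR model that predicts that distribution from source-side features — can be illustrated with a minimal NumPy sketch. All dimensions, weight initializations, and names here are hypothetical toy choices for illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMB, TOPICS, HIDDEN, CONTEXT, SRC_FEATS = 100, 16, 8, 32, 4, 20

# Toy parameters: word embeddings, hidden layer over the concatenated
# input (context embeddings + topic distribution), and output layer.
E = rng.normal(scale=0.1, size=(VOCAB, EMB))
W_h = rng.normal(scale=0.1, size=(CONTEXT * EMB + TOPICS, HIDDEN))
W_o = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def next_word_probs(context_ids, topic_dist):
    """P(w | context, topic): the topic distribution is simply appended
    to the flattened context embeddings at the input layer."""
    x = np.concatenate([E[context_ids].ravel(), topic_dist])
    h = np.maximum(0.0, x @ W_h)  # ReLU hidden layer
    return softmax(h @ W_o)

# Target-side topic prediction, sketched as multinomial logistic
# regression over some (hypothetical) source-side feature vector.
W_t = rng.normal(scale=0.1, size=(SRC_FEATS, TOPICS))
source_feats = rng.normal(size=SRC_FEATS)
topic_dist = softmax(source_feats @ W_t)

p = next_word_probs(np.array([3, 17, 42, 7]), topic_dist)
```

The key design point is that the topic vector enters only at the input layer, so the rest of the feedforward LM architecture is unchanged; at translation time, `topic_dist` would come from the MLR predictor rather than from target-side text.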




    • Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 15, Issue 3
      March 2016
      220 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/2876004

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 January 2016
      • Accepted: 1 August 2015
      • Revised: 1 July 2015
      • Received: 1 January 2015


      Qualifiers

      • short-paper
      • Research
      • Refereed
