Abstract
Neural network language models (LMs) have been shown to be effective in improving the performance of statistical machine translation (SMT) systems. However, state-of-the-art neural network LMs usually use only the words before the current position as context and neglect global topic information, which can help machine translation (MT) systems select better translation candidates from a broader perspective. In this work, we propose to improve a state-of-the-art feedforward neural LM with topic information. Two main issues must be tackled when adding topics to neural network LMs for SMT: how to incorporate topics into the neural network, and how to obtain the target-side topic distribution before translation. We incorporate topics by appending the topic distribution to the input layer of a feedforward LM, and we adopt a multinomial logistic regression (MLR) model to predict the target-side topic distribution from source-side information. Moreover, we propose a feedforward neural network model that learns joint source-side representations for topic prediction. LM experiments demonstrate that the topic-enhanced feedforward LM greatly reduces perplexity on the validation set, and that the MLR model equipped with the joint source representations dramatically improves the prediction of target-side topics. A final MT experiment, conducted on a large-scale Chinese--English dataset, shows that our feedforward LM with predicted topics improves translation performance over a strong baseline.
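As a minimal sketch of the two components described above, the following NumPy code appends a topic distribution to the input layer of a feedforward LM and predicts topics with a softmax (multinomial logistic regression) classifier. This is not the authors' implementation: all sizes, parameter names, and the source-side representation are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
VOCAB, EMB, HIDDEN, CONTEXT, TOPICS = 50, 8, 16, 3, 4

# Feedforward LM parameters: word embeddings, hidden layer, output layer.
E = rng.normal(scale=0.1, size=(VOCAB, EMB))
# Input layer = concatenated context-word embeddings + topic distribution.
W_h = rng.normal(scale=0.1, size=(CONTEXT * EMB + TOPICS, HIDDEN))
W_o = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))
# MLR topic predictor: softmax regression over a source-side vector.
W_t = rng.normal(scale=0.1, size=(EMB, TOPICS))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def predict_topics(source_repr):
    """Multinomial logistic regression: P(topic | source representation)."""
    return softmax(source_repr @ W_t)

def next_word_probs(context_ids, topic_dist):
    """P(w | context, topics): topic distribution appended to the input layer."""
    x = np.concatenate([E[context_ids].ravel(), topic_dist])
    h = np.maximum(0.0, x @ W_h)   # ReLU hidden layer
    return softmax(h @ W_o)

# Predicted target-side topics condition the LM before translation.
topics = predict_topics(rng.normal(scale=0.1, size=EMB))
p = next_word_probs([3, 17, 42], topics)
```

In the real system the softmax over `VOCAB` would be trained on target text and the source representation would come from the proposed joint-representation network; the sketch only shows how the topic vector enters the input layer.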
Index Terms
- Adaptation of Language Models for SMT Using Neural Networks with Topic Information