Abstract
Neural network language models (LMs) have been shown to be effective in improving the performance of statistical machine translation (SMT) systems. However, state-of-the-art neural network LMs usually use only the words before the current position as context and neglect global topic information, which can help machine translation (MT) systems select better translation candidates from a broader perspective. In this work, we propose to improve a state-of-the-art feedforward neural LM with topic information. Two main issues must be tackled when adding topics to neural network LMs for SMT: how to incorporate topics into the neural network, and how to obtain the target-side topic distribution before translation. We incorporate topics by appending the topic distribution to the input layer of a feedforward LM, and we adopt a multinomial logistic regression (MLR) model to predict the target-side topic distribution from source-side information. Moreover, we propose a feedforward neural network model that learns joint source-side representations for topic prediction. LM experiments demonstrate that the topic-enhanced feedforward LM greatly reduces perplexity on the validation set, and that the MLR model equipped with the joint source representations dramatically improves the prediction of target-side topics. A final MT experiment, conducted on a large-scale Chinese--English dataset, shows that our feedforward LM with predicted topics improves translation performance over a strong baseline.
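As a minimal sketch of the two components described above, the following NumPy code appends a topic distribution to the input layer of a feedforward LM and predicts topics with a softmax (multinomial logistic regression) classifier. This is not the authors' implementation: all sizes, parameter names, and the source-side representation are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
VOCAB, EMB, HIDDEN, CONTEXT, TOPICS = 50, 8, 16, 3, 4

# Feedforward LM parameters: word embeddings, hidden layer, output layer.
E = rng.normal(scale=0.1, size=(VOCAB, EMB))
# Input layer = concatenated context-word embeddings + topic distribution.
W_h = rng.normal(scale=0.1, size=(CONTEXT * EMB + TOPICS, HIDDEN))
W_o = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))
# MLR topic predictor: softmax regression over a source-side vector.
W_t = rng.normal(scale=0.1, size=(EMB, TOPICS))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def predict_topics(source_repr):
    """Multinomial logistic regression: P(topic | source representation)."""
    return softmax(source_repr @ W_t)

def next_word_probs(context_ids, topic_dist):
    """P(w | context, topics): topic distribution appended to the input layer."""
    x = np.concatenate([E[context_ids].ravel(), topic_dist])
    h = np.maximum(0.0, x @ W_h)   # ReLU hidden layer
    return softmax(h @ W_o)

# Predicted target-side topics condition the LM before translation.
topics = predict_topics(rng.normal(scale=0.1, size=EMB))
p = next_word_probs([3, 17, 42], topics)
```

In the real system the softmax over `VOCAB` would be trained on target text and the source representation would come from the proposed joint-representation network; the sketch only shows how the topic vector enters the input layer.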
Index Terms
- Adaptation of Language Models for SMT Using Neural Networks with Topic Information