ABSTRACT
Phrase-based statistical machine translation approaches have been highly successful in translating between natural languages and are heavily used by commercial systems (e.g. Google Translate).
The main objective of this work is to investigate the applicability of these approaches to translating between programming languages. To that end, we investigated three variants of the phrase-based translation approach: i) a direct application of the approach to programming languages, ii) a novel modification of the approach that incorporates the grammatical structure of the target programming language (so as to avoid generating target programs that do not parse), and iii) a combination of ii) with custom rules added to improve the quality of the translation.
To experiment with these systems, we investigated machine translation from C# to Java. For training, which takes about 60 hours, we used a parallel corpus of 20,499 C#-to-Java method translations. We then evaluated each of the three systems by translating 1,000 C# methods. Our experimental results indicate that, with the most advanced system, about 60% of the top-ranked translated methods compile, and that out of a random sample of 50 translated methods that compiled, 68% (34 methods) were semantically equivalent to the reference solution.
Index Terms
- Phrase-Based Statistical Translation of Programming Languages