DOI: 10.1145/2661136.2661148
research-article

Phrase-Based Statistical Translation of Programming Languages

Published: 14 October 2014

ABSTRACT

Phrase-based statistical machine translation approaches have been highly successful in translating between natural languages and are heavily used by commercial systems (e.g. Google Translate).

The main objective of this work is to investigate the applicability of these approaches for translating between programming languages. To that end, we investigated several variants of the phrase-based translation approach: i) a direct application of the approach to programming languages, ii) a novel modification of the approach that incorporates the grammatical structure of the target programming language (so as to avoid generating target programs that do not parse), and iii) a combination of ii) with custom rules added to improve the quality of the translation.
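To make variant i) concrete, the following is a minimal sketch of monotone phrase-based decoding over code tokens. It is not the authors' system: the phrase table, its probabilities, the `translate` function, and the fixed penalty for copying unknown tokens are all hypothetical, and the sketch omits the language model, reordering, and weight tuning that a full phrase-based system would use.

```python
import math

# Hypothetical toy phrase table: C# token phrase -> [(Java token phrase, probability)].
# All entries and probabilities are invented for illustration only.
PHRASE_TABLE = {
    ("Console", ".", "WriteLine"): [(("System", ".", "out", ".", "println"), 0.9)],
    ("string",): [(("String",), 0.95), (("string",), 0.05)],
    ("bool",): [(("boolean",), 0.9)],
    (".", "Length"): [((".", "length", "(", ")"), 0.6), ((".", "length"), 0.4)],
}

MAX_PHRASE_LEN = 3  # longest source phrase considered
COPY_PROB = 0.5     # assumed probability for copying an unknown token unchanged


def translate(tokens):
    """Monotone decoding: dynamic programming over source prefixes."""
    n = len(tokens)
    # best[i] holds (log-probability, target tokens) for the best
    # translation of the first i source tokens found so far.
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score, out = best[i]
        for j in range(i + 1, min(n, i + MAX_PHRASE_LEN) + 1):
            src = tuple(tokens[i:j])
            options = PHRASE_TABLE.get(src, [])
            if not options and j == i + 1:
                # Unknown single token (identifier, operator): copy it through.
                options = [(src, COPY_PROB)]
            for tgt, prob in options:
                cand = score + math.log(prob)
                if cand > best[j][0]:
                    best[j] = (cand, out + list(tgt))
    return best[n][1]


print(" ".join(translate("Console . WriteLine ( msg ) ;".split())))
print(" ".join(translate("string s = x . Length ;".split())))
```

In this toy setting the multi-token phrase wins over copying its tokens one by one because its probability is higher than the product of the copy penalties. Nothing in the decoder checks that the output parses, which is the gap that variant ii) is described as addressing.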

To experiment with these systems, we investigated machine translation from C# to Java. For training, which takes about 60 hours, we used a parallel corpus of 20,499 C#-to-Java method translations. We then evaluated each of the three systems by translating 1,000 C# methods. Our experimental results indicate that with the most advanced system, about 60% of the top-ranked translated methods compile, and out of a random sample of 50 successfully compiled methods, 68% (34 methods) were semantically equivalent to the reference solution.


Published in

Onward! 2014: Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software
October 2014
332 pages
ISBN: 9781450332101
DOI: 10.1145/2661136

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Onward! 2014 paper acceptance rate: 16 of 35 submissions (46%). Overall acceptance rate: 40 of 105 submissions (38%).
