research-article

Lexical statistical machine translation for language migration

Authors:
Anh Tuan Nguyen, Iowa State University, USA
Tung Thanh Nguyen, Iowa State University, USA
Tien N. Nguyen, Iowa State University, USA

ESEC/FSE 2013: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, August 2013, Pages 651–654, https://doi.org/10.1145/2491411.2494584

Published: 18 August 2013

ABSTRACT

Prior research has shown that source code also exhibits naturalness: it is written by humans and tends to be repetitive. That research also showed that an n-gram language model is useful for predicting the next token in a source file, given a large corpus of existing source code. In this paper, we investigate how well statistical machine translation (SMT) models for natural languages can help in migrating source code from one programming language to another. We treat source code as a sequence of lexical tokens and apply a phrase-based SMT model to the lexemes of those tokens. Our empirical evaluation on migrating two Java projects to C# showed that lexical, phrase-based SMT can achieve high lexical translation accuracy (BLEU of 81.3–82.6%). Users would have to manually edit only 11.9–15.8% of the tokens in the resulting code to correct it. However, a high percentage of the translated methods (49.5–58.6%) are syntactically incorrect. Our results therefore call for a more program-oriented SMT model that better integrates the syntactic and semantic information of a program to support language migration.
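
To make the lexical view concrete, here is a minimal sketch of splitting code into a flat sequence of lexemes and scoring a candidate translation against a reference with BLEU. The regex lexer, the function names `lex`, `ngrams`, and `bleu`, and the sample Java and C# fragments are illustrative assumptions, not the authors' tooling. The BLEU variant shown is a simplified single-reference, unsmoothed form with uniform weights, which may differ from the configuration used in the evaluation.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified lexer: identifiers/keywords, integer literals,
# or any single non-whitespace symbol. A real Java/C# lexer handles far more.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lex(code):
    """Split a code fragment into a flat sequence of lexemes."""
    return TOKEN_RE.findall(code)


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights and a brevity penalty.

    No smoothing is applied: if any n-gram order has zero matches,
    the score is 0.0.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates no longer than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Made-up example: a candidate that keeps Java-style list access
    # versus a reference written in idiomatic C#.
    candidate = lex("string s = list.get(0);")
    reference = lex("string s = list[0];")
    print(candidate)
    print(round(bleu(candidate, reference), 4))
```

A score computed this way rewards overlapping token n-grams only, so a candidate can score well while still failing to parse. That is consistent with the gap the abstract reports between lexical accuracy and syntactic correctness.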

Index Terms

- Social and professional topics → Professional topics → Management of computing and information systems → Software management → Software maintenance
- Software and its engineering → Software creation and management → Software post-development issues

Published in
ESEC/FSE 2013: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
August 2013
738 pages
ISBN: 9781450322379
DOI: 10.1145/2491411
General Chair: Bertrand Meyer (ETH Zurich, Switzerland)
Program Chairs: Luciano Baresi (Politecnico di Milano, Italy), Mira Mezini (TU Darmstadt, Germany)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Language Migration
Statistical Machine Translation
Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%