research-article

Code completion with statistical language models

Authors:
Veselin Raychev

ETH Zürich

ETH Zürich
View Profile

,
Martin Vechev

ETH Zürich

ETH Zürich
View Profile

,
Eran Yahav

Technion

Technion
View Profile

PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and ImplementationJune 2014Pages 419–428https://doi.org/10.1145/2594291.2594321

Published:09 June 2014Publication History

PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 419–428

ABSTRACT

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.

Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.

Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.

References

Android-er. http://android-er.blogspot.ch/2011/03/set-wallpaper-using-wallpapermanager.html.Google Scholar
Android how-to's. https://sites.google.com/site/androidhowto/how-to-1/display-a-web-page.Google Scholar
Stack overflow. http://www.stackoverflow.com/.Google Scholar
Tutorial for android. http://www.tutorialforandroid.com/2009/01/changing-screen-brightness.html.Google Scholar
Tutorial for android. http://www.tutorialforandroid.com/2009/10/turn-off-turn-on-wifi-in-android-using.html.Google Scholar
Vogella tutorials. http://www.vogella.com/articles/AndroidMedia/article.html.Google Scholar
Alnusair, A., Zhao, T., and Bodden, E. Effective API navigation and reuse. In IRI (aug. 2010), pp. 7--12.Google Scholar
Ammons, G., Bodík, R., and Larus, J. R. Mining specifications. In POPL '02 (2002). Google ScholarDigital Library
Beckman, N., Kim, D., and Aldrich, J. An empirical study of object protocols in the wild. In ECOOP'11. Google ScholarDigital Library
Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137--1155. Google ScholarDigital Library
Cook, J. E., and Wolf, A. L. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7, 3 (1998), 215--249. Google ScholarDigital Library
Dagenais, B., and Hendren, L. J. Enabling static analysis for partial Java programs. In OOPSLA'08, pp. 313--328. Google ScholarDigital Library
Elman, J. L. Finding structure in time. Cognitive Science 14, 2 (1990), 179--211.Google ScholarCross Ref
Gulwani, S. Dimensions in program synthesis. In symp. on Principles and practice of declarative programming (2010), PPDP '10. Google ScholarDigital Library
Gvero, T., Kuncak, V., Kuraj, I., and Piskac, R. Complete completion using types and weights. In PLDI '13 (2013). Google ScholarDigital Library
Gvero, T., Kuncak, V., and Piskac, R. Interactive synthesis of code snippets. In CAV'11, vol. 6806 of LNCS. 2011. Google ScholarDigital Library
Hindle, A., Barr, E. T., Su, Z., Gabel, M., and Devanbu, P. On the naturalness of software. In ICSE 2012 (2012). Google ScholarDigital Library
Holmes, R., and Murphy, G. C. Using structural context to recommend source code examples. In ICSE '05. Google ScholarDigital Library
Holmes, R., Walker, R. J., and Murphy, G. C. Strathcona example recommendation tool. In FSE'05, pp. 237--240. Google ScholarDigital Library
Katz, S. M. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans. on Acoustics, Speech and Singal processing (March 1987), vol. ASSP-35.Google Scholar
Kneser, R., and Ney, H. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (May 1995), vol. I.Google ScholarCross Ref
Kombrink, S., Mikolov, T., Karafiát, M., and Burget, L. Recurrent neural network based language modeling in meeting recognition. In INTERSPEECH (2011), pp. 2877--2880.Google ScholarCross Ref
Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. Jungloid mining: Helping to navigate the api jungle. In PLDI '05 (2005). Google ScholarDigital Library
Mikolov, T., Deoras, A., Povey, D., Burget, L., and Cernocky, J. Strategies for training large scale neural network language models. In ASRU 2011 (2011), IEEE Signal Processing Society.Google ScholarCross Ref
Mishne, A., Shoham, S., and Yahav, E. Typestate-based semantic code search over partial programs. In OOPSLA '12 (2012). Google ScholarDigital Library
Perelman, D., Gulwani, S., Ball, T., and Grossman, D. Type-directed completion of partial expressions. In PLDI (2012). Google ScholarDigital Library
Reiss, S. P. Semantics-based code search. In ICSE'09. Google ScholarDigital Library
Rosenfeld, R. Two decades of statistical language modeling: Where do we go from here. In Proceedings of the IEEE (2000), p. 2000.Google Scholar
Shoham, S., Yahav, E., Fink, S., and Pistoia, M. Static specification mining using automata-based abstractions. In ISSTA '07 (2007). Google ScholarDigital Library
Solar-Lezama, A. The sketching approach to program synthesis. In APLAS '09 (2009). Google ScholarDigital Library
Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S. A., and Saraswat, V. A. Combinatorial sketching for finite programs. In ASPLOS (2006), pp. 404--415. Google ScholarDigital Library
Srivastava, S., Gulwani, S., and Foster, J. S. From program verification to program synthesis. In POPL '10 (2010). Google ScholarDigital Library
Steensgaard, B. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (1996), POPL '96, pp. 32--41. Google ScholarDigital Library
Stolcke, A. SRILM-an Extensible Language Modeling Toolkit. International Conference on Spoken Language Processing (2002).Google Scholar
Thummalapenta, S., and Xie, T. Parseweb: a programmer assistant for reusing open source code on the web. In ASE '07 (2007). Google ScholarDigital Library
Vallée-Rai, R., et al. Soot - a Java Optimization Framework. In Proceedings of CASCON 1999 (1999), pp. 125--135.Google Scholar
Vechev, M., and Yahav, E. Deriving linearizable fine-grained concurrent objects. In PLDI '08 (2008). Google ScholarDigital Library
Wasylkowski, A., and Zeller, A. Mining temporal specifications from object usage. In Autom. Softw. Eng. (2011), vol. 18. Google ScholarDigital Library
Weimer, W., and Necula, G. Mining temporal specifications for error detection. In TACAS'05, vol. 3440 of LNCS. 2005, pp. 461--476. Google ScholarDigital Library
Witten, I. H., and Bell, T. C. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory 37, 4 (1991), 1085--1094. Google ScholarDigital Library
Yang, J., Evans, D., Bhardwaj, D., Bhat, T., and Das, M. Perracotta: mining temporal API rules from imperfect traces. In ICSE '06, pp. 282--291. Google ScholarDigital Library
Yessenov, K., Xu, Z., and Solar-Lezama, A. Data-driven synthesis for object-oriented frameworks. In OOPSLA '11 (2011). Google ScholarDigital Library
Zhong, H., Xie, T., Zhang, L., Pei, J., and Mei, H. MAPO: Mining and recommending API usage patterns. In ECOOP'09. Google ScholarDigital Library

Index Terms

Code completion with statistical language models
1. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Program verification

Recommendations

Code completion with statistical language models
PLDI '14

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.

Our main idea is to reduce the problem of code completion to ...
Read More
Language-parametric static semantic code completion

Code completion is an editor service in IDEs that proposes code fragments for the user to insert at the caret position in their code. Code completion should be sound and complete. It should be sound, such that it only proposes fragments that do not ...
Read More
Head-Driven Statistical Models for Natural Language Parsing

This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2014
619 pages
ISBN:9781450327848
DOI:10.1145/2594291
General Chair:
Michael O'Boyle
University of Edinburgh
,
Program Chair:
Keshav Pingali
University of Texas, Austin
ACM SIGPLAN Notices Volume 49, Issue 6
PLDI '14
June 2014
598 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2666356
Editor:
Andy Gill
University of Kansas, Lawrence, KS
Issue’s Table of Contents
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
PLDI '14 Paper Acceptance Rate52of287submissions,18%Overall Acceptance Rate406of2,067submissions,20%
More
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 464
  Total Citations
  View Citations
- 2,775
  Total Downloads
- Downloads (Last 12 months)311
- Downloads (Last 6 weeks)57
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Code completion with statistical language models

PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

Code completion with statistical language models

Language-parametric static semantic code completion

Head-Driven Statistical Models for Natural Language Parsing