ABSTRACT
We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.
Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.
Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.
- Android-er. http://android-er.blogspot.ch/2011/03/set-wallpaper-using-wallpapermanager.html.Google Scholar
- Android how-to's. https://sites.google.com/site/androidhowto/how-to-1/display-a-web-page.Google Scholar
- Stack overflow. http://www.stackoverflow.com/.Google Scholar
- Tutorial for android. http://www.tutorialforandroid.com/2009/01/changing-screen-brightness.html.Google Scholar
- Tutorial for android. http://www.tutorialforandroid.com/2009/10/turn-off-turn-on-wifi-in-android-using.html.Google Scholar
- Vogella tutorials. http://www.vogella.com/articles/AndroidMedia/article.html.Google Scholar
- Alnusair, A., Zhao, T., and Bodden, E. Effective API navigation and reuse. In IRI (aug. 2010), pp. 7--12.Google Scholar
- Ammons, G., Bodík, R., and Larus, J. R. Mining specifications. In POPL '02 (2002). Google ScholarDigital Library
- Beckman, N., Kim, D., and Aldrich, J. An empirical study of object protocols in the wild. In ECOOP'11. Google ScholarDigital Library
- Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137--1155. Google ScholarDigital Library
- Cook, J. E., and Wolf, A. L. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7, 3 (1998), 215--249. Google ScholarDigital Library
- Dagenais, B., and Hendren, L. J. Enabling static analysis for partial Java programs. In OOPSLA'08, pp. 313--328. Google ScholarDigital Library
- Elman, J. L. Finding structure in time. Cognitive Science 14, 2 (1990), 179--211.Google ScholarCross Ref
- Gulwani, S. Dimensions in program synthesis. In symp. on Principles and practice of declarative programming (2010), PPDP '10. Google ScholarDigital Library
- Gvero, T., Kuncak, V., Kuraj, I., and Piskac, R. Complete completion using types and weights. In PLDI '13 (2013). Google ScholarDigital Library
- Gvero, T., Kuncak, V., and Piskac, R. Interactive synthesis of code snippets. In CAV'11, vol. 6806 of LNCS. 2011. Google ScholarDigital Library
- Hindle, A., Barr, E. T., Su, Z., Gabel, M., and Devanbu, P. On the naturalness of software. In ICSE 2012 (2012). Google ScholarDigital Library
- Holmes, R., and Murphy, G. C. Using structural context to recommend source code examples. In ICSE '05. Google ScholarDigital Library
- Holmes, R., Walker, R. J., and Murphy, G. C. Strathcona example recommendation tool. In FSE'05, pp. 237--240. Google ScholarDigital Library
- Katz, S. M. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans. on Acoustics, Speech and Singal processing (March 1987), vol. ASSP-35.Google Scholar
- Kneser, R., and Ney, H. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (May 1995), vol. I.Google ScholarCross Ref
- Kombrink, S., Mikolov, T., Karafiát, M., and Burget, L. Recurrent neural network based language modeling in meeting recognition. In INTERSPEECH (2011), pp. 2877--2880.Google ScholarCross Ref
- Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. Jungloid mining: Helping to navigate the api jungle. In PLDI '05 (2005). Google ScholarDigital Library
- Mikolov, T., Deoras, A., Povey, D., Burget, L., and Cernocky, J. Strategies for training large scale neural network language models. In ASRU 2011 (2011), IEEE Signal Processing Society.Google ScholarCross Ref
- Mishne, A., Shoham, S., and Yahav, E. Typestate-based semantic code search over partial programs. In OOPSLA '12 (2012). Google ScholarDigital Library
- Perelman, D., Gulwani, S., Ball, T., and Grossman, D. Type-directed completion of partial expressions. In PLDI (2012). Google ScholarDigital Library
- Reiss, S. P. Semantics-based code search. In ICSE'09. Google ScholarDigital Library
- Rosenfeld, R. Two decades of statistical language modeling: Where do we go from here. In Proceedings of the IEEE (2000), p. 2000.Google Scholar
- Shoham, S., Yahav, E., Fink, S., and Pistoia, M. Static specification mining using automata-based abstractions. In ISSTA '07 (2007). Google ScholarDigital Library
- Solar-Lezama, A. The sketching approach to program synthesis. In APLAS '09 (2009). Google ScholarDigital Library
- Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S. A., and Saraswat, V. A. Combinatorial sketching for finite programs. In ASPLOS (2006), pp. 404--415. Google ScholarDigital Library
- Srivastava, S., Gulwani, S., and Foster, J. S. From program verification to program synthesis. In POPL '10 (2010). Google ScholarDigital Library
- Steensgaard, B. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (1996), POPL '96, pp. 32--41. Google ScholarDigital Library
- Stolcke, A. SRILM-an Extensible Language Modeling Toolkit. International Conference on Spoken Language Processing (2002).Google Scholar
- Thummalapenta, S., and Xie, T. Parseweb: a programmer assistant for reusing open source code on the web. In ASE '07 (2007). Google ScholarDigital Library
- Vallée-Rai, R., et al. Soot - a Java Optimization Framework. In Proceedings of CASCON 1999 (1999), pp. 125--135.Google Scholar
- Vechev, M., and Yahav, E. Deriving linearizable fine-grained concurrent objects. In PLDI '08 (2008). Google ScholarDigital Library
- Wasylkowski, A., and Zeller, A. Mining temporal specifications from object usage. In Autom. Softw. Eng. (2011), vol. 18. Google ScholarDigital Library
- Weimer, W., and Necula, G. Mining temporal specifications for error detection. In TACAS'05, vol. 3440 of LNCS. 2005, pp. 461--476. Google ScholarDigital Library
- Witten, I. H., and Bell, T. C. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory 37, 4 (1991), 1085--1094. Google ScholarDigital Library
- Yang, J., Evans, D., Bhardwaj, D., Bhat, T., and Das, M. Perracotta: mining temporal API rules from imperfect traces. In ICSE '06, pp. 282--291. Google ScholarDigital Library
- Yessenov, K., Xu, Z., and Solar-Lezama, A. Data-driven synthesis for object-oriented frameworks. In OOPSLA '11 (2011). Google ScholarDigital Library
- Zhong, H., Xie, T., Zhang, L., Pei, J., and Mei, H. MAPO: Mining and recommending API usage patterns. In ECOOP'09. Google ScholarDigital Library
Index Terms
- Code completion with statistical language models
Recommendations
Code completion with statistical language models
PLDI '14We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.
Our main idea is to reduce the problem of code completion to ...
Language-parametric static semantic code completion
Code completion is an editor service in IDEs that proposes code fragments for the user to insert at the caret position in their code. Code completion should be sound and complete. It should be sound, such that it only proposes fragments that do not ...
Head-Driven Statistical Models for Natural Language Parsing
This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of ...
Comments