skip to main content
10.1145/2594291.2594321acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

Code completion with statistical language models

Published:09 June 2014Publication History

ABSTRACT

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.

Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.

Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.

References

  1. Android-er. http://android-er.blogspot.ch/2011/03/set-wallpaper-using-wallpapermanager.html.Google ScholarGoogle Scholar
  2. Android how-to's. https://sites.google.com/site/androidhowto/how-to-1/display-a-web-page.Google ScholarGoogle Scholar
  3. Stack overflow. http://www.stackoverflow.com/.Google ScholarGoogle Scholar
  4. Tutorial for android. http://www.tutorialforandroid.com/2009/01/changing-screen-brightness.html.Google ScholarGoogle Scholar
  5. Tutorial for android. http://www.tutorialforandroid.com/2009/10/turn-off-turn-on-wifi-in-android-using.html.Google ScholarGoogle Scholar
  6. Vogella tutorials. http://www.vogella.com/articles/AndroidMedia/article.html.Google ScholarGoogle Scholar
  7. Alnusair, A., Zhao, T., and Bodden, E. Effective API navigation and reuse. In IRI (aug. 2010), pp. 7--12.Google ScholarGoogle Scholar
  8. Ammons, G., Bodík, R., and Larus, J. R. Mining specifications. In POPL '02 (2002). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Beckman, N., Kim, D., and Aldrich, J. An empirical study of object protocols in the wild. In ECOOP'11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137--1155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cook, J. E., and Wolf, A. L. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7, 3 (1998), 215--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dagenais, B., and Hendren, L. J. Enabling static analysis for partial Java programs. In OOPSLA'08, pp. 313--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Elman, J. L. Finding structure in time. Cognitive Science 14, 2 (1990), 179--211.Google ScholarGoogle ScholarCross RefCross Ref
  14. Gulwani, S. Dimensions in program synthesis. In symp. on Principles and practice of declarative programming (2010), PPDP '10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gvero, T., Kuncak, V., Kuraj, I., and Piskac, R. Complete completion using types and weights. In PLDI '13 (2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gvero, T., Kuncak, V., and Piskac, R. Interactive synthesis of code snippets. In CAV'11, vol. 6806 of LNCS. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Hindle, A., Barr, E. T., Su, Z., Gabel, M., and Devanbu, P. On the naturalness of software. In ICSE 2012 (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Holmes, R., and Murphy, G. C. Using structural context to recommend source code examples. In ICSE '05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Holmes, R., Walker, R. J., and Murphy, G. C. Strathcona example recommendation tool. In FSE'05, pp. 237--240. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Katz, S. M. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans. on Acoustics, Speech and Singal processing (March 1987), vol. ASSP-35.Google ScholarGoogle Scholar
  21. Kneser, R., and Ney, H. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (May 1995), vol. I.Google ScholarGoogle ScholarCross RefCross Ref
  22. Kombrink, S., Mikolov, T., Karafiát, M., and Burget, L. Recurrent neural network based language modeling in meeting recognition. In INTERSPEECH (2011), pp. 2877--2880.Google ScholarGoogle ScholarCross RefCross Ref
  23. Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. Jungloid mining: Helping to navigate the api jungle. In PLDI '05 (2005). Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Mikolov, T., Deoras, A., Povey, D., Burget, L., and Cernocky, J. Strategies for training large scale neural network language models. In ASRU 2011 (2011), IEEE Signal Processing Society.Google ScholarGoogle ScholarCross RefCross Ref
  25. Mishne, A., Shoham, S., and Yahav, E. Typestate-based semantic code search over partial programs. In OOPSLA '12 (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Perelman, D., Gulwani, S., Ball, T., and Grossman, D. Type-directed completion of partial expressions. In PLDI (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Reiss, S. P. Semantics-based code search. In ICSE'09. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Rosenfeld, R. Two decades of statistical language modeling: Where do we go from here. In Proceedings of the IEEE (2000), p. 2000.Google ScholarGoogle Scholar
  29. Shoham, S., Yahav, E., Fink, S., and Pistoia, M. Static specification mining using automata-based abstractions. In ISSTA '07 (2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Solar-Lezama, A. The sketching approach to program synthesis. In APLAS '09 (2009). Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S. A., and Saraswat, V. A. Combinatorial sketching for finite programs. In ASPLOS (2006), pp. 404--415. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Srivastava, S., Gulwani, S., and Foster, J. S. From program verification to program synthesis. In POPL '10 (2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Steensgaard, B. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (1996), POPL '96, pp. 32--41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Stolcke, A. SRILM-an Extensible Language Modeling Toolkit. International Conference on Spoken Language Processing (2002).Google ScholarGoogle Scholar
  35. Thummalapenta, S., and Xie, T. Parseweb: a programmer assistant for reusing open source code on the web. In ASE '07 (2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Vallée-Rai, R., et al. Soot - a Java Optimization Framework. In Proceedings of CASCON 1999 (1999), pp. 125--135.Google ScholarGoogle Scholar
  37. Vechev, M., and Yahav, E. Deriving linearizable fine-grained concurrent objects. In PLDI '08 (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Wasylkowski, A., and Zeller, A. Mining temporal specifications from object usage. In Autom. Softw. Eng. (2011), vol. 18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Weimer, W., and Necula, G. Mining temporal specifications for error detection. In TACAS'05, vol. 3440 of LNCS. 2005, pp. 461--476. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Witten, I. H., and Bell, T. C. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory 37, 4 (1991), 1085--1094. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Yang, J., Evans, D., Bhardwaj, D., Bhat, T., and Das, M. Perracotta: mining temporal API rules from imperfect traces. In ICSE '06, pp. 282--291. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Yessenov, K., Xu, Z., and Solar-Lezama, A. Data-driven synthesis for object-oriented frameworks. In OOPSLA '11 (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Zhong, H., Xie, T., Zhang, L., Pei, J., and Mei, H. MAPO: Mining and recommending API usage patterns. In ECOOP'09. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Code completion with statistical language models

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2014
      619 pages
      ISBN:9781450327848
      DOI:10.1145/2594291
      • cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 49, Issue 6
        PLDI '14
        June 2014
        598 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2666356
        • Editor:
        • Andy Gill
        Issue’s Table of Contents

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 June 2014

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      PLDI '14 Paper Acceptance Rate52of287submissions,18%Overall Acceptance Rate406of2,067submissions,20%

      Upcoming Conference

      PLDI '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader