A Survey of Machine Learning for Big Code and Naturalness

Published: 31 July 2018

Abstract

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.

  134. Dana Movshovitz-Attias and William W. Cohen. 2015. KB-LDA: Jointly learning a knowledge base of hierarchy, relations, and facts. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’15).Google ScholarGoogle Scholar
  135. Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. 2018. Neural sketch learning for conditional program generation. In Proceedings of the International Conference on Learning Representations (ICLR).Google ScholarGoogle Scholar
  136. Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine. 2017. Bayesian specification learning for finding API usage errors. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, 151--162. Google ScholarGoogle ScholarDigital LibraryDigital Library
  137. Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. 2015. Neural programmer: Inducing latent programs with gradient descent. In Proceedings of the International Conference on Learning Representations (ICLR’15).Google ScholarGoogle Scholar
  138. Graham Neubig. 2016. Survey of methods to generate natural language from source code. Retrieved from http://www.languageandcode.org/nlse2015/neubig15nlse-survey.pdf.Google ScholarGoogle Scholar
  139. Anh Tuan Nguyen and Tien N. Nguyen. 2015. Graph-based statistical language model for code. In Proceedings of the International Conference on Software Engineering (ICSE’15).Google ScholarGoogle Scholar
  140. Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. 2013. Lexical statistical machine translation for language migration. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’13).Google ScholarGoogle Scholar
  141. Anh T. Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. 2015. Divide-and-conquer approach for multi-phase statistical migration for source code. In Proceedings of the International Conference on Automated Software Engineering (ASE’15).Google ScholarGoogle Scholar
  142. Trong Duc Nguyen, Anh Tuan Nguyen, and Tien N. Nguyen. 2016. Mapping API elements for code migration with vector representations. In Proceedings of the International Conference on Software Engineering (ICSE’16).Google ScholarGoogle Scholar
  143. Trong Duc Nguyen, Anh Tuan Nguyen, Hung Dang Phan, and Tien N. Nguyen. 2017. Exploring API embedding for API usages and applications. In Proceedings of the International Conference on Software Engineering (ICSE’17).Google ScholarGoogle Scholar
  144. Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2013. A statistical semantic language model for source code. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’13).Google ScholarGoogle Scholar
  145. Haoran Niu, Iman Keivanloo, and Ying Zou. 2017. Learning to rank code examples for code search engines. Empirical Software Engineering (ESEM’16) 22, 1 (2017), 259--291. Google ScholarGoogle ScholarDigital LibraryDigital Library
  146. Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2015. Learning to generate pseudo-code from source code using statistical machine translation. In Proceedings of the International Conference on Automated Software Engineering (ASE’15).Google ScholarGoogle ScholarDigital LibraryDigital Library
  147. Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. 2015. Learning a strategy for adapting a program analysis via bayesian optimisation. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages 8 Applications (OOPSLA’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  148. Cyrus Omar. 2013. Structured statistical syntax tree prediction. In Proceedings of the Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. Cyrus Omar, Ian Voysey, Michael Hilton, Joshua Sunshine, Claire Le Goues, Jonathan Aldrich, and Matthew A. Hammer. 2017. Toward semantic foundations for program editors. arXiv preprint arXiv:1703.08694.Google ScholarGoogle Scholar
  150. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’02). Google ScholarGoogle ScholarDigital LibraryDigital Library
  151. Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. 2017. Neuro-symbolic program synthesis. In Proceedings of the International Conference on Learning Representations (ICLR’17).Google ScholarGoogle Scholar
  152. Terence Parr and Jurgen J. Vinju. 2016. Towards a universal code formatter through machine learning. In Proceedings of the International Conference on Software Language Engineering (SLE’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. Jibesh Patra and Michael Pradel. 2016. Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data. TU Darmstadt, Department of Computer Science, TUD-CS-2016-14664.Google ScholarGoogle Scholar
  154. Hung Viet Pham, Phong Minh Vu, Tung Thanh Nguyen, and others. 2016. Learning API usages from bytecode: A statistical approach. In Proceedings of the International Conference on Software Engineering (ICSE’16).Google ScholarGoogle Scholar
  155. Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, and Leonidas J. Guibas. 2015. Learning program embeddings to propagate feedback on student code. In Proceedings of the International Conference on Machine Learning (ICML’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  156. Matt Post and Daniel Gildea. 2009. Bayesian learning of a tree substitution grammar. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  157. Michael Pradel and Koushik Sen. 2017. Deep learning to find bugs. TU Darmstadt, Department of Computer Science.Google ScholarGoogle Scholar
  158. Sebastian Proksch, Sven Amann, Sarah Nadi, and Mira Mezini. 2016. Evaluating the evaluations of code recommender systems: A reality check. In Proceedings of the International Conference on Automated Software Engineering (ASE’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  159. Sebastian Proksch, Johannes Lerch, and Mira Mezini. 2015. Intelligent code completion with bayesian networks. ACM Transactions on Software Engineering and Methodology (TOSEM) 25, 1 (2015), 3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  160. Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. sk_p: A neural program corrector for MOOCs. In Proceedings of the Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  161. Chris Quirk, Raymond Mooney, and Michel Galley. 2015. Language to code: Learning semantic parsers for if-this-then-that recipes. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’15).Google ScholarGoogle ScholarCross RefCross Ref
  162. Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract syntax networks for code generation and semantic parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’17).Google ScholarGoogle ScholarCross RefCross Ref
  163. Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, and Premkumar Devanbu. 2016. On the naturalness of buggy code. In Proceedings of the International Conference on Software Engineering (ICSE’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  164. Veselin Raychev, Pavol Bielik, Martin Vechev, and Andreas Krause. 2016. Learning programs from noisy data. In Proceedings of the Symposium on Principles of Programming Languages (POPL’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  165. Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting program properties from “big code.” In Proceedings of the Symposium on Principles of Programming Languages (POPL’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  166. Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code completion with statistical language models. In Proceedings of the Symposium on Programming Language Design and Implementation (PLDI’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  167. Scott Reed and Nando de Freitas. 2016. Neural programmer-interpreters. In Proceedings of the International Conference on Learning Representations (ICLR’16).Google ScholarGoogle Scholar
  168. Sebastian Riedel, Matko Bosnjak, and Tim Rocktäschel. 2017. Programming with a differentiable forth interpreter. In Proceedings of the International Conference on Machine Learning (ICML’17).Google ScholarGoogle Scholar
  169. Martin Robillard, Robert Walker, and Thomas Zimmermann. 2010. Recommendation systems for software engineering. IEEE Software 27, 4 (2010), 80--86. Google ScholarGoogle ScholarDigital LibraryDigital Library
  170. Martin P. Robillard, Walid Maalej, Robert J. Walker, and Thomas Zimmermann. 2014. Recommendation Systems in Software Engineering. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  171. Tim Rocktäschel and Sebastian Riedel. 2017. End-to-end differentiable proving. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’17).Google ScholarGoogle Scholar
  172. Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How developers search for code: A case study. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  173. Juliana Saraiva, Christian Bird, and Thomas Zimmermann. 2015. Products, developers, and milestones: How should I build my N-gram language model. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  174. Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).Google ScholarGoogle ScholarCross RefCross Ref
  175. Abhishek Sharma, Yuan Tian, and David Lo. 2015. NIRMAL: Automatic identification of software relevant tweets leveraging language model. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER’15).Google ScholarGoogle ScholarCross RefCross Ref
  176. Rishabh Singh and Sumit Gulwani. 2015. Predicting a correct program in programming by example. In Proceedings of the International Conference on Computer Aided Verification.Google ScholarGoogle ScholarCross RefCross Ref
  177. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  178. Suresh Thummalapenta and Tao Xie. 2007. Parseweb: A programmer assistant for reusing open source code on the web. In Proceedings of the International Conference on Automated Software Engineering (ASE’07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  179. Christoph Treude and Martin P. Robillard. 2016. Augmenting API documentation with insights from stack overflow. In Proceedings of the International Conference on Software Engineering (ICSE’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  180. Zhaopeng Tu, Zhendong Su, and Premkumar Devanbu. 2014. On the localness of software. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  181. Bogdan Vasilescu, Casey Casalnuovo, and Premkumar Devanbu. 2017. Recovering clear, natural identifiers from obfuscated JS names. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’17). Google ScholarGoogle ScholarDigital LibraryDigital Library
  182. Lisa Wang, Angela Sy, Larry Liu, and Chris Piech. 2017. Deep knowledge tracing on programming exercises. In Proceedings of the Conference on Learning @ Scale. Google ScholarGoogle ScholarDigital LibraryDigital Library
  183. Song Wang, Devin Chollak, Dana Movshovitz-Attias, and Lin Tan. 2016. Bugram: Bug detection with n-gram language models. In Proceedings of the International Conference on Automated Software Engineering (ASE’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  184. Song Wang, Taiyue Liu, and Lin Tan. 2016. Automatically learning semantic features for defect prediction. In Proceedings of the International Conference on Software Engineering (ICSE’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  185. Xin Wang, Chang Liu, Richard Shin, Joseph E. Gonzalez, and Dawn Song. 2016. Neural Code Completion. Retrieved from https://openreview.net/pdf?id=rJbPBt9lg.Google ScholarGoogle Scholar
  186. Andrzej Wasylkowski, Andreas Zeller, and Christian Lindig. 2007. Detecting object usage anomalies. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  187. Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep learning code fragments for code clone detection. In Proceedings of the International Conference on Automated Software Engineering (ASE’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  188. Martin White, Christopher Vendome, Mario Linares-Vásquez, and Denys Poshyvanyk. 2015. Toward deep learning software repositories. In Proceedings of the Working Conference on Mining Software Repositories (MSR’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  189. Chadd C. Williams and Jeffrey K. Hollingsworth. 2005. Automatic mining of source code repositories to improve bug finding techniques. IEEE Transactions on Software Engineering 31, 6 (2005), 466--480. Google ScholarGoogle ScholarDigital LibraryDigital Library
  190. Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. 2016. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  191. W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A survey on software fault localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707--740. Google ScholarGoogle ScholarDigital LibraryDigital Library
  192. Tao Xie and Jian Pei. 2006. MAPO: Mining API usages from open source repositories. In Proceedings of the Working Conference on Mining Software Repositories (MSR’06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  193. Chang Xu, Dacheng Tao, and Chao Xu. 2013. A survey on multi-view learning. arXiv Preprint arXiv:1304.5634 (2013).Google ScholarGoogle Scholar
  194. Shir Yadid and Eran Yahav. 2016. Extracting code from programming tutorial videos. In Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. Google ScholarGoogle ScholarDigital LibraryDigital Library
  195. Eran Yahav. 2015. Programming with “big code.” In Asian Symposium on Programming Languages and Systems. Springer, 3--8.Google ScholarGoogle ScholarCross RefCross Ref
  196. Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’17).Google ScholarGoogle ScholarCross RefCross Ref
  197. Wojciech Zaremba and Ilya Sutskever. 2014. Learning to execute. arXiv Preprint arXiv:1410.4615 (2014).Google ScholarGoogle Scholar
  198. Alice X. Zheng, Michael I. Jordan, Ben Liblit, and Alex Aiken. 2003. Statistical debugging of sampled programs. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’03). Google ScholarGoogle ScholarDigital LibraryDigital Library
  199. Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. 2006. Statistical debugging: Simultaneous identification of multiple bugs. In Proceedings of the International Conference on Machine Learning (ICML’06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  200. Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv Preprint arXiv:1709.00103 (2017).Google ScholarGoogle Scholar
  201. Thomas Zimmermann, Andreas Zeller, Peter Weissgerber, and Stephan Diehl. 2005. Mining version histories to guide software changes. IEEE Transactions on Software Engineering 31, 6 (2005), 429--445. Google ScholarGoogle ScholarDigital LibraryDigital Library

          • Published in

            ACM Computing Surveys, Volume 51, Issue 4
            July 2019
            765 pages
            ISSN:0360-0300
            EISSN:1557-7341
            DOI:10.1145/3236632
            • Editor: Sartaj Sahni

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 31 July 2018
            • Accepted: 1 April 2018
            • Revised: 1 March 2018
            • Received: 1 September 2017

            Qualifiers

            • survey
            • Research
            • Refereed
