A Survey of Machine Learning for Big Code and Naturalness

Authors:
Miltiadis Allamanis, Microsoft Research, Cambridge, United Kingdom (ORCID 0000-0002-5819-9900)
Earl T. Barr, University College London, Gower Street, United Kingdom
Premkumar Devanbu, University of California, Davis, California, USA
Charles Sutton, University of Edinburgh and The Alan Turing Institute, Edinburgh, United Kingdom

ACM Computing Surveys, Volume 51, Issue 4, Article No. 81, pp. 1–37. https://doi.org/10.1145/3212695

Published: 31 July 2018

Abstract

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.

Supplemental Material

allamanis.zip (28.2 KB): supplemental movie, appendix, image, and software files for "A Survey of Machine Learning for Big Code and Naturalness"

References

Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. 2007. Mining API patterns as partial orders from source code: From usage scenarios to specifications. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’07). Google ScholarDigital Library
Karan Aggarwal, Mohammad Salameh, and Abram Hindle. 2015. Using Machine Translation for Converting Python 2 to Python 3 Code. Technical Report.Google Scholar
Alex A. Alemi, Francois Chollet, Geoffrey Irving, Christian Szegedy, and Josef Urban. 2016. DeepMath--Deep sequence models for premise selection. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’16). Google ScholarDigital Library
Miltiadis Allamanis, Earl T. Barr, Christian Bird, Premkumar Devanbu, Mark Marron, and Charles Sutton. 2016. Mining Semantic Loop Idioms from Big Code. Technical Report. Retrieved from https://www.microsoft.com/en-us/research/publication/mining-semantic-loop-idioms-big-code/.Google Scholar
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2014. Learning natural coding conventions. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’14). Google ScholarDigital Library
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2015. Suggesting accurate method and class names. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’15). Google ScholarDigital Library
Miltiadis Allamanis and Marc Brockschmidt. 2017. SmartPaste: Learning to adapt source code. arXiv Preprint arXiv:1705.07867 (2017).Google Scholar
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to represent programs with graphs. In Proceedings of the International Conference on Learning Representations (ICLR’18).Google Scholar
Miltiadis Allamanis, Pankajan Chanthirasegaran, Pushmeet Kohli, and Charles Sutton. 2017. Learning continuous semantic representations of symbolic expressions. In Proceedings of the International Conference on Machine Learning (ICML’17).Google Scholar
Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of the International Conference on Machine Learning (ICML’16).Google Scholar
Miltiadis Allamanis and Charles Sutton. 2013. Mining source code repositories at massive scale using language modeling. In Proceedings of the Working Conference on Mining Software Repositories (MSR’13). Google ScholarDigital Library
Miltiadis Allamanis and Charles Sutton. 2014. Mining idioms from source code. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’14). Google ScholarDigital Library
Miltiadis Allamanis, Daniel Tarlow, Andrew Gordon, and Yi Wei. 2015. Bimodal modelling of source code and natural language. In Proceedings of the International Conference on Machine Learning (ICML’15). Google ScholarDigital Library
Sven Amann, Sebastian Proksch, Sarah Nadi, and Mira Mezini. 2016. A study of visual studio usage in practice. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER’16).Google ScholarCross Ref
Gene M. Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the Spring Joint Computer Conference. Google ScholarDigital Library
Matthew Amodio, Swarat Chaudhuri, and Thomas Reps. 2017. Neural attribute machines for program generation. arXiv Preprint arXiv:1705.09231 (2017).Google Scholar
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Learning to compose neural networks for question answering. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’16).Google ScholarCross Ref
Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, and Konrad Rieck. 2014. DREBIN: Effective and explainable detection of android malware in your pocket. In Proceedings of the Network and Distributed System Security Symposium.Google ScholarCross Ref
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR’15).Google Scholar
Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to write programs. In Proceedings of the International Conference on Learning Representations (ICLR’17).Google Scholar
Antonio Valerio Miceli Barone and Rico Sennrich. 2017. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers) 2 (2017), 314--319.Google Scholar
Rohan Bavishi, Michael Pradel, and Koushik Sen. 2017. Context2Name: A deep learning-based approach to infer natural variable names from usage contexts. TU Darmstadt, Department of Computer Science.Google Scholar
Tony Beltramelli. 2018. pix2code: Generating code from a graphical user interface screenshot. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems. ACM, 3 pages. Google ScholarDigital Library
Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. 2010. A few billion lines of code later: Using static analysis to find bugs in the real world. Communications of the ACM 53, 2 (2010), 66--75. Google ScholarDigital Library
Sahil Bhatia and Rishabh Singh. 2018. Automated correction for syntax errors in programming assignments using recurrent neural networks. In Proceedings of the International Conference on Software Engineering (ICSE’18).Google Scholar
Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, and Sebastian Riedel. 2016. Learning Python code suggestion with a sparse pointer network. arXiv Preprint arXiv:1611.08307 (2016).Google Scholar
Benjamin Bichsel, Veselin Raychev, Petar Tsankov, and Martin Vechev. 2016. Statistical deobfuscation of android applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. Google ScholarDigital Library
Pavol Bielik, Veselin Raychev, and Martin Vechev. 2015. Programming with “big code”: Lessons, techniques and applications. In Proceedings of the LIPIcs-Leibniz International Proceedings in Informatics.Google Scholar
Pavol Bielik, Veselin Raychev, and Martin Vechev. 2016. PHOG: Probabilistic model for code. In Proceedings of the International Conference on Machine Learning (ICML’16). Google ScholarDigital Library
David M. Blei. 2012. Probabilistic topic models. Communications of the ACM 55, 4 (2012), 77--84. Google ScholarDigital Library
Marc Brockschmidt, Yuxin Chen, Pushmeet Kohli, Siddharth Krishna, and Daniel Tarlow. 2017. Learning shape analysis. In Proceedings of the International Static Analysis Symposium. Springer.Google ScholarCross Ref
Peter John Brown. 1979. Software Portability: An Advanced Course. CUP Archive.Google Scholar
Marcel Bruch, Martin Monperrus, and Mira Mezini. 2009. Learning from examples to improve code completion systems. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’09). Google ScholarDigital Library
Raymond P. L. Buse and Westley Weimer. 2012. Synthesizing API usage examples. In Proceedings of the International Conference on Software Engineering (ICSE’12). Google ScholarDigital Library
Joshua Charles Campbell, Abram Hindle, and José Nelson Amaral. 2014. Syntax errors just aren’t natural: Improving error reporting with language models. In Proceedings of the Working Conference on Mining Software Repositories (MSR’14).Google ScholarDigital Library
Lei Cen, Christoher S. Gates, Luo Si, and Ninghui Li. 2015. A probabilistic discriminative model for Android malware detection with decompiled source code. IEEE Transactions on Dependable and Secure Computing 12, 4 (2015), 400--412.Google ScholarDigital Library
Luigi Cerulo, Massimiliano Di Penta, Alberto Bacchelli, Michele Ceccarelli, and Gerardo Canfora. 2015. Irish: A hidden markov model to detect coded information islands in free text. Science of Computer Programming 105 (2015), 26--43. Google ScholarDigital Library
Kwonsoo Chae, Hakjoo Oh, Kihong Heo, and Hongseok Yang. 2017. Automatically generating features for learning program analysis heuristics for C-like languages. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages 8 Applications (OOPSLA’17).Google ScholarDigital Library
Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41, 3 (2009), 15. Google ScholarDigital Library
Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech and Language 13, 4 (1999), 359--394. Google ScholarDigital Library
Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder--Decoder approaches. In Syntax, Semantics and Structure in Statistical Translation (2014).Google ScholarCross Ref
Edmund Clarke, Daniel Kroening, and Karen Yorav. 2003. Behavioral consistency of C and verilog programs using bounded model checking. In Proceedings of the 40th Annual Design Automation Conference. Google ScholarDigital Library
Trevor Cohn, Phil Blunsom, and Sharon Goldwater. 2010. Inducing tree-substitution grammars. Journal of Machine Learning Research 11, Nov (2010), 3053--3096. Google ScholarDigital Library
Christopher S. Corley, Kostadin Damevski, and Nicholas A. Kraft. 2015. Exploring the use of deep learning for feature location. In Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME’15). Google ScholarDigital Library
Patrick Cousot, Radhia Cousot, Jerôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. 2005. The ASTRÉE analyzer. In ESPO. Springer.Google Scholar
William Croft. 2008. Evolutionary linguistics. Ann. Rev. Anthropol. (2008).Google Scholar
Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. 2017. End-to-end deep learning of optimization heuristics. In Proceedings of the 26th International Conference on Parallel Computing Technologies (PACT'17). IEEE, 219--232.Google ScholarCross Ref
Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. 2017. Synthesizing benchmarks for predictive modeling. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’17). IEEE, 86--99. Google ScholarCross Ref
Hoa Khanh Dam, Truyen Tran, and Trang Pham. 2016. A deep language model for software code. arXiv Preprint arXiv:1608.02715 (2016).Google Scholar
Florian Deißenböck and Markus Pizka. 2006. Concise and consistent naming. Software Quality Journal 14, 3 (2006), 261--282. Google ScholarDigital Library
Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, and Alexander M. Rush. 2017. Image-to-markup generation with coarse-to-fine attention. In Proceedings of the International Conference on Machine Learning (ICML’17). 980--989.Google Scholar
Premkumar Devanbu. 2015. New initiative: The naturalness of software. In Proceedings of the International Conference on Software Engineering (ICSE’15). Google ScholarDigital Library
Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel rahman Mohamed, and Pushmeet Kohli. 2017. Robustfill: Neural program learning under noisy I/O. In Proceedings of the International Conference on Machine Learning (ICML’17).Google Scholar
Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. 2013. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. In Proceedings of the International Conference on Software Engineering (ICSE’13). Google ScholarDigital Library
Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Joshua B. Tenenbaum. 2017. Learning to infer graphics programs from hand-drawn images. arXiv Preprint arXiv:1707.09627 (2017).Google Scholar
Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. 2001. Bugs as deviant behavior: A general approach to inferring errors in systems code. In ACM SIGOPS Operating Systems Review. Google ScholarDigital Library
Michael D. Ernst. 2017. Natural language is a programming language: Applying natural language processing to software development. In Proceedings of the LIPIcs-Leibniz International Proceedings in Informatics.Google Scholar
Ethan Fast, Daniel Steffee, Lucy Wang, Joel R. Brandt, and Michael S. Bernstein. 2014. Emergent, crowd-scale programming practice in the IDE. In Proceedings of the Annual ACM Conference on Human Factors in Computing Systems. Google ScholarDigital Library
John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, and Daniel Tarlow. 2017. Neural functional programming. InProceedings of the International Conference on Learning Representations (ICLR’17).Google Scholar
Matthew Finifter, Adrian Mettler, Naveen Sastry, and David Wagner. 2008. Verifiable functional purity in java. In Proceedings of the 15th ACM Conference on Computer and Communications Security. ACM, 161--174. Google ScholarDigital Library
Eclipse Foundation. Code Recommenders. Retrieved June 2017 from www.ecli pse.org/recommenders.Google Scholar
Jaroslav Fowkes, Pankajan Chanthirasegaran, Razvan Ranca, Miltos Allamanis, Mirella Lapata, and Charles Sutton. 2017. Autofolding for source code summarization. IEEE Transactions on Software Engineering 43, 12 (2017), 1095--1109. Google ScholarDigital Library
Jaroslav Fowkes and Charles Sutton. 2015. Parameter-free probabilistic API mining at GitHub Scale. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15). Google ScholarDigital Library
Christine Franks, Zhaopeng Tu, Premkumar Devanbu, and Vincent Hellendoorn. 2015. Cacheca: A cache language model based code suggestion tool. In Proceedings of the International Conference on Software Engineering (ICSE’15). Google ScholarDigital Library
Wei Fu and Tim Menzies. 2017. Easy over hard: A case study on deep learning. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’17). Google ScholarDigital Library
Mark Gabel and Zhendong Su. 2008. Javert: Fully automatic mining of general temporal properties from dynamic traces. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’08). Google ScholarDigital Library
Mark Gabel and Zhendong Su. 2010. A study of the uniqueness of source code. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’10). Google ScholarDigital Library
Rosalva E. Gallardo-Valencia and Susan Elliott Sim. 2009. Internet-scale code search. In Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation. Google ScholarDigital Library
Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. 2016. TerpreT: A probabilistic programming language for program induction. arXiv Preprint arXiv:1608.04428 (2016).Google Scholar
Spandana Gella, Mirella Lapata, and Frank Keller. 2016. Unsupervised visual sense disambiguation for verbs using multimodal embeddings. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’16).Google ScholarCross Ref
Elena L. Glassman, Jeremy Scott, Rishabh Singh, Philip J. Guo, and Robert C. Miller. 2015. OverCode: Visualizing variation in student solutions to programming problems at scale. ACM Transactions on Computer-Human Interaction (TOCHI) 22, 2 (2015), 7 pages. Google ScholarDigital Library
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. Google ScholarDigital Library
Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. 2014. Probabilistic programming. In Proceedings of the International Conference on Software Engineering (ICSE’14).Google Scholar
Orlena Gotel, Jane Cleland-Huang, Jane Huffman Hayes, Andrea Zisman, Alexander Egyed, Paul Grünbacher, Alex Dekhtyar, Giuliano Antoniol, Jonathan Maletic, and Patrick Mäder. 2012. Traceability fundamentals. In Software and Systems Traceability. Springer, 3--22.Google Scholar
Alex Graves, Greg Wayne, and Ivo Danihelka. 2014. Neural Turing machines. arXiv Preprint arXiv:1410.5401 (2014).Google Scholar
Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim. 2016. Deep API learning. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’16). Google ScholarDigital Library
Sumit Gulwani and Mark Marron. 2014. NLyze: Interactive programming by natural language for spreadsheet data analysis and manipulation. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. Google ScholarDigital Library
Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, and others. 2017. Program synthesis. In Foundations and Trends® in Programming Languages 4, 1--2 (2017), 1--119.Google ScholarCross Ref
Jin Guo, Jinghui Cheng, and Jane Cleland-Huang. 2017. Semantically enhanced software traceability using deep learning techniques. In Proceedings of the International Conference on Software Engineering (ICSE’17). Google ScholarDigital Library
Rahul Gupta, Aditya Kanade, and Shirish Shevade. 2018. Deep reinforcement learning for programming language correction. arXiv Preprint arXiv:1801.10467 (2018).Google Scholar
Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing common C language errors by deep learning. In Proceedings of the Conference of Artificial Intelligence (AAAI’17).Google Scholar
Tihomir Gvero and Viktor Kuncak. 2015. Synthesizing java expressions from free-form queries. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages 8 Applications (OOPSLA’15). Google ScholarDigital Library
Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The unreasonable effectiveness of data. IEEE Intelligent Systems 24, 2 (2009), 8--12. Google ScholarDigital Library
Vincent J. Hellendoorn and Premkumar Devanbu. 2017. Are deep neural networks the best choice for modeling source code? In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’17). Google ScholarDigital Library
Vincent J. Hellendoorn, Premkumar T. Devanbu, and Alberto Bacchelli. 2015. Will they like this?: Evaluating code contributions with language models. In Proceedings of the Working Conference on Mining Software Repositories (MSR’15). Google ScholarDigital Library
Felix Hill, KyungHyun Cho, Anna Korhonen, and Yoshua Bengio. 2016. Learning to understand phrases by embedding the dictionary. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).Google ScholarCross Ref
Abram Hindle, Earl T. Barr, Mark Gabel, Zhendong Su, and Premkumar Devanbu. 2016. On the naturalness of software. Communications of the ACM 59, 5 (2016), 122--131. Google ScholarDigital Library
Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012. On the naturalness of software. In Proceedings of the International Conference on Software Engineering (ICSE’12). Google ScholarDigital Library
G. E. Hinton, J. L. McClelland, and D. E. Rumelhart. 1986. Distributed representations. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, 77--109. Google ScholarDigital Library
C. A. R. Hoare. 1969. An axiomatic basis for computer programming. Commun. ACM 12, 10 (Oct. 1969), 576--580. Google ScholarDigital Library
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780. Google ScholarDigital Library
Reid Holmes, Robert J. Walker, and Gail C. Murphy. 2005. Strathcona example recommendation tool. In ACM SIGSOFT Software Engineering Notes 30, 5 (2005), 237--240. Google ScholarDigital Library
Chun-Hung Hsiao, Michael Cafarella, and Satish Narayanasamy. 2014. Using web corpus statistics for program analysis. In ACM SIGPLAN Notices 49, 10 (2014), 49--65. Google ScholarDigital Library
Xing Hu, Yuhan Wei, Ge Li, and Zhi Jin. 2017. CodeSum: Translate program language to natural language. arXiv Preprint arXiv:1708.01837 (2017).Google Scholar
Andrew Hunt and David Thomas. 2000. The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley Professional. Google ScholarDigital Library
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).Google ScholarCross Ref
Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In Proceedings of the International Conference on Automated Software Engineering (ASE’17). Google ScholarDigital Library
Daniel D. Johnson. 2016. Learning graphical state transitions. In Proceedings of the International Conference on Learning Representations (ICLR’16).Google Scholar
Dan Jurafsky. 2000. Speech 8 Language Processing (3rd. ed.). Pearson Education.Google Scholar
René Just, Darioush Jalali, and Michael D. Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA’14). Google ScholarDigital Library
Neel Kant. 2018. Recent advances in neural program synthesis. arXiv Preprint arXiv:1802.02353 (2018).Google Scholar
Svetoslav Karaivanov, Veselin Raychev, and Martin Vechev. 2014. Phrase-based statistical translation of programming languages. In Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming 8 Software. ACM, 173--184. Google ScholarDigital Library
Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and understanding recurrent networks. arXiv Preprint arXiv:1506.02078 (2015).Google Scholar
Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Procdedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP’95). 1 (1995), 181--184.Google ScholarCross Ref
Donald Ervin Knuth. 1984. Literate programming. The Computer Journal 27, 2 (1984), 97--111. Google ScholarDigital Library
Ugur Koc, Parsa Saadatpanah, Jeffrey S. Foster, and Adam A. Porter. 2017. Learning a classifier for false positive error reports emitted by static code analysis tools. In Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. Google ScholarDigital Library
Rainer Koschke. 2007. Survey of research on software clones. In Dagstuhl Seminar Proceedings. Schloss Dagstuhl-Leibniz-Zentrum für Informatik.Google Scholar
Ted Kremenek, Andrew Y. Ng, and Dawson R. Engler. 2007. A factor graph model for software bug finding. In Proceedings of the International Joint Conference on Artifical intelligence (IJCAI’07). Google ScholarDigital Library
Roland Kuhn and Renato De Mori. 1990. A cache-based natural language model for speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 6 (1990), 570--583. Google ScholarDigital Library
Nate Kushman and Regina Barzilay. 2013. Using semantic unification to generate regular expressions from natural language. In Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’13).Google Scholar
Tessa Lau. 2001. Programming by Demonstration: A Machine Learning Approach. Ph.D. Dissertation. University of Washington. Google ScholarDigital Library
Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning (ICML’14). Google ScholarDigital Library
Tien-Duy B. Le, Mario Linares-Vásquez, David Lo, and Denys Poshyvanyk. 2015. Rclinker: Automated linking of issue reports and commits leveraging rich contextual information. In Proceedings of the International Conference on Program Comprehension (ICPC’15). Google ScholarDigital Library
Dor Levy and Lior Wolf. 2017. Learning to align the source code to the compiled object code. In Proceedings of the International Conference on Machine Learning (ICML’17).Google Scholar
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In Proceedings of the International Conference on Learning Representations (ICLR’16).Google Scholar
Percy Liang, Michael I. Jordan, and Dan Klein. 2010. Learning programs: A hierarchical bayesian approach. In Proceedings of the International Conference on Machine Learning (ICML’10). Google ScholarDigital Library
Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. 2005. Scalable statistical bug isolation. In ACM SIGPLAN Notices 40, 6 (2005), 15--26. Google ScholarDigital Library
Xi Victoria Lin, Chenglong Wang, Deric Pang, Kevin Vu, Luke Zettlemoyer, and Michael D. Ernst. 2017. Program Synthesis from Natural Language using Recurrent Neural Networks. Technical Report UW-CSE-17-03-01. University of Washington Department of Computer Science and Engineering, Seattle, WA.Google Scholar
Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, and Michael D. Ernst. 2018. NL2Bash: A corpus and semantic parser for natural language interface to the linux operating system. In Proceedings of the International Conference on Language Resources and Evaluation.Google Scholar
Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, Andrew Senior, Fumin Wang, and Phil Blunsom. 2016. Latent predictor networks for code generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).Google ScholarCross Ref
Han Liu. 2016. Towards better program obfuscation: Optimization via language models. In Proceedings of the 38th International Conference on Software Engineering Companion. Google ScholarDigital Library
Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani, and Anindya Banerjee. 2009. Merlin: Specification inference for explicit information flow problems. In Proceedings of the Symposium on Programming Language Design and Implementation (PLDI’09). Google ScholarDigital Library
Sarah M. Loos, Geoffrey Irving, Christian Szegedy, and Cezary Kaliszyk. 2017. Deep network guided proof search. In Proceedings of the International Conference on Logic for Programming Artificial Intelligence and Reasoning (LPAR’17).Google Scholar
Pablo Loyola, Edison Marrese-Taylor, and Yutaka Matsuo. 2017. A neural architecture for generating natural language descriptions from source code changes. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2 (2017), 287--292.Google ScholarCross Ref
Yanxin Lu, Swarat Chaudhuri, Chris Jermaine, and David Melski. 2017. Data-Driven program completion. arXiv Preprint arXiv:1705.09042 (2017).Google Scholar
Chris Maddison and Daniel Tarlow. 2014. Structured generative models of natural source code. In Proceedings of the International Conference on Machine Learning (ICML’14). Google ScholarDigital Library
Ravi Mangal, Xin Zhang, Aditya V. Nori, and Mayur Naik. 2015. A user-guided approach to program analysis. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15). Google ScholarDigital Library
Collin Mcmillan, Denys Poshyvanyk, Mark Grechanik, Qing Xie, and Chen Fu. 2013. Portfolio: Searching for relevant functions and their usages in millions of lines of code. ACM Transactions on Software Engineering and Methodology (TOSEM) 22, 4 (2013), 37 pages. Google ScholarDigital Library
Aditya Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, and Adam Kalai. 2013. A machine learning framework for programming by example. In Proceedings of the International Conference on Machine Learning (ICML’13).
Kim Mens and Angela Lozano. 2014. Source code-based recommendation systems. In Recommendation Systems in Software Engineering. Springer, 93--130.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv Preprint arXiv:1301.3781 (2013).
Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional neural networks over tree structures for programming language processing. In Proceedings of the Conference of Artificial Intelligence (AAAI’16).
Dana Movshovitz-Attias and William W. Cohen. 2013. Natural language models for predicting programming comments. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’13).
Dana Movshovitz-Attias and William W. Cohen. 2015. KB-LDA: Jointly learning a knowledge base of hierarchy, relations, and facts. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’15).
Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. 2018. Neural sketch learning for conditional program generation. In Proceedings of the International Conference on Learning Representations (ICLR’18).
Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine. 2017. Bayesian specification learning for finding API usage errors. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, 151--162.
Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. 2015. Neural programmer: Inducing latent programs with gradient descent. In Proceedings of the International Conference on Learning Representations (ICLR’15).
Graham Neubig. 2016. Survey of methods to generate natural language from source code. Retrieved from http://www.languageandcode.org/nlse2015/neubig15nlse-survey.pdf.
Anh Tuan Nguyen and Tien N. Nguyen. 2015. Graph-based statistical language model for code. In Proceedings of the International Conference on Software Engineering (ICSE’15).
Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. 2013. Lexical statistical machine translation for language migration. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’13).
Anh T. Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. 2015. Divide-and-conquer approach for multi-phase statistical migration for source code. In Proceedings of the International Conference on Automated Software Engineering (ASE’15).
Trong Duc Nguyen, Anh Tuan Nguyen, and Tien N. Nguyen. 2016. Mapping API elements for code migration with vector representations. In Proceedings of the International Conference on Software Engineering (ICSE’16).
Trong Duc Nguyen, Anh Tuan Nguyen, Hung Dang Phan, and Tien N. Nguyen. 2017. Exploring API embedding for API usages and applications. In Proceedings of the International Conference on Software Engineering (ICSE’17).
Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2013. A statistical semantic language model for source code. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’13).
Haoran Niu, Iman Keivanloo, and Ying Zou. 2017. Learning to rank code examples for code search engines. Empirical Software Engineering (ESEM’16) 22, 1 (2017), 259--291.
Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2015. Learning to generate pseudo-code from source code using statistical machine translation. In Proceedings of the International Conference on Automated Software Engineering (ASE’15).
Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. 2015. Learning a strategy for adapting a program analysis via Bayesian optimisation. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA’15).
Cyrus Omar. 2013. Structured statistical syntax tree prediction. In Proceedings of the Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH’13).
Cyrus Omar, Ian Voysey, Michael Hilton, Joshua Sunshine, Claire Le Goues, Jonathan Aldrich, and Matthew A. Hammer. 2017. Toward semantic foundations for program editors. arXiv Preprint arXiv:1703.08694 (2017).
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’02).
Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. 2017. Neuro-symbolic program synthesis. In Proceedings of the International Conference on Learning Representations (ICLR’17).
Terence Parr and Jurgen J. Vinju. 2016. Towards a universal code formatter through machine learning. In Proceedings of the International Conference on Software Language Engineering (SLE’16).
Jibesh Patra and Michael Pradel. 2016. Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data. TU Darmstadt, Department of Computer Science, TUD-CS-2016-14664.
Hung Viet Pham, Phong Minh Vu, Tung Thanh Nguyen, and others. 2016. Learning API usages from bytecode: A statistical approach. In Proceedings of the International Conference on Software Engineering (ICSE’16).
Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, and Leonidas J. Guibas. 2015. Learning program embeddings to propagate feedback on student code. In Proceedings of the International Conference on Machine Learning (ICML’15).
Matt Post and Daniel Gildea. 2009. Bayesian learning of a tree substitution grammar. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’09).
Michael Pradel and Koushik Sen. 2017. Deep learning to find bugs. TU Darmstadt, Department of Computer Science.
Sebastian Proksch, Sven Amann, Sarah Nadi, and Mira Mezini. 2016. Evaluating the evaluations of code recommender systems: A reality check. In Proceedings of the International Conference on Automated Software Engineering (ASE’16).
Sebastian Proksch, Johannes Lerch, and Mira Mezini. 2015. Intelligent code completion with Bayesian networks. ACM Transactions on Software Engineering and Methodology (TOSEM) 25, 1 (2015), 3.
Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. sk_p: A neural program corrector for MOOCs. In Proceedings of the Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH’16).
Chris Quirk, Raymond Mooney, and Michel Galley. 2015. Language to code: Learning semantic parsers for if-this-then-that recipes. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’15).
Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract syntax networks for code generation and semantic parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’17).
Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, and Premkumar Devanbu. 2016. On the naturalness of buggy code. In Proceedings of the International Conference on Software Engineering (ICSE’16).
Veselin Raychev, Pavol Bielik, Martin Vechev, and Andreas Krause. 2016. Learning programs from noisy data. In Proceedings of the Symposium on Principles of Programming Languages (POPL’16).
Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting program properties from “big code.” In Proceedings of the Symposium on Principles of Programming Languages (POPL’15).
Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code completion with statistical language models. In Proceedings of the Symposium on Programming Language Design and Implementation (PLDI’14).
Scott Reed and Nando de Freitas. 2016. Neural programmer-interpreters. In Proceedings of the International Conference on Learning Representations (ICLR’16).
Sebastian Riedel, Matko Bosnjak, and Tim Rocktäschel. 2017. Programming with a differentiable Forth interpreter. In Proceedings of the International Conference on Machine Learning (ICML’17).
Martin Robillard, Robert Walker, and Thomas Zimmermann. 2010. Recommendation systems for software engineering. IEEE Software 27, 4 (2010), 80--86.
Martin P. Robillard, Walid Maalej, Robert J. Walker, and Thomas Zimmermann. 2014. Recommendation Systems in Software Engineering. Springer.
Tim Rocktäschel and Sebastian Riedel. 2017. End-to-end differentiable proving. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’17).
Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How developers search for code: A case study. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15).
Juliana Saraiva, Christian Bird, and Thomas Zimmermann. 2015. Products, developers, and milestones: How should I build my N-gram language model. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’15).
Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’16).
Abhishek Sharma, Yuan Tian, and David Lo. 2015. NIRMAL: Automatic identification of software relevant tweets leveraging language model. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER’15).
Rishabh Singh and Sumit Gulwani. 2015. Predicting a correct program in programming by example. In Proceedings of the International Conference on Computer Aided Verification.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’14).
Suresh Thummalapenta and Tao Xie. 2007. Parseweb: A programmer assistant for reusing open source code on the web. In Proceedings of the International Conference on Automated Software Engineering (ASE’07).
Christoph Treude and Martin P. Robillard. 2016. Augmenting API documentation with insights from Stack Overflow. In Proceedings of the International Conference on Software Engineering (ICSE’16).
Zhaopeng Tu, Zhendong Su, and Premkumar Devanbu. 2014. On the localness of software. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’14).
Bogdan Vasilescu, Casey Casalnuovo, and Premkumar Devanbu. 2017. Recovering clear, natural identifiers from obfuscated JS names. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE’17).
Lisa Wang, Angela Sy, Larry Liu, and Chris Piech. 2017. Deep knowledge tracing on programming exercises. In Proceedings of the Conference on Learning @ Scale.
Song Wang, Devin Chollak, Dana Movshovitz-Attias, and Lin Tan. 2016. Bugram: Bug detection with n-gram language models. In Proceedings of the International Conference on Automated Software Engineering (ASE’16).
Song Wang, Taiyue Liu, and Lin Tan. 2016. Automatically learning semantic features for defect prediction. In Proceedings of the International Conference on Software Engineering (ICSE’16).
Xin Wang, Chang Liu, Richard Shin, Joseph E. Gonzalez, and Dawn Song. 2016. Neural Code Completion. Retrieved from https://openreview.net/pdf?id=rJbPBt9lg.
Andrzej Wasylkowski, Andreas Zeller, and Christian Lindig. 2007. Detecting object usage anomalies. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’07).
Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep learning code fragments for code clone detection. In Proceedings of the International Conference on Automated Software Engineering (ASE’16).
Martin White, Christopher Vendome, Mario Linares-Vásquez, and Denys Poshyvanyk. 2015. Toward deep learning software repositories. In Proceedings of the Working Conference on Mining Software Repositories (MSR’15).
Chadd C. Williams and Jeffrey K. Hollingsworth. 2005. Automatic mining of source code repositories to improve bug finding techniques. IEEE Transactions on Software Engineering 31, 6 (2005), 466--480.
Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. 2016. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A survey on software fault localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707--740.
Tao Xie and Jian Pei. 2006. MAPO: Mining API usages from open source repositories. In Proceedings of the Working Conference on Mining Software Repositories (MSR’06).
Chang Xu, Dacheng Tao, and Chao Xu. 2013. A survey on multi-view learning. arXiv Preprint arXiv:1304.5634 (2013).
Shir Yadid and Eran Yahav. 2016. Extracting code from programming tutorial videos. In Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software.
Eran Yahav. 2015. Programming with “big code.” In Asian Symposium on Programming Languages and Systems. Springer, 3--8.
Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’17).
Wojciech Zaremba and Ilya Sutskever. 2014. Learning to execute. arXiv Preprint arXiv:1410.4615 (2014).
Alice X. Zheng, Michael I. Jordan, Ben Liblit, and Alex Aiken. 2003. Statistical debugging of sampled programs. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’03).
Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. 2006. Statistical debugging: Simultaneous identification of multiple bugs. In Proceedings of the International Conference on Machine Learning (ICML’06).
Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv Preprint arXiv:1709.00103 (2017).
Thomas Zimmermann, Andreas Zeller, Peter Weissgerber, and Stephan Diehl. 2005. Mining version histories to guide software changes. IEEE Transactions on Software Engineering 31, 6 (2005), 429--445.


Published in
ACM Computing Surveys Volume 51, Issue 4
July 2019
765 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3236632
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 July 2018
- Accepted: 1 April 2018
- Revised: 1 March 2018
- Received: 1 September 2017
Author Tags
Big code
code naturalness
machine learning
software engineering tools
Qualifiers
- survey
- Research
- Refereed