research-article

Learning natural coding conventions

Authors:
Miltiadis Allamanis, University of Edinburgh, UK
Earl T. Barr, University College London, UK
Christian Bird, Microsoft Research, USA
Charles Sutton, University of Edinburgh, UK

FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2014, Pages 281–293, https://doi.org/10.1145/2635868.2635883

Published: 11 November 2014

ABSTRACT

Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project’s coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURALIZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94 % accuracy in its top suggestions for identifier names. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.
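The idea of scoring identifier names with a statistical language model can be illustrated with a toy example. The sketch below is not NATURALIZE's implementation: the abstract does not specify the model, so an add-one-smoothed bigram model is assumed here, and the function names (`train_bigram`, `log_prob`, `suggest_names`) and the three-line corpus are hypothetical. It trains on tokenized snippets, renames one identifier to each candidate in turn, and ranks the candidates by how probable the renamed snippet is under the model.

```python
from collections import Counter
import math


def train_bigram(token_seqs):
    """Count unigrams and bigrams over tokenized code snippets."""
    unigrams, bigrams = Counter(), Counter()
    for toks in token_seqs:
        padded = ["<s>"] + list(toks) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab = len(unigrams) + 1  # +1 reserves probability mass for unseen tokens
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def suggest_names(snippet, target, candidates, unigrams, bigrams):
    """Rank candidate names by the score of the snippet after renaming `target`."""
    scored = []
    for name in candidates:
        renamed = [name if t == target else t for t in snippet]
        scored.append((log_prob(renamed, unigrams, bigrams), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical training corpus: loops from an imagined codebase that mostly use `i`.
corpus = [
    "for ( int i = 0 ; i < n ; i ++ )".split(),
    "for ( int i = 0 ; i < size ; i ++ )".split(),
    "for ( int j = 0 ; j < n ; j ++ )".split(),
]
unigrams, bigrams = train_bigram(corpus)

# A snippet whose loop variable `idx` never occurs in the corpus.
snippet = "for ( int idx = 0 ; idx < n ; idx ++ )".split()
ranking = suggest_names(snippet, "idx", ["idx", "i", "j"], unigrams, bigrams)
print(ranking)  # the name most common in the corpus, `i`, ranks first
```

Because the ranking depends only on the training corpus, the same snippet would receive different suggestions in a codebase with different habits, which is the sense in which conventions are learned rather than prescribed.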

Index Terms

Learning natural coding conventions
1. Software and its engineering
  1. Software notations and tools

Published in
FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2014
856 pages
ISBN: 9781450330565
DOI: 10.1145/2635868
General Chair: Shing-Chi Cheung, Hong Kong University of Science and Technology, China
Program Chairs: Alessandro Orso, Georgia Institute of Technology, USA; Margaret-Anne Storey, University of Victoria, Canada
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Coding conventions
naturalness of software
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate: 17 of 128 submissions, 13%
Article Metrics
- Total Citations: 250
- Total Downloads: 2,472
- Downloads (Last 12 months): 164
- Downloads (Last 6 weeks): 26