Intelligent Code Completion with Bayesian Networks

Authors:
Sebastian Proksch, Technische Universität Darmstadt, Darmstadt, Germany
Johannes Lerch, Technische Universität Darmstadt, Darmstadt, Germany
Mira Mezini, Technische Universität Darmstadt, Darmstadt, Germany

ACM Transactions on Software Engineering and Methodology, Volume 25, Issue 1, Article No. 3, pp. 1–31. https://doi.org/10.1145/2744200

Published: 02 December 2015

Abstract

Code completion is an integral part of modern Integrated Development Environments (IDEs). Developers often use it to explore Application Programming Interfaces (APIs). It is also useful to reduce the required amount of typing and to help avoid typos. Traditional code completion systems propose all type-correct methods to the developer. Such a list is often very long with many irrelevant items. More intelligent code completion systems have been proposed in prior work to reduce the list of proposed methods to relevant items.

This work extends one of these existing approaches, the Best Matching Neighbor (BMN) algorithm. We introduce Bayesian networks as an alternative underlying model, use additional context information for more precise recommendations, and apply clustering techniques to reduce model sizes. We compare our new approach, Pattern-based Bayesian Networks (PBN), to the existing BMN algorithm. We extend previously used evaluation methodologies and, in addition to prediction quality, also evaluate model size and inference speed.

Our results show that the additional context information we collect improves prediction quality, especially for queries that do not contain method calls. We also show that PBN can obtain comparable prediction quality to BMN, while model size and inference speed scale better with large input sizes.
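
The abstract does not spell out the network structure, so the following is only a toy illustration of the general idea: observed usages are collapsed into weighted patterns, and missing method calls are ranked by their probability given the enclosing context and the calls already present. It is a minimal naive-Bayes-style sketch, not the authors' PBN implementation. The smoothing constant `EPS`, the helper names, and the example method names (`setText`, `setVisible`, `removeListener`) are all assumptions made for illustration, and no clustering is performed.

```python
from collections import Counter

EPS = 1e-3  # hypothetical smoothing constant; keeps unseen features from zeroing out a pattern


def p_feature(present):
    """P(feature observed | pattern): near 1 if the pattern contains it, near 0 otherwise."""
    return 1.0 - EPS if present else EPS


def build_patterns(usages):
    """Collapse identical (context, calls) usages into patterns with prior probabilities."""
    counts = Counter((ctx, frozenset(calls)) for ctx, calls in usages)
    total = sum(counts.values())
    return [(ctx, calls, n / total) for (ctx, calls), n in counts.items()]


def recommend(patterns, ctx, observed_calls):
    """Rank calls not yet observed by P(call | ctx, observed_calls)."""
    observed = set(observed_calls)
    known_calls = set().union(*(calls for _, calls, _ in patterns))
    # Unnormalised posterior of each pattern given the evidence,
    # treating features as conditionally independent given the pattern.
    weights = []
    for p_ctx, p_calls, prior in patterns:
        w = prior * p_feature(p_ctx == ctx)
        for call in observed:
            w *= p_feature(call in p_calls)
        weights.append(w)
    norm = sum(weights)
    # Marginalise over patterns to score each candidate call.
    scores = {
        call: sum(w * p_feature(call in p_calls)
                  for w, (_, p_calls, _) in zip(weights, patterns)) / norm
        for call in known_calls - observed
    }
    # Highest probability first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical usages: (enclosing method context, calls made on the receiver).
usages = [
    ("onCreate", ["setText", "setVisible"]),
    ("onCreate", ["setVisible", "setText"]),
    ("dispose", ["removeListener"]),
]
patterns = build_patterns(usages)
ranked = recommend(patterns, "onCreate", [])               # query without any method calls
ranked_after = recommend(patterns, "onCreate", ["setText"])  # query with one call observed
```

In this sketch the first query contains no method calls at all, so the ranking is driven entirely by the context feature. This is the kind of query for which the abstract reports that additional context information helps most.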


Published in
ACM Transactions on Software Engineering and Methodology Volume 25, Issue 1
December 2015
339 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/2852270
Editor: David S. Rosenblum, National University of Singapore, Singapore
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 December 2015
- Accepted: 1 March 2015
- Revised: 1 January 2015
- Received: 1 February 2014
Author Tags
Content assist
code completion
code recommender
evaluation
integrated development environments
machine learning
productivity
Qualifiers
- research-article
- Research
- Refereed