Abstract
Code completion is an integral part of modern Integrated Development Environments (IDEs). Developers often use it to explore Application Programming Interfaces (APIs). It also reduces the amount of typing required and helps avoid typos. Traditional code completion systems propose all type-correct methods to the developer. Such a list is often very long and contains many irrelevant items. Prior work has proposed more intelligent code completion systems that reduce the list of proposed methods to the relevant items.
This work extends one of these existing approaches, the Best Matching Neighbor (BMN) algorithm. We introduce Bayesian networks as an alternative underlying model, use additional context information for more precise recommendations, and apply clustering techniques to reduce model size. We compare our new approach, Pattern-based Bayesian Networks (PBN), to the existing BMN algorithm. We extend previously used evaluation methodologies and, in addition to prediction quality, evaluate model size and inference speed.
Our results show that the additional context information we collect improves prediction quality, especially for queries that do not contain method calls. We also show that PBN can achieve prediction quality comparable to that of BMN, while its model size and inference speed scale better with large input sizes.