
Improving bug detection via context-based code representation learning and attention-based neural networks

Published: 10 October 2019

Abstract

Bug detection has been shown to be an effective way to help developers find bugs early, saving much effort and time in the software development process. Recently, deep learning-based bug detection approaches have outperformed traditional machine learning-based approaches, rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods, and they suffer from a high rate of false positives. In this paper, we propose an approach that combines contexts with an attention neural network to overcome those limitations. As the global context, we use the Program Dependence Graph (PDG) and Data Flow Graph (DFG) to connect the method under investigation with the other relevant methods that might contribute to the buggy code. The global context is complemented by the local context, which is extracted from the paths on the AST built from the method's body. The use of the PDG and DFG enables our model to reduce the false positive rate. To compensate for the potential reduction in recall, we use the attention mechanism to put more weight on the buggy paths in the source code. That is, the paths that are similar to the buggy paths are ranked higher, which improves the recall of our model. We have conducted several experiments to evaluate our approach on a very large dataset with more than 4.973M methods in 92 different project versions. The results show that our tool achieves a relative improvement of up to 160% in F-score compared with the state-of-the-art bug detection approaches. Our tool detects 48 true bugs in the list of the top 100 reported bugs, which is 24 more true bugs than the baseline approaches. We also report that our representation is better suited to bug detection and achieves a relative improvement of up to 206% in accuracy over the other representations.
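The idea of putting more weight on some paths than on others can be illustrated with a minimal sketch. It assumes a generic dot-product attention over path embeddings, which is a common formulation and not necessarily the architecture used in the paper. The names `softmax`, `attention_pool`, `path_vectors`, and `attention_vector` are hypothetical, and the numbers are toy values chosen for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(path_vectors, attention_vector):
    """Score each path against an attention vector, then return the
    attention weights and the weighted sum of the path vectors.

    Assumption: plain dot-product scoring; in a trained model both the
    path vectors and the attention vector would be learned.
    """
    scores = [
        sum(p * a for p, a in zip(vec, attention_vector))
        for vec in path_vectors
    ]
    weights = softmax(scores)
    dim = len(attention_vector)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, path_vectors))
        for i in range(dim)
    ]
    return weights, pooled


# Toy embeddings for three AST paths (hypothetical values).
path_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
# Hypothetical attention vector that points toward "buggy-looking" paths.
attention_vector = [2.0, 0.0]

weights, pooled = attention_pool(path_vectors, attention_vector)
```

In this toy setup the first and third paths lie closest to the attention vector, so they receive the largest weights and dominate the pooled representation, while the second path contributes least.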


