Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques: A Survey

Published: 25 August 2017

Abstract

Software security vulnerabilities are among the most critical issues in computer security. Because of their potentially severe impact, many different approaches have been proposed over the past decades to mitigate the damage they cause; machine-learning and data-mining techniques are among them. In this article, we provide an extensive review of the work in the field of software vulnerability analysis and discovery that uses machine-learning and data-mining techniques. We review the different categories of work in this domain, discuss their advantages and shortcomings, and point out challenges and some uncharted territory in the field.
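
One family of approaches reviewed in the survey treats vulnerability discovery as a prediction problem: features are extracted from code or its history, and a learned model flags components that are likely to be vulnerable. The sketch below is only an illustration of that general idea, not code from any surveyed work; it assumes Python with scikit-learn, and the code fragments and labels are invented for the example.

    # Illustrative sketch: text-mining style vulnerability prediction.
    # Code fragments are tokenized into a bag of words and a classifier
    # learns token patterns that correlate with known-vulnerable code.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline

    # Hypothetical training data: fragments labeled 1 if a vulnerability
    # was later reported in them, 0 otherwise.
    sources = [
        'char buf[8]; strcpy(buf, user_input);',           # unchecked copy
        'if (len < sizeof(buf)) memcpy(buf, src, len);',    # bounds check present
        'query = "SELECT * FROM t WHERE id=" + user_id;',   # string-built SQL
        'stmt.setInt(1, user_id); stmt.executeQuery();',    # parameterized query
    ]
    labels = [1, 0, 1, 0]

    # Treat identifiers and keywords as tokens and fit a random forest.
    model = make_pipeline(
        CountVectorizer(token_pattern=r'[A-Za-z_][A-Za-z0-9_]*'),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    model.fit(sources, labels)

    # Score a new, unseen fragment.
    print(model.predict(['strcpy(dest, argv[1]);']))

In practice, the surveyed works use far richer feature sets (software metrics, token n-grams, abstract syntax trees, code property graphs, commit metadata) and much larger labeled corpora, but the pipeline structure is essentially the same: feature extraction followed by a standard learner.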

          • Published in

            ACM Computing Surveys, Volume 50, Issue 4 (July 2018), 531 pages
            ISSN: 0360-0300
            EISSN: 1557-7341
            DOI: 10.1145/3135069
            • Editor: Sartaj Sahni

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 25 August 2017
            • Accepted: 1 May 2017
            • Revised: 1 April 2017
            • Received: 1 August 2016

            Qualifiers

            • survey
            • Research
            • Refereed
