Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques: A Survey

Published: 25 August 2017

Abstract

Software security vulnerabilities are among the most critical issues in computer security. Because of their potentially severe impact, many different approaches have been proposed over the past decades to mitigate the damage they cause; machine-learning and data-mining techniques are among them. In this article, we provide an extensive review of the work in the field of software vulnerability analysis and discovery that uses machine-learning and data-mining techniques. We review the different categories of work in this domain, discuss their advantages and shortcomings, and point out challenges and some uncharted territory in the field.
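
One family of approaches reviewed in the survey treats vulnerability discovery as a prediction problem: features are extracted from code or its history, and a learned model flags components that are likely to be vulnerable. The sketch below is only an illustration of that general idea, not code from any surveyed work; it assumes Python with scikit-learn, and the code fragments and labels are invented for the example.

    # Illustrative sketch: text-mining style vulnerability prediction.
    # Code fragments are tokenized into a bag of words and a classifier
    # learns token patterns that correlate with known-vulnerable code.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline

    # Hypothetical training data: fragments labeled 1 if a vulnerability
    # was later reported in them, 0 otherwise.
    sources = [
        'char buf[8]; strcpy(buf, user_input);',           # unchecked copy
        'if (len < sizeof(buf)) memcpy(buf, src, len);',    # bounds check present
        'query = "SELECT * FROM t WHERE id=" + user_id;',   # string-built SQL
        'stmt.setInt(1, user_id); stmt.executeQuery();',    # parameterized query
    ]
    labels = [1, 0, 1, 0]

    # Treat identifiers and keywords as tokens and fit a random forest.
    model = make_pipeline(
        CountVectorizer(token_pattern=r'[A-Za-z_][A-Za-z0-9_]*'),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    model.fit(sources, labels)

    # Score a new, unseen fragment.
    print(model.predict(['strcpy(dest, argv[1]);']))

In practice, the surveyed works use far richer feature sets (software metrics, token n-grams, abstract syntax trees, code property graphs, commit metadata) and much larger labeled corpora, but the pipeline structure is essentially the same: feature extraction followed by a standard learner.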

          • Published in

            ACM Computing Surveys, Volume 50, Issue 4 (July 2018), 531 pages
            ISSN: 0360-0300
            EISSN: 1557-7341
            DOI: 10.1145/3135069
            • Editor: Sartaj Sahni

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 25 August 2017
            • Accepted: 1 May 2017
            • Revised: 1 April 2017
            • Received: 1 August 2016

            Qualifiers

            • survey
            • Research
            • Refereed
