DOI: 10.1145/1835804.1835821

Beyond heuristics: learning to classify vulnerabilities and predict exploits

Published: 25 July 2010

ABSTRACT

The security demands on modern system administration are enormous and getting worse. Chief among these demands, administrators must monitor the continual ongoing disclosure of software vulnerabilities that have the potential to compromise their systems in some way. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed--well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. Vulnerabilities can be addressed by patches, reconfigurations, and other workarounds; however, these actions may incur down-time or unforeseen side-effects. Thus, a key question for systems administrators is which vulnerabilities to prioritize. From publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether and how soon a vulnerability is likely to be exploited. As input, our classifiers operate on high dimensional feature vectors that we extract from the text fields, time stamps, cross references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether and how soon individual vulnerabilities are likely to be exploited.
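To make the abstract's feature-extraction step concrete, the sketch below flattens a few fields of a disclosure report (free-text description, cross references, and time stamps) into one sparse, high-dimensional feature vector. This is a minimal illustration rather than the authors' code: the field names, the toy records, and the use of scikit-learn's DictVectorizer are assumptions made here for exposition.

# Minimal sketch (not the authors' code): turn heterogeneous fields of a
# vulnerability disclosure report into one sparse feature vector.
# Field names and records below are hypothetical.
from sklearn.feature_extraction import DictVectorizer

def featurize(report):
    """Flatten one disclosure record into a dict of named features."""
    feats = {}
    # Text fields: one binary feature per token in the description.
    for token in report["description"].lower().split():
        feats["text:" + token] = 1
    # Cross references: one indicator feature per linked source.
    for ref in report["references"]:
        feats["xref:" + ref] = 1
    # Time stamps: e.g., days from discovery to public disclosure.
    feats["days_to_disclosure"] = report["disclosed_day"] - report["discovered_day"]
    return feats

reports = [
    {"description": "Remote buffer overflow in FTP daemon",
     "references": ["CVE", "Secunia"], "discovered_day": 100, "disclosed_day": 130},
    {"description": "Improperly validated input in login form",
     "references": ["OSVDB"], "discovered_day": 200, "disclosed_day": 205},
]

vec = DictVectorizer()
X = vec.fit_transform([featurize(r) for r in reports])  # sparse matrix, one row per report
print(X.shape, sorted(vec.vocabulary_)[:5])

In the paper the vectors built from OSVDB and CVE entries are much higher dimensional, but the flattening idea is the same.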


Supplemental Material

kdd2010_bozorgi_bhlc_01.mov (MOV video, 72.1 MB)

References

  1. W. A. Arbaugh, W. L. Fithen, and J. McHugh. Windows of vulnerability: A case study analysis. Computer, 33(12):52--59, 2000.
  2. A. Arora, A. Nandkumar, and R. Telang. Does information security attack frequency increase with vulnerability disclosure? An empirical analysis. Information Systems Frontiers, 8(5), 2006.
  3. A. Arora, R. Telang, and H. Xu. Optimal policy for software vulnerability disclosure. In Workshop on Economics and Information Security (WEIS'04), 2004.
  4. S. M. Bellovin. On the brittleness of software and the infeasibility of security metrics. IEEE Security and Privacy, 4(4), July 2006.
  5. Cisco. Risk assessment: Risk triage for security vulnerability announcements. Cisco whitepaper, accessed September 2009. http://www.cisco.com/web/about/security/intelligence/vulnerability-risk-triage.html.
  6. CVE Editorial Board. Common Vulnerabilities and Exposures: The standard for information security vulnerability names. http://cve.mitre.org/.
  7. C. Dougherty. Vulnerability metric, updated July 24, 2008. https://www.securecoding.cert.org/confluence/display/seccode/Vulnerability+Metric.
  8. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR -- A library for large linear classification. http://www.csie.ntu.edu.tw/~cjlin/liblinear/.
  9. Forum of Incident Response and Security Teams (FIRST). Common Vulnerability Scoring System (CVSS). http://www.first.org/cvss/.
  10. S. Frei, D. Schatzmann, B. Plattner, and B. Trammel. Modeling the security ecosystem: The dynamics of (in)security. In Proc. of the Workshop on the Economics of Information Security (WEIS), June 2009.
  11. IBM. IBM Internet Security Systems X-Force 2008 trend and risk report. White paper, Jan. 2009. http://www-935.ibm.com/services/us/iss/xforce/trendreports/xforce-2008-annual-report.pdf.
  12. D. Lewis. Naive (Bayes) at forty: The independence assumption in information retrieval. In Proceedings of ECML-98, the 10th European Conference on Machine Learning, pages 4--15, 1998.
  13. P. Mell, K. Scarfone, and S. Romanosky. A complete guide to the Common Vulnerability Scoring System version 2.0, June 2007. http://www.first.org/cvss/cvss-guide.html.
  14. Microsoft TechNet Security Team. Microsoft Security Bulletin. http://www.microsoft.com/technet/security/current.aspx.
  15. D. Moore, C. Shannon, and k. claffy. Code-Red: A case study on the spread and victims of an Internet worm. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, pages 273--284, 2002.
  16. D. Nizovtsev and M. Thursby. Economic analysis of incentives to disclose software vulnerabilities. In Proc. of the Workshop on the Economics of Information Security, 2005.
  17. OSVDB. The Open Source Vulnerability Database. http://osvdb.org/.
  18. A. Ozment. The likelihood of vulnerability rediscovery and the social utility of vulnerability hunting. In Proc. of the Workshop on the Economics of Information Security, 2005.
  19. E. Rescorla. Security holes... who cares? In Proc. of the 12th USENIX Security Symposium, 2003.
  20. Secunia Corporation. Secunia Advisories. http://secunia.com.
  21. Symantec Corporation. SecurityFocus. http://www.securityfocus.com.
  22. V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, NY, 1998.


    Reviews

    Vijay K Gurbani (Online Computing Reviews Service)

    Machine learning techniques are being applied to all kinds of problems in computer science. This paper applies machine learning to classifying vulnerabilities and predicting the time to exploit a vulnerability once information about it has been released. Bozorgi et al. train a linear support vector machine (SVM) on feature vectors extracted from two publicly available vulnerability databases: the Open Source Vulnerability Database (OSVDB) and MITRE's Common Vulnerabilities and Exposures (CVE). The feature extraction process consists of a frequency count of keywords that appear in a vulnerability disclosure report. The SVM is trained on available vulnerability data from 1991 to 2005; data from 2005 to 2007 is used as the test set.

    After training, the authors evaluate the classifier on two predictions: (a) whether a given vulnerability will be exploited at all, and (b) the time to exploit a known vulnerability. For prediction (a), the classifier achieves a true positive (TP) rate of 95 percent with a false positive (FP) rate of five percent. For prediction (b), the classifier is 98 percent accurate (a TP rate of 98 percent and an FP rate of two percent) in predicting whether a vulnerability will be exploited within two days; other time frames, such as seven, 14, or 30 days, yield the same result.

    A final contribution of the paper is an alternative vulnerability scoring system that indicates how critical a vulnerability is. Current scoring systems represent this in differing ways, and some of them embed magic numbers in the derivation of the score. Bozorgi et al. instead propose using the signed distance to the maximum-margin hyperplane separating positive and negative examples as a canonical score for the exploitability of a vulnerability.

    The paper makes a good argument for using machine learning models to predict vulnerabilities. A more structured approach avoids the magic numbers found in existing manual classification schemes. To be sure, machine learning will not diminish the importance of human intelligence in assessing vulnerabilities (for instance, zero-day exploits cannot be predicted with these techniques), but it can move the practice a bit closer to being a science rather than an art.
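For readers who want a runnable picture of the pipeline the review describes, here is a minimal sketch: keyword-count features from disclosure text, a linear SVM, and the signed distance to the separating hyperplane reused as an exploitability score. It is not the authors' code; scikit-learn's CountVectorizer and LinearSVC (the latter backed by LIBLINEAR) stand in for the paper's tooling, and the example reports, labels, and temporal split are hypothetical or omitted.

# Minimal sketch (not the authors' code) of the pipeline described in the review.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical disclosure texts with labels: 1 if an exploit later appeared, else 0.
reports = [
    "stack buffer overflow in ftp server allows remote code execution",
    "improperly validated input in web form leads to sql injection",
    "minor information disclosure in verbose debug logging output",
    "weak default password documented in installation guide",
]
exploited = [1, 1, 0, 0]

# Keyword-frequency features from the report text, as the review describes.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reports)

# Linear SVM; the paper trains on 1991-2005 disclosures and tests on
# 2005-2007, a temporal split this toy example does not reproduce.
clf = LinearSVC()
clf.fit(X, exploited)

# The signed distance to the maximum-margin hyperplane doubles as a
# continuous exploitability score; the sign gives the exploited/not label.
new = vectorizer.transform(["heap overflow in media parser allows arbitrary code execution"])
score = clf.decision_function(new)[0]
print("exploitability score: %+.3f, predicted exploited: %s" % (score, bool(clf.predict(new)[0])))

The same setup extends to the time-to-exploit predictions by redefining the label as "exploited within N days of disclosure" for N = 2, 7, 14, or 30.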

    • Published in

      KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
      July 2010
      1240 pages
      ISBN: 9781450300551
      DOI: 10.1145/1835804

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 July 2010


      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

