DOI: 10.1145/1835804.1835821

Beyond heuristics: learning to classify vulnerabilities and predict exploits

Published: 25 July 2010

ABSTRACT

The security demands on modern system administration are enormous and getting worse. Chief among these demands, administrators must monitor the continual ongoing disclosure of software vulnerabilities that have the potential to compromise their systems in some way. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed--well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. Vulnerabilities can be addressed by patches, reconfigurations, and other workarounds; however, these actions may incur down-time or unforeseen side-effects. Thus, a key question for systems administrators is which vulnerabilities to prioritize. From publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether and how soon a vulnerability is likely to be exploited. As input, our classifiers operate on high dimensional feature vectors that we extract from the text fields, time stamps, cross references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether and how soon individual vulnerabilities are likely to be exploited.
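To make the abstract's feature-extraction step concrete, the sketch below flattens a few fields of a disclosure report (free-text description, cross references, and time stamps) into one sparse, high-dimensional feature vector. This is a minimal illustration rather than the authors' code: the field names, the toy records, and the use of scikit-learn's DictVectorizer are assumptions made here for exposition.

# Minimal sketch (not the authors' code): turn heterogeneous fields of a
# vulnerability disclosure report into one sparse feature vector.
# Field names and records below are hypothetical.
from sklearn.feature_extraction import DictVectorizer

def featurize(report):
    """Flatten one disclosure record into a dict of named features."""
    feats = {}
    # Text fields: one binary feature per token in the description.
    for token in report["description"].lower().split():
        feats["text:" + token] = 1
    # Cross references: one indicator feature per linked source.
    for ref in report["references"]:
        feats["xref:" + ref] = 1
    # Time stamps: e.g., days from discovery to public disclosure.
    feats["days_to_disclosure"] = report["disclosed_day"] - report["discovered_day"]
    return feats

reports = [
    {"description": "Remote buffer overflow in FTP daemon",
     "references": ["CVE", "Secunia"], "discovered_day": 100, "disclosed_day": 130},
    {"description": "Improperly validated input in login form",
     "references": ["OSVDB"], "discovered_day": 200, "disclosed_day": 205},
]

vec = DictVectorizer()
X = vec.fit_transform([featurize(r) for r in reports])  # sparse matrix, one row per report
print(X.shape, sorted(vec.vocabulary_)[:5])

In the paper the vectors built from OSVDB and CVE entries are much higher dimensional, but the flattening idea is the same.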


Supplemental Material

kdd2010_bozorgi_bhlc_01.mov (MOV video, 72.1 MB)

References

  1. W. A. Arbaugh, W. L. Fithen, and J. McHugh. Windows of vulnerability: A case study analysis. Computer, 33(12):52--59, 2000.
  2. A. Arora, A. Nandkumar, and R. Telang. Does information security attack frequency increase with vulnerability disclosure? An empirical analysis. Information Systems Frontiers, 8(5), 2006.
  3. A. Arora, R. Telang, and H. Xu. Optimal policy for software vulnerability disclosure. In Workshop on Economics and Information Security (WEIS'04), 2004.
  4. S. M. Bellovin. On the brittleness of software and the infeasibility of security metrics. IEEE Security and Privacy, 4(4), July 2006.
  5. Cisco. Risk assessment: Risk triage for security vulnerability announcements. Cisco whitepaper, accessed September 2009. http://www.cisco.com/web/about/security/intelligence/vulnerability-risk-triage.html.
  6. CVE Editorial Board. Common Vulnerabilities and Exposures: The standard for information security vulnerability names. http://cve.mitre.org/.
  7. C. Dougherty. Vulnerability metric, updated July 24, 2008. https://www.securecoding.cert.org/confluence/display/seccode/Vulnerability+Metric.
  8. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR -- A library for large linear classification. http://www.csie.ntu.edu.tw/~cjlin/liblinear/.
  9. Forum of Incident Response and Security Teams (FIRST). Common Vulnerability Scoring System (CVSS). http://www.first.org/cvss/.
  10. S. Frei, D. Schatzmann, B. Plattner, and B. Trammel. Modeling the security ecosystem: The dynamics of (in)security. In Proc. of the Workshop on the Economics of Information Security (WEIS), June 2009.
  11. IBM. IBM Internet Security Systems X-Force 2008 trend and risk report. White paper, Jan. 2009. http://www-935.ibm.com/services/us/iss/xforce/trendreports/xforce-2008-annual-report.pdf.
  12. D. Lewis. Naive (Bayes) at forty: The independence assumption in information retrieval. In Proceedings of ECML-98, the 10th European Conference on Machine Learning, pages 4--15, 1998.
  13. P. Mell, K. Scarfone, and S. Romanosky. A complete guide to the Common Vulnerability Scoring System version 2.0, June 2007. http://www.first.org/cvss/cvss-guide.html.
  14. Microsoft TechNet Security Team. Microsoft Security Bulletin. http://www.microsoft.com/technet/security/current.aspx.
  15. D. Moore, C. Shannon, and k. claffy. Code-Red: A case study on the spread and victims of an Internet worm. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, pages 273--284, 2002.
  16. D. Nizovtsev and M. Thursby. Economic analysis of incentives to disclose software vulnerabilities. In Proc. of the Workshop on the Economics of Information Security, 2005.
  17. OSVDB. The Open Source Vulnerability Database. http://osvdb.org/.
  18. A. Ozment. The likelihood of vulnerability rediscovery and the social utility of vulnerability hunting. In Proc. of the Workshop on the Economics of Information Security, 2005.
  19. E. Rescorla. Security holes... who cares? In Proc. of the 12th USENIX Security Symposium, 2003.
  20. Secunia Corporation. Secunia Advisories. http://secunia.com.
  21. Symantec Corporation. SecurityFocus. http://www.securityfocus.com.
  22. V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, NY, 1998.


    Reviews

    Vijay K Gurbani (Online Computing Reviews Service)

    Machine learning techniques are being applied to all kinds of problems in computer science. This paper applies machine learning to classifying vulnerabilities and predicting the time to exploit a vulnerability once information about it has been released. Bozorgi et al. train a linear support vector machine (SVM) on feature vectors extracted from two publicly available vulnerability databases: the Open Source Vulnerability Database (OSVDB) and MITRE's Common Vulnerabilities and Exposures (CVE). The feature extraction process consists of a frequency count of keywords that appear in a vulnerability disclosure report. The SVM is trained on available vulnerability data from 1991 to 2005; data from 2005 to 2007 is used as the test set.

    After training, the authors evaluate the classifier on two predictions: (a) whether a given vulnerability will be exploited at all, and (b) the time to exploit a known vulnerability. For prediction (a), the classifier achieves a true positive (TP) rate of 95 percent with a false positive (FP) rate of five percent. For prediction (b), the classifier is 98 percent accurate (a TP rate of 98 percent and an FP rate of two percent) in predicting whether a vulnerability will be exploited within two days; other time frames, such as seven, 14, or 30 days, yield the same result.

    A final contribution of the paper is an alternative vulnerability scoring system that indicates how critical a vulnerability is. Current scoring systems represent this in differing ways, and some of them embed magic numbers in the derivation of the score. Bozorgi et al. instead propose using the signed distance to the maximum-margin hyperplane separating positive and negative examples as a canonical score for the exploitability of a vulnerability.

    The paper makes a good argument for using machine learning models to predict vulnerabilities. A more structured approach avoids the magic numbers found in existing manual classification schemes. To be sure, machine learning will not diminish the importance of human intelligence in assessing vulnerabilities (for instance, zero-day exploits cannot be predicted with these techniques), but it can move the practice a bit closer to being a science rather than an art.
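For readers who want a runnable picture of the pipeline the review describes, here is a minimal sketch: keyword-count features from disclosure text, a linear SVM, and the signed distance to the separating hyperplane reused as an exploitability score. It is not the authors' code; scikit-learn's CountVectorizer and LinearSVC (the latter backed by LIBLINEAR) stand in for the paper's tooling, and the example reports, labels, and temporal split are hypothetical or omitted.

# Minimal sketch (not the authors' code) of the pipeline described in the review.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical disclosure texts with labels: 1 if an exploit later appeared, else 0.
reports = [
    "stack buffer overflow in ftp server allows remote code execution",
    "improperly validated input in web form leads to sql injection",
    "minor information disclosure in verbose debug logging output",
    "weak default password documented in installation guide",
]
exploited = [1, 1, 0, 0]

# Keyword-frequency features from the report text, as the review describes.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reports)

# Linear SVM; the paper trains on 1991-2005 disclosures and tests on
# 2005-2007, a temporal split this toy example does not reproduce.
clf = LinearSVC()
clf.fit(X, exploited)

# The signed distance to the maximum-margin hyperplane doubles as a
# continuous exploitability score; the sign gives the exploited/not label.
new = vectorizer.transform(["heap overflow in media parser allows arbitrary code execution"])
score = clf.decision_function(new)[0]
print("exploitability score: %+.3f, predicted exploited: %s" % (score, bool(clf.predict(new)[0])))

The same setup extends to the time-to-exploit predictions by redefining the label as "exploited within N days of disclosure" for N = 2, 7, 14, or 30.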

    • Published in

      KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
      July 2010
      1240 pages
      ISBN: 9781450300551
      DOI: 10.1145/1835804

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 July 2010


      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

