ABSTRACT
Vulnerability prediction models (VPMs) are believed to hold promise for guiding software engineers in prioritizing scarce verification resources when searching for vulnerabilities. However, while Microsoft product teams have adopted defect prediction models, they have not adopted VPMs. The goal of this research is to measure whether VPMs built using standard recommendations perform well enough to provide actionable results for engineering resource allocation. We define 'actionable' in terms of the inspection effort required to evaluate model results. We replicated a VPM for two releases of the Windows operating system, varying model granularity and statistical learners. We reproduced binary-level prediction precision (~0.75) and recall (~0.2). However, binaries often exceed one million lines of code, which is too large to inspect in practice, and engineers expressed a preference for predictions at the source file level. Our source-file-level models yield precision below 0.5 and recall below 0.2. We suggest that VPMs must be refined to achieve actionable performance, possibly by incorporating security-specific metrics.
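To make the evaluation concrete, the sketch below shows one way a file-level vulnerability classifier can be trained on per-file code metrics and scored with the precision and recall measures the abstract discusses. It is a minimal illustration, not the authors' pipeline: the metric names, the synthetic dataset, and the random forest learner are all assumptions introduced here, not the Windows data or the statistical learners used in the study.

```python
# Minimal illustrative sketch, NOT the study's pipeline: train a
# file-level vulnerability classifier on per-file code metrics and
# report precision (inspection cost) and recall (vulnerabilities found).
# All data is synthetic; the metric names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical per-file metrics: churn, cyclomatic complexity, fan-in.
n_files = 5000
X = rng.normal(size=(n_files, 3))
# Vulnerable files are rare, mirroring the class imbalance VPMs face;
# here the label depends weakly on the first (churn-like) metric.
y = (X[:, 0] + rng.normal(scale=2.0, size=n_files) > 3.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Precision: fraction of flagged files that are truly vulnerable.
# Recall: fraction of vulnerable files the model flags.
print(f"precision: {precision_score(y_test, pred, zero_division=0):.2f}")
print(f"recall:    {recall_score(y_test, pred):.2f}")
```

In this framing, low precision at file granularity means most flagged files are false positives, so the inspection effort spent per true vulnerability grows; this is the sense in which the sub-0.5 precision reported in the abstract is not actionable.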