ABSTRACT
Vulnerability prediction models (VPMs) are believed to hold promise for guiding software engineers in prioritizing scarce verification resources when searching for vulnerabilities. However, while Microsoft product teams have adopted defect prediction models, they have not adopted VPMs. The goal of this research is to measure whether VPMs built using standard recommendations perform well enough to provide actionable results for engineering resource allocation. We define 'actionable' in terms of the inspection effort required to evaluate model results. We replicated a VPM for two releases of the Windows operating system, varying model granularity and statistical learners. We reproduced binary-level prediction precision (~0.75) and recall (~0.2). However, binaries often exceed one million lines of code, which is too large to inspect in practice, and engineers expressed a preference for predictions at the source file level. Our source-file-level models yield precision below 0.5 and recall below 0.2. We suggest that VPMs must be refined to achieve actionable performance, possibly by incorporating security-specific metrics.
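To make the evaluation concrete, the sketch below shows one way a file-level vulnerability classifier can be trained on per-file code metrics and scored with the precision and recall measures the abstract discusses. It is a minimal illustration, not the authors' pipeline: the metric names, the synthetic dataset, and the random forest learner are all assumptions introduced here, not the Windows data or the statistical learners used in the study.

```python
# Minimal illustrative sketch, NOT the study's pipeline: train a
# file-level vulnerability classifier on per-file code metrics and
# report precision (inspection cost) and recall (vulnerabilities found).
# All data is synthetic; the metric names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical per-file metrics: churn, cyclomatic complexity, fan-in.
n_files = 5000
X = rng.normal(size=(n_files, 3))
# Vulnerable files are rare, mirroring the class imbalance VPMs face;
# here the label depends weakly on the first (churn-like) metric.
y = (X[:, 0] + rng.normal(scale=2.0, size=n_files) > 3.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Precision: fraction of flagged files that are truly vulnerable.
# Recall: fraction of vulnerable files the model flags.
print(f"precision: {precision_score(y_test, pred, zero_division=0):.2f}")
print(f"recall:    {recall_score(y_test, pred):.2f}")
```

In this framing, low precision at file granularity means most flagged files are false positives, so the inspection effort spent per true vulnerability grows; this is the sense in which the sub-0.5 precision reported in the abstract is not actionable.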