Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities
Introduction
An increasing number of critical processes in the modern world are supported by software systems; air-traffic control and online banking are two prevalent examples. Combined with the growing dependence of valuable assets (including human health, wealth, and even lives) on the security and dependability of the computer support for these processes, this makes secure software a core requirement of the modern world. Unfortunately, the number of software security failures is escalating. A security failure is a violation of, or deviation from, the security policy, and a security policy is “a statement of what is, and what is not, allowed as far as security is concerned” [1]. WhiteHat Security Inc. found that nine out of ten websites had at least one security failure when it conducted a security assessment of over 600 public-facing and pre-production websites between January 1, 2006 and February 22, 2008 [2]. The number of security-related software failures reported to the Computer Emergency Response Team Coordination Center (CERT/CC) has increased fivefold over the past seven years [3].
Security failures in a software system are the mishaps we wish to avoid, but they could not occur without the presence of vulnerabilities in the underlying software. “A vulnerability is an instance of a fault in the specification, development, or configuration of software such that its execution can violate an implicit or explicit security policy” [31]. A fault is an accidental condition that, when executed, may cause a functional unit to fail to perform its required or expected function [18]. We use the term ‘fault’ to denote any software fault or defect, and reserve vulnerability for those exploitable faults which might lead to a security failure.
Vulnerabilities are generally introduced during the development of software. However, it is difficult to detect vulnerabilities until they manifest themselves as security failures in the operational stage of the software, because security concerns are not always addressed or known sufficiently early during the Software Development Life Cycle (SDLC). Therefore, it would be very useful to know the characteristics of software artifacts that can indicate post-release vulnerabilities – vulnerabilities that are uncovered by at least one security failure during the operational phase of the software. Such indications can help software managers and developers take proactive action against potential vulnerabilities. For our work, we use the term ‘vulnerability’ to denote post-release vulnerabilities only.
Software metrics are often used to assess the ability of software to achieve a predefined goal [4]. A software metric is a measure of some property of a piece of software. Complexity, coupling, and cohesion (CCC) can be measured during various software development phases and are used to evaluate the quality of software [21]. The term software complexity is often applied to the interaction between a program and a programmer working on some programming task [69]. In this context, complexity measures typically depend on program size and control structure, among many other factors. High complexity hinders program comprehension [69]. Coupling refers to the level of interconnection and dependency among software entities. Entities are said to be highly coupled when they depend on each other to such an extent that a change in one necessitates changes in the others that depend on it. Moreover, highly coupled entities are difficult to understand in isolation and to reuse, because the dependent entities must be included. Cohesion refers to the degree to which a particular entity provides a single functionality to the software system as a whole [21]. Highly cohesive entities, which have only one responsibility, are more desirable than weakly cohesive entities, which perform many operations and are therefore likely to be less maintainable and reusable.
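To make the three notions concrete, the following minimal sketch computes one simplified measure of each for a small piece of Python source. It is an illustration only, not the measurement tooling used in this study: the choice of Python, the helper names (cyclomatic_complexity, fan_out, lcom), the sample class, and the use of import fan-out as a coupling proxy are all assumptions made for the example.

```python
import ast
from itertools import combinations

# Node types treated as decision points in this simplified McCabe-style count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Complexity: 1 + number of decision points found in the function."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two additional paths.
            score += len(node.values) - 1
    return score


def fan_out(tree: ast.Module) -> int:
    """Coupling proxy (assumption): distinct modules imported by the file."""
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return len(modules)


def lcom(cls: ast.ClassDef) -> int:
    """Lack of cohesion: method pairs sharing no instance attribute minus
    pairs sharing at least one, floored at zero."""
    used = []
    for item in cls.body:
        if isinstance(item, ast.FunctionDef):
            attrs = {
                n.attr
                for n in ast.walk(item)
                if isinstance(n, ast.Attribute)
                and isinstance(n.value, ast.Name)
                and n.value.id == "self"
            }
            used.append(attrs)
    disjoint = sum(1 for a, b in combinations(used, 2) if not (a & b))
    sharing = sum(1 for a, b in combinations(used, 2) if a & b)
    return max(disjoint - sharing, 0)


# Hypothetical sample source, used only to exercise the three helpers.
SAMPLE = '''
import os
import json
from collections import Counter

class Report:
    def load(self, path):
        if os.path.exists(path) and path.endswith(".json"):
            self.data = json.load(open(path))
        else:
            self.data = {}

    def summarize(self):
        return Counter(self.data)

    def banner(self):
        return self.title.upper()
'''

tree = ast.parse(SAMPLE)
report = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))

complexity_of_load = cyclomatic_complexity(report.body[0])
coupling_of_file = fan_out(tree)
cohesion_gap = lcom(report)
```

In the sample, load has two decision points (the if and the and), the file imports three modules, and banner shares no instance attribute with the other two methods, so the class shows a small lack of cohesion. A higher value of any of the three measures points in the less desirable direction described above.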
Complexity, coupling and cohesion-related structural measurements pertain to software architecture because “the software architecture of a system is the structure or structures of the system, which comprises software elements, the externally visible properties of those elements, and the relationships among them” [71]. These metrics provide complementary solutions that are potentially useful for architecture evaluation [73], leading to more secure software design and code, and eventually to more secure and dependable software.
Numerous studies [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [21], [69] show that high complexity and coupling and low cohesion make understanding, developing, testing, and maintaining software difficult, and, as a side effect, may introduce faults in software systems. Our intuition is that these properties may also lead to the introduction of vulnerabilities – weaknesses that can be exploited by malicious users to compromise a software system. In fact, in one of our previous studies, we showed that high coupling is likely to increase damage propagation when a system is compromised [35].
Although CCC metrics have been successfully employed to indicate faults in general [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], the efficacy of these metrics in indicating vulnerabilities has not yet been extensively investigated. Very few works associate complexity and coupling with vulnerabilities. Shin and Williams [31], [32], [33] investigate how vulnerabilities can be inferred from code complexity alone. A study by Traore et al. [30] uses the notion of “service coupling”, a measurement specific to service-oriented architecture. The effect of cohesion on vulnerabilities has never been studied before.
In this work, we explore how the likelihood of having vulnerabilities is affected by all three aforementioned aspects – complexity, coupling, and cohesion. This study incorporates both standard, traditional CCC metrics and CCC metrics for object-oriented architecture. Our objective is to investigate whether structural information from the non-security realm, such as complexity, coupling, and cohesion metrics, can be helpful in automatically predicting vulnerabilities in software.
The principal contributions of this research can be summarized as follows. First, a systematic framework to automatically predict vulnerability-prone entities from CCC metrics is proposed. Second, statistical and machine learning techniques are used to build the vulnerability predictors. In doing so, we compare the prediction performances of four alternative techniques, namely C4.5 Decision Tree, Random Forests, Logistic Regression and Naïve-Bayes. Among these, C4.5 Decision Tree, Random Forests, and Naïve-Bayes have not been applied in any kind of vulnerability prediction before. Third, an extensive empirical study is conducted on fifty-two releases of Mozilla Firefox [39] to validate the usefulness of CCC metrics in vulnerability prediction. In doing so, we provide a tool to automatically map vulnerabilities to entities by extracting information from software repositories such as security advisories, bug databases, and concurrent version systems.
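The mapping of vulnerabilities to entities mentioned above can be pictured with the following minimal sketch. It is not the tool provided in this work: the record layouts, the advisory and bug identifiers, the file paths, and the "Bug <id>" commit-message convention are all hypothetical, chosen only to show how advisories, bug reports, and commit logs could be joined.

```python
import re
from collections import defaultdict

# Hypothetical records. In practice these would be extracted from security
# advisories, the bug database, and the version control history.
ADVISORIES = {
    "ADV-001": [1001, 1002],  # advisory id -> bug ids it references
    "ADV-002": [1003],
}
COMMITS = [
    {"message": "Bug 1001 - fix buffer overrun in parser",
     "files": ["parser/html.cpp"]},
    {"message": "Bug 1002: validate length, r=reviewer",
     "files": ["parser/html.cpp", "net/http.cpp"]},
    {"message": "Bug 2000 - tidy whitespace",
     "files": ["ui/menu.cpp"]},
]

# Assumed convention: commit messages mention the bug they fix as "Bug <id>".
BUG_ID = re.compile(r"\bbug\s+#?(\d+)", re.IGNORECASE)


def map_vulnerabilities_to_files(advisories, commits):
    """Return {file path: set of advisory ids whose fixes touched the file}."""
    bug_to_advisory = {
        bug: adv for adv, bugs in advisories.items() for bug in bugs
    }
    vulnerable = defaultdict(set)
    for commit in commits:
        for match in BUG_ID.finditer(commit["message"]):
            adv = bug_to_advisory.get(int(match.group(1)))
            if adv is None:
                continue  # ordinary bug fix, not tied to an advisory
            for path in commit["files"]:
                vulnerable[path].add(adv)
    return dict(vulnerable)


file_to_advisories = map_vulnerabilities_to_files(ADVISORIES, COMMITS)
vulnerability_counts = {
    path: len(advs) for path, advs in file_to_advisories.items()
}
```

Files that never appear in the result (here ui/menu.cpp) would be treated as having no known vulnerability. Counting distinct advisories per file, rather than distinct bugs, is a design choice of this sketch.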
The major implications of this research are as follows. First, automatic prediction of vulnerabilities will assist software practitioners in taking preventive action against potential vulnerabilities during the early stages of the software lifecycle. This enables a shift from a reactive to a proactive approach to dealing with vulnerabilities. Second, techniques that automatically predict fault-prone entities from CCC metrics can be adopted or leveraged to automatically predict vulnerability-prone entities as well, which has not been done systematically so far. However, the results might not necessarily be the same as for software fault prediction. Although vulnerabilities can be viewed as exploitable faults in software, there is a need to specifically investigate the efficacy of predicting vulnerabilities from CCC metrics. Research has shown that the code characteristics of vulnerable entities are distinct from those of faulty-but-non-vulnerable entities [32], [33], [34]. Moreover, it has been found that predicting vulnerable functions from all functions provides better results than predicting vulnerable functions from faulty functions [35]. Finally, robust architecture and quality design and code are important for security and dependability. Hence, an established relationship between CCC metrics and vulnerabilities can lead to the conception of more secure software architecture, design and code, and eventually to more secure and dependable software.
The rest of the paper is organized as follows. In Section 2, we present the framework to predict vulnerability using CCC metrics. In Section 3, we provide background on CCC metrics and give brief overviews of the statistical and machine learning techniques used for vulnerability prediction. In Section 4, we discuss in detail how to predict vulnerability-prone entities using the framework. In Section 5, we report the vulnerability prediction results and discuss the implications of the results. Section 6 compares and contrasts the related work on fault and vulnerability prediction. Finally, we conclude the paper, discuss some limitations of our approaches, and outline avenues for future work in Section 7.
Overview of vulnerability-prediction framework
There are two main approaches to software vulnerability prediction. First, count-based techniques focus on predicting the number of vulnerabilities in a software system. Managers can use these predictions to determine if the software is ready for release or if it is likely to have many lurking vulnerabilities. An example of such work is [28]. Second, classification
Background
This section provides background on complexity, coupling, and cohesion (CCC) metrics that are hypothesized to affect vulnerability-proneness. It also furnishes brief overviews of the statistical and machine learning techniques used in this study to predict vulnerabilities.
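As an illustration of how one of these techniques turns CCC measurements into a prediction, the following from-scratch sketch implements a Naïve-Bayes classifier over numeric features. It is not the implementation used in this study: the Gaussian likelihood, the function names (fit_gaussian_nb, predict), the three-feature layout, and the training rows are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for the given row."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            # Log of the Gaussian density; features are assumed independent
            # given the class, which is the 'naive' part of the method.
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (x - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Hypothetical per-file measurements: (complexity, coupling, lack of cohesion).
TRAIN_ROWS = [
    (25, 12, 9), (30, 15, 11), (28, 10, 8),  # files with a known vulnerability
    (4, 2, 0), (6, 3, 1), (5, 1, 0),         # files without one
]
TRAIN_LABELS = [True, True, True, False, False, False]

model = fit_gaussian_nb(TRAIN_ROWS, TRAIN_LABELS)
flag_complex_file = predict(model, (27, 13, 10))
flag_simple_file = predict(model, (5, 2, 1))
```

With this deliberately well-separated toy data, the first file is flagged as vulnerability-prone and the second is not. Real metric data is far noisier, which is why the predictors in this study are evaluated empirically.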
Predicting vulnerabilities
This section describes how to predict vulnerability-prone entities in software as outlined by the framework initially presented in Fig. 1 of Section 2. As an empirical evaluation of the framework, we conduct case studies on Mozilla Firefox to predict its vulnerability-prone files. This section begins by providing an overview of Mozilla Firefox (the source of data for our empirical evaluation). Then, in Section 4.2, we explain the dependent and independent variables of the prediction task at
Results and discussion
This section presents the results of predicting vulnerability-prone files in Mozilla Firefox based on their complexity, coupling, and cohesion (CCC) metrics. These results will help us quantitatively evaluate the usefulness of using CCC metrics for vulnerability prediction.
Related work
The related research is presented in three parts. First, we describe the research on fault prediction using complexity, coupling, and cohesion metrics [6], [7], [8], [9], [10], [11], [12]. Second, we compare and contrast recent work that predicts vulnerabilities from complexity and coupling metrics [30], [31], [32], [33]. Finally, we describe some studies that use other phenomena (e.g., import patterns or past vulnerabilities) to identify the vulnerable components in a software system [28], [29]
Conclusions
In this work, we investigate the efficacy of applying complexity, coupling, and cohesion metrics to automatically predict vulnerability-prone entities in software systems. We use four alternative statistical and machine learning techniques to build vulnerability predictors that learn from the CCC metrics and vulnerability history. The techniques are C4.5 Decision Trees, Random Forests, Logistic Regression, and Naïve-Bayes. We conduct an extensive empirical study on Mozilla Firefox to
Acknowledgments
This research is partially funded by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors wish to thank Stephan Neuhaus of Saarland University, Saarbrücken, Germany for sharing his dataset (as of January 4th, 2007) on vulnerabilities in Mozilla Firefox and for his suggestions on how to obtain an updated dataset. We also thank Yonghee Shin of North Carolina State University, Raleigh, NC, USA for taking the time to answer our queries about her technique of
References
- et al., Identification of defect-prone classes in telecommunication software systems using design metrics, Journal of Systems and Software (2006)
- et al., Practical assessment of the models for identification of defect-prone classes in object-oriented commercial systems using design metrics, Journal of Systems and Software (2003)
- et al., The prediction of faulty classes using object-oriented design metrics, Journal of Systems and Software (2001)
- et al., Predicting defect-prone software modules using support vector machines, Journal of Systems and Software (2008)
- et al., Measuring, analyzing and predicting security vulnerabilities in software systems, Computers and Security (2007)
- M. Bishop, Computer Security: Art and Science, Addison-Wesley, Boston, MA,...
- J. Grossman, Website Vulnerabilities Revealed: What everyone knew, but afraid to believe, White Hat Security Inc.,...
- Computer Emergency Response Team Coordination Center (CERT/CC), <http://www.cert.org/stats/cert_stats.html> (accessed...
- Jaquith, Security Metrics: Replacing Fear, Uncertainty, and Doubt, Pearson Education Inc. Upper Saddle River, NJ,...
- E. Damiani, C.A. Ardagna, N.E. Ioini, Open Source Systems Security Certification, Springer,...
- An empirical comparison and characterization of high defect and high complexity modules, Journal of Systems and Software
- An empirical investigation of an object-oriented software system, IEEE Transactions on Software Engineering
- A validation of object-oriented design metrics as quality indicators, IEEE Transactions on Software Engineering
- Data mining static code attributes to learn defect predictors, IEEE Transactions on Software Engineering
- A composite complexity approach for software defect modelling, Software Quality Journal
- A probabilistic model for software defect prediction, IEEE Transactions on Software Engineering
- A metrics suite for object oriented design, IEEE Transactions on Software Engineering
- Software Metrics: A Rigorous and Practical Approach
- A complexity measure, IEEE Transactions on Software Engineering
- A complexity measure based on nesting level, ACM Sigplan Notices
- Software structure metrics based on information flow, IEEE Transactions on Software Engineering
Istehad Chowdhury is currently a research intern in Cloakware Inc., Canada. He received his M.Sc. degree from the Department of Electrical and Computer Engineering of Queen’s University, Canada in 2009, where he was a research assistant and a member of Queen’s Reliable Software Technology (QRST) research group. He received his B.Sc. degree in Computer Science from Independent University, Bangladesh in 2005. Before joining Queen’s, he was a lecturer in the Department of Computer Science of Stamford University, Bangladesh. He has been a member of ACM since 2001. His main research interest lies in the area of software engineering with special interest in software reliability and security, software metrics, empirical software engineering, and mining software repositories. More information about his research and publications can be found at http://www.cs.queensu.ca/~istehad.
Mohammad Zulkernine is a faculty member of the School of Computing of Queen’s University, Canada, where he leads the Queen’s Reliable Software Technology (QRST) research group. He received his B.Sc. in Computer Science and Engineering from Bangladesh University of Engineering and Technology in 1993. Dr. Zulkernine received an M. Eng. in Computer Science and Systems Engineering from Muroran Institute of Technology, Japan in 1998. He received his Ph.D. from the Department of Electrical and Computer Engineering of the University of Waterloo, Canada in 2003, where he belonged to the university’s Bell Canada Software Reliability Laboratory. Dr. Zulkernine’s research focuses on software engineering (software reliability and security), automatic software monitoring and intrusion detection, and methods and tools for reliable and secure software. His research is funded by a number of provincial and federal research organizations of Canada, and he has an industry research partnership with Bell Canada. He is a senior member of the IEEE and a member of the ACM. Dr. Zulkernine is also cross-appointed in the Department of Electrical and Computer Engineering of Queen’s University, and is a licensed professional engineer of the province of Ontario, Canada.