Elsevier

Applied Soft Computing

Volume 67, June 2018, Pages 540-557
Uncertainty management in software effort estimation using a consistent fuzzy analogy-based method

https://doi.org/10.1016/j.asoc.2018.03.022

Highlights

  • A new method, C-FASEE, is proposed to better manage uncertainty in the software effort estimation environment.

  • The quality of software drivers is enhanced using a consistent fuzzy representation.

  • Uncertainty is quantified by providing a possibility distribution of the effort values that the new project can take.

  • The new estimation model derives both a fuzzy estimate and a crisp estimate.

  • Experimental studies show good estimation accuracy for the proposal and a significant difference from the comparison methods.

Abstract

Software effort estimation is a critical task in software project development management. Unfortunately, uncertainty and inaccuracy are inherent properties of the software effort estimation environment. They are caused by the limited ability of managers to foresee, measure and describe the factors influencing software effort. The promising Fuzzy Analogy-based Software Effort Estimation model (FASEE) successfully employs fuzzy logic with approximate reasoning theory to handle imprecision and to reason under uncertainty. FASEE also uses a possibility distribution to quantify the uncertainty in the estimate, which helps software managers assess risks. Yet FASEE suffers from low data quality and from the uncertainty induced in the reasoning process. In this paper, we propose an enhancement of FASEE that imposes consistency criteria to deal with these drawbacks. The resulting model, called Consistent Fuzzy Analogy-based Software Effort Estimation (C-FASEE), is endowed with two capabilities. The first introduces consistency criteria into the representation of attributes by fuzzy sets, so that each attribute is fitted to the software effort. The second introduces a new confidence relation that measures the extent to which the retrieved most similar projects respect the assumption “similar projects have similar efforts”. Moreover, the C-FASEE method provides a fuzzy estimate: the most possible fuzzy set in which the true effort of the new software project falls. This allows the software manager to assess risks more effectively. The proposed C-FASEE is validated over thirteen software project datasets that represent different complexities. The obtained results are compared with variants of the analogy-based software effort estimation approach. The experimental results show that our proposal provides good estimation accuracy and performs significantly better than the comparison methods.

Introduction

Software effort estimation is one of the most important activities in software project development management. It is crucial to optimal planning and to controlling the software development process. Indeed, overestimation can lead to the misuse of development resources, while underestimation can lead to a lower-quality product. To produce accurate estimates, several software effort prediction techniques have been suggested. They fall into two major categories: parametric models, which are derived from the statistical and/or numerical analysis of historical project data, and machine learning (ML) models, which are based on artificial intelligence techniques such as artificial neural networks (ANN), genetic algorithms (GA), analogy-based or case-based reasoning (CBR), decision trees, and genetic programming. Many studies have reported that none of these approaches has proven consistently successful at predicting software effort [1]. However, they confirm that analogy-based software effort estimation (ASEE) [[2], [3]] is a viable alternative to other conventional estimation methods.

The ASEE approach is essentially a form of case-based reasoning [4], built on the assumption that “similar projects have similar costs”. An ASEE method infers an estimate for the effort of a new software project in three steps: it identifies a set of relevant attributes to describe the new project, retrieves similar cases from the historical projects using a distance function, and derives an estimate from the effort values of the closest historical projects. Effort estimation by analogy has two main advantages. First, it is an intuitive method that can be understood and explained to users (as opposed to black-box approaches like neural networks) [[2], [5]]; second, it can model complex and nonlinear relationships between the dependent variable (i.e. effort or cost) and the independent variables (i.e. cost drivers) [[3], [6], [7], [8], [9], [10]]. Despite these advantages, existing ASEE techniques still suffer from several difficult issues. First, they cannot handle imprecision and uncertainty when describing software attributes and deriving estimates for new projects [[6], [11], [12]]. Second, they cannot correctly handle non-quantitative attributes other than binary-valued variables, nor missing values [[7], [8], [12], [14]]. Third, they are sensitive to irrelevant and redundant attributes and to the degree of influence each attribute has on the effort estimate [[11], [12], [15], [16]]. Finally, their usability by practitioners is still limited [15]. Accordingly, much work has been dedicated to improving the performance of ASEE techniques and avoiding the above-mentioned weaknesses. As a result, various paradigms have been combined with ASEE techniques to overcome challenges related to feature and case selection, similarity measures, and adaptation strategies [[12], [17], [18], [19], [20]].

In the systematic mapping and review conducted by [[15], [21]], the authors reported that the best improvements are usually obtained when ASEE techniques are combined with statistical methods (SM) and fuzzy logic (FL). SM are mainly combined with ASEE for optimization purposes, such as feature and case selection [[22], [23]] or prediction interval computation [24]. FL, on the other hand, is used to address the major issues of software effort estimation, namely imprecision and uncertainty management, because it provides a natural framework for dealing efficiently with linguistic values, missing values and approximate reasoning. In this context, Fuzzy Analogy-based Effort Estimation (FASEE) [[7], [25]] seems to be a promising model for dealing with imprecision and uncertainty, and it has been subject to various enhancements that address most of the other challenges above.
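
As a point of reference for the fuzzy variants, the classical analogy procedure (retrieve the closest historical projects with a distance function, then derive the estimate from their efforts) can be sketched as follows. This is a minimal illustration, not the implementation of any cited method: the Euclidean distance, the unweighted mean as adaptation rule, the value of k, and the toy data are all assumptions made here.

```python
import math


def euclidean(a, b):
    """Distance between two projects described by numeric attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_by_analogy(new_project, history, k=2):
    """Estimate effort as the mean effort of the k closest historical projects.

    history is a list of (attributes, effort) pairs (hypothetical layout).
    """
    ranked = sorted(history, key=lambda case: euclidean(new_project, case[0]))
    closest = ranked[:k]
    return sum(effort for _, effort in closest) / len(closest)


# Toy historical projects: (normalized attributes, effort in person-hours).
history = [
    ((0.1, 0.2), 100.0),
    ((0.2, 0.1), 120.0),
    ((0.9, 0.8), 400.0),
]
estimate = estimate_by_analogy((0.15, 0.15), history, k=2)
```

With these toy values the two nearest projects have efforts 100 and 120, so the estimate is their mean. The sketch deliberately returns a single number, which is the behaviour the uncertainty-aware variants aim to improve on.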

The FASEE extends the classical ASEE by introducing fuzzy set theory and linguistic quantifiers. This enables ASEE to address the uncertainty caused by imprecise, ambiguous, incomplete and approximate information. In addition, the FASEE method adopts the theory of approximate reasoning introduced by Zadeh [26], which is a powerful approach to reasoning under uncertainty. Approximate reasoning theory is characterized by three main features: fuzzy sets as the primary knowledge structure, fuzziness in the reasoning process, and non-unique solutions. In this respect, the authors of FASEE use fuzzy sets to describe the software projects, propose a fuzzy similarity measure based on parameterized Regular Increasing Monotone (RIM) linguistic quantifiers [27], use the fuzzy set “most similar” to select the historical projects closest to the new project, and finally derive a crisp estimate for the new project from the efforts of those closest projects.
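
To make the role of a RIM quantifier concrete, the sketch below aggregates per-attribute similarities with quantifier-guided ordered weighted averaging. It is a simplified illustration under stated assumptions: the power family Q(r) = r^alpha is one common parameterized RIM quantifier and is assumed here, attribute importance weights are omitted, and the function names are hypothetical rather than taken from FASEE.

```python
def rim_quantifier(r, alpha):
    """Assumed RIM quantifier Q(r) = r**alpha: Q(0) = 0, Q(1) = 1, non-decreasing."""
    return r ** alpha


def owa_weights(n, alpha):
    """Weights w_j = Q(j/n) - Q((j-1)/n); they sum to Q(1) - Q(0) = 1."""
    return [
        rim_quantifier(j / n, alpha) - rim_quantifier((j - 1) / n, alpha)
        for j in range(1, n + 1)
    ]


def quantifier_guided_similarity(attribute_similarities, alpha):
    """Aggregate per-attribute similarities in [0, 1] into one overall score.

    Similarities are sorted in descending order before weighting, so a large
    alpha stresses the weakest attributes (close to "all attributes similar")
    and a small alpha stresses the strongest (close to "some attribute similar").
    """
    ordered = sorted(attribute_similarities, reverse=True)
    weights = owa_weights(len(ordered), alpha)
    return sum(w * s for w, s in zip(weights, ordered))


sims = [0.9, 0.5, 0.7]
average_like = quantifier_guided_similarity(sims, alpha=1.0)  # plain mean
stricter = quantifier_guided_similarity(sims, alpha=2.0)      # leans toward the minimum
```

With alpha = 1 the weights are uniform and the score is the arithmetic mean; raising alpha shifts weight to the smallest similarities, so the overall score can only decrease.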

Subsequently, the FASEE approach has received several enhancements that address its limitations. The authors of [[28], [29]] propose an extension of the FASEE approach to handle categorical attributes correctly. A substantiated guideline is provided by [30] for constructing a FASEE model from datasets with missing values. Moreover, the authors of [31] enhance FASEE by introducing a learning technique in the adaptation step, which learns the adequate number of similar projects to use when deriving an optimal estimate for a new project. Another work [32] investigated the potential of fuzzy and classical analogy ensemble techniques to increase estimation accuracy.

Although the FASEE model is endowed with the capability to manage uncertainty during the estimation process and derives more accurate estimates than the classical ASEE model, it does not adequately manage the uncertainty in the derived estimate itself. Kitchenham and Linkman [33] and Adam [34] provide rich studies of uncertainty and risk in the software effort estimation environment. The authors assert that software effort estimates are certain to be uncertain and inaccurate. They attribute these deficiencies to the limited ability of managers to include all the software effort drivers in the estimation model, to anticipate the changes that can occur in some of these drivers, and to measure them by exact values; in practice, effort drivers are usually measured by vague linguistic terms such as “low”, “medium” or “high”. Accordingly, they formalize these limitations into four main sources of uncertainty: measurement error, modeling error, assumption error and scope error. We therefore argue that measuring and managing uncertainty and imprecision in the estimated software effort should be at the heart of building software effort estimation models. That is, the estimation model should infer a set of estimates with a probability distribution rather than a single uncertain estimate, which could lead to wrong managerial decisions and project failure.

From this perspective, the authors of [14] investigate the capability of the FASEE model to deal with the first two sources of uncertainty. They show that describing effort drivers by fuzzy sets and using a fuzzy similarity measure reduces the effect of measurement error. In addition, the authors suggest adopting a non-deterministic adaptation technique based on the assumption “similar projects have possibly similar costs”. The effort estimates are thus obtained with a possibility distribution based on similarity scores, and the final estimate is derived according to the associated possibility degree. Although this study yields an estimation model that provides useful information (i.e., a possibility distribution) to software managers for risk assessment, it suffers from some drawbacks:

  • Uncertain possibility: the possibility distribution is built only from the similarity scores of the closest projects, yet these similarity degrees are themselves uncertain and no information supports their correctness.

  • Ambiguous possibility: the FASEE approach cannot explain an ambiguous possibility distribution, that is, one that assigns the same possibility degree to different costs.

  • Incoherent fuzzy sets: the fuzzy sets extracted for each attribute do not carry any knowledge about the software effort, which makes the attribute representation independent of the effort.
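
The ambiguity drawback can be illustrated with a small sketch. The exact construction used in [14] is not reproduced in this text, so the rule below (the possibility of an effort value is the highest similarity among the retrieved projects having that effort) is an assumption, as are the function names and the toy numbers.

```python
def possibility_distribution(similar_cases):
    """Map each candidate effort to a possibility degree.

    similar_cases is a list of (similarity, effort) pairs for the retrieved
    projects. Assumed rule: possibility = max similarity for that effort.
    """
    distribution = {}
    for similarity, effort in similar_cases:
        distribution[effort] = max(distribution.get(effort, 0.0), similarity)
    return distribution


def most_possible_efforts(distribution):
    """Return every effort value that reaches the highest possibility degree."""
    top = max(distribution.values())
    return sorted(e for e, p in distribution.items() if p == top)


# Two retrieved projects are equally similar but have very different efforts.
retrieved = [(0.8, 120.0), (0.8, 300.0), (0.6, 150.0)]
candidates = most_possible_efforts(possibility_distribution(retrieved))
```

Here both 120 and 300 reach the top possibility degree, so similarity alone gives the manager no basis for preferring one over the other.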

In this work, our aim is to provide a new FASEE approach that addresses these uncertainty management drawbacks and infers an optimal and specific possibility distribution. A possibility distribution is considered a faithful representation of partial ignorance [35]. It can also be represented by a fuzzy set of the most possible values a software effort can take [36]. In this context, we propose a new Consistent FASEE model (C-FASEE). The proposed approach rests on two main observations. First, the accuracy of the estimate provided by the model is directly related to the quality of the information upon which the estimate is based. Second, the imperfection of the estimation model often induces uncertainty in the obtained results. In this respect, we impose two consistency criteria on the new C-FASEE model in order to:

  • Increase the data quality of the historical projects by fitting the software attributes to the software effort. To this end, we exploit fuzzy set extraction in the identification step to provide a consistent representation of software attributes. The new fuzzy set extraction method is based on the assumption “A fuzzy representation of an attribute (cost driver) must include knowledge about the target attribute (the effort)”. Under this assumption, the fuzzy sets associated with an attribute are consistent with the software effort, in the sense that each fuzzy set must describe, as far as possible, a compact set of software effort values. Thus, the better the consistency criterion is optimized during fuzzy set extraction, the better the attribute is fitted to the software effort.

  • Reduce the uncertainty in the estimation model by introducing a new adaptation strategy. The similarity relation suffers from imprecision and partial knowledge, which can yield similar projects with distinct software efforts. In our new adaptation strategy, we introduce a confidence relation that measures the extent to which the retrieved most similar projects respect the assumption “similar projects have similar efforts”. We then calculate a more consistent possibility distribution of the effort values that the new project can take, based on both the similarity and confidence relations. Finally, from the resulting possibility distribution, the adaptation step derives two estimates: (i) a fuzzy estimate, which is the surest fuzzy subset in which the new project effort falls, such as low, medium, high or very high; and (ii) a crisp estimate, which is the most likely effort value for the new project.
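
The idea of combining similarity with confidence can be sketched as follows. The actual confidence relation of C-FASEE is not defined in this text, so everything below is a hypothetical stand-in: confidence is taken as the similarity-weighted agreement between a retrieved project's effort and the efforts of the other retrieved projects, the two degrees are combined with the minimum, and `effort_range` is an assumed normalization constant. Only the crisp estimate is sketched; the fuzzy estimate is not.

```python
def effort_closeness(e1, e2, effort_range):
    """Hypothetical closeness of two efforts in [0, 1] (1 means identical)."""
    return max(0.0, 1.0 - abs(e1 - e2) / effort_range)


def confidence(index, cases, effort_range):
    """Hypothetical confidence: how well one retrieved project's effort agrees
    with the other retrieved projects, weighted by their similarity."""
    _, effort_i = cases[index]
    others = [case for j, case in enumerate(cases) if j != index]
    total = sum(s for s, _ in others)
    if total == 0:
        return 0.0
    agreement = sum(s * effort_closeness(effort_i, e, effort_range) for s, e in others)
    return agreement / total


def consistent_possibility(cases, effort_range):
    """Combine similarity and confidence with min (an assumed combination rule)."""
    return [
        (min(s, confidence(i, cases, effort_range)), e)
        for i, (s, e) in enumerate(cases)
    ]


def crisp_estimate(possibilities):
    """Mean of the effort values that reach the highest possibility degree."""
    top = max(p for p, _ in possibilities)
    best = [e for p, e in possibilities if p == top]
    return sum(best) / len(best)


# (similarity, effort) pairs: two equally similar projects disagree on effort.
retrieved = [(0.8, 120.0), (0.8, 300.0), (0.6, 150.0)]
possibilities = consistent_possibility(retrieved, effort_range=200.0)
estimate = crisp_estimate(possibilities)
```

In this toy case the project with effort 300 is highly similar but disagrees with the other retrieved projects, so its possibility degree drops and the tie seen with similarity alone disappears.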

By satisfying these two consistency requirements, the new C-FASEE approach deals better with the sources of uncertainty, since it enhances the relevance of software project attributes and introduces the confidence relation to disregard misleading similar projects. In addition, by deriving the fuzzy estimate, we offer software managers a clear and interpretable quantification of the uncertainty in the estimate. The experimental study was conducted over thirteen software project datasets with different complexities. The experiments confirm that our proposal enhances the capabilities of the FASEE model in terms of accuracy and uncertainty management, and show that our method performs better than the methods from the literature.

The remainder of this paper is organized as follows. Section 2 presents related work on uncertainty management in software effort estimation. Section 3 provides an overview of fuzzy analogy. Section 4 presents the proposed C-FASEE. Section 5 describes the experimental design. Section 6 presents and discusses the results of the experiment. Finally, conclusions and future work are presented in Section 7.

Section snippets

Related works

Uncertainty and inaccuracy are major issues in software effort estimation. The information upon which software effort estimation models are based is unpredictable, incomplete, uncertain, ambiguous, imprecise, or even contradictory. Measuring and managing uncertainty and inaccuracy therefore lie at the heart of a good software effort estimation approach. In this respect, including uncertainty management mechanisms in the software effort estimation process becomes an inherent element of an efficient

Overview of fuzzy analogy-based software effort estimation

The generalized fuzzy analogy software effort estimation approach involves three steps (Fig. 2): identification of cases, retrieval of similar cases, and case adaptation. Each step is a fuzzification of its equivalent in the classical analogy procedure. The description of each of these steps is given in the following subsections.

C-FASEE: a new approach for uncertainty management in effort estimation

In this section, we describe an enhanced model of FASEE, called Consistent-Fuzzy Analogy based Software Effort Estimation (C-FASEE). The proposal attempts to integrate powerful uncertainty management mechanisms in the whole process of the analogy-based estimation approach. So, in Section 4.1 we present the general architecture of C-FASEE. Then, a new fuzzy set extraction method is described in Section 4.2. Finally, Section 4.3 presents the new adaptation technique for C-FASEE.

Experiments configuration

This section presents the experimental configuration designed to analyze our proposed C-FASEE. We present the materials and tools used to evaluate the estimation accuracy, analyze the behavior of the resulting estimation model and compare its performance with that of other estimation models. This configuration includes the datasets, the performance measure, the comparison estimation models and statistical procedure, and the experimental setup.

Experiments results

The results presented here aim to highlight the positive impact of uncertainty management using consistency criteria on the estimation accuracy of C-FASEE. In the next section, we illustrate how the C-FASEE model estimates a new software project on the Albrecht dataset. Next, we evaluate the performance of the proposal and the significance of its parameters. Then, we perform a comparative study of the SA between our proposal and the aforementioned comparison methods.
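
Assuming that SA here denotes the standardized accuracy measure of Shepperd and MacDonell, it can be computed as in the sketch below. SA compares the mean absolute residual (MAR) of a model with that of random guessing, where each project's effort is predicted by the effort of another, randomly chosen project. The closed-form pairwise average used for the random-guessing baseline, the function names, and the toy data are assumptions of this sketch rather than details taken from the paper.

```python
def mean_absolute_residual(actual, predicted):
    """MAR: average absolute difference between actual and predicted effort."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def random_guess_mar(actual):
    """Expected MAR of random guessing, computed exactly over all ordered pairs
    (i, j) with i != j instead of by repeated random sampling."""
    n = len(actual)
    total = sum(
        abs(actual[i] - actual[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def standardized_accuracy(actual, predicted):
    """SA = 1 - MAR / MAR_P0; 1 is perfect, 0 is no better than random guessing."""
    return 1.0 - mean_absolute_residual(actual, predicted) / random_guess_mar(actual)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
sa = standardized_accuracy(actual, predicted)
```

A negative SA would indicate a model that performs worse than random guessing on the dataset.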

Conclusion

In this paper, we propose a new approach to uncertainty management in fuzzy analogy-based software effort estimation. The proposal introduces consistency criteria into the estimation model in order to increase the data quality and infer a consistent possibility distribution. In this respect, the proposed C-FASEE is endowed with three main enhancements:

  • The fuzzification of the software project attributes aims to extract consistent fuzzy sets that fit the software effort.

  • The introduction of a

References (59)

  • F.J. Heemstra. Software cost estimation. Inf. Softw. Technol. (1992)

  • N.-H. Chiu et al. The adjusted analogy-based software effort estimation based on similarity distances. J. Syst. Softw. (2007)

  • M. Jørgensen et al. Software effort estimation by analogy and regression toward the mean. J. Syst. Softw. (2003)

  • M. Azzeh et al. An empirical evaluation of ensemble adjustment methods for analogy-based effort estimation. J. Syst. Softw. (2015)

  • Y.F. Li et al. A study of mutual information based feature selection for case based reasoning in software cost estimation. Expert Syst. Appl. (2009)

  • S.-M. Chen et al. Fuzzy classification systems based on fuzzy information gain measures. Expert Syst. Appl. (2009)

  • J.C. Bezdek et al. FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. (1984)

  • R.R. Yager. Aggregation operators and fuzzy systems modeling. Fuzzy Sets Syst. (1994)

  • M. Shepperd et al. Evaluating prediction systems in software project estimation. Inf. Softw. Technol. (2012)

  • M. Shepperd et al. Estimating software project effort using analogies. IEEE Trans. Softw. Eng. (1997)

  • F. Walkerden et al. An empirical study of analogy-based software effort estimation. Empir. Softw. Eng. (1999)

  • R.L. de Mantaras. Case-based reasoning. Machine Learning and Its Applications (2001)

  • L. Angelis et al. A simulation tool for efficient analogy based cost estimation. Empir. Softw. Eng. (2000)

  • M. Azzeh et al. Software project similarity measurement based on fuzzy C-means. International Conference on Software Process (2008)

  • A. Idri et al. Estimating software project effort by analogy based on linguistic values

  • A. Idri et al. Software cost estimation by fuzzy analogy for Web hypermedia applications

  • J. Li et al. Impact analysis of missing values on the prediction accuracy of analogy-based software effort estimation method AQUA. First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007) (2007)

  • J. Li et al. A flexible method for software effort estimation by analogy. Empir. Softw. Eng. (2007)

  • M. Azzeh. A replicated assessment and comparison of adaptation techniques for analogy-based effort estimation. Empir. Softw. Eng. (2012)