Published in: Empirical Software Engineering 2/2023

01-03-2023

Evaluating ensemble imputation in software effort estimation

Authors: Ibtissam Abnane, Ali Idri, Imane Chlioui, Alain Abran

Abstract

Choosing the appropriate missing data (MD) imputation technique for a given software development effort estimation (SDEE) technique is not a trivial task. The impact of MD imputation on the estimation output depends on both the dataset and the SDEE technique used, and no single imputation technique performs best in all contexts. An attractive solution is therefore to use more than one imputation technique and combine their results into a final imputation outcome. This concept, called ensemble imputation, can significantly improve effort estimation accuracy. This study proposes and constructs 11 heterogeneous ensemble imputation techniques, each combining two, three, or four of the following single imputation techniques: K-nearest neighbors (KNN), expectation maximization (EM), support vector regression (SVR), and decision trees (DTs). The effects of single and ensemble imputation techniques on SDEE performance were evaluated over six SDEE datasets: COCOMO81, ISBSG, Desharnais, China, Kemerer, and Miyazaki. Five SDEE performance measures were used: standardized accuracy (SA), prediction at level 25% (Pred(0.25)), mean balanced relative error (MBRE), mean inverted balanced relative error (MIBRE), and logarithmic standard deviation (LSD). Moreover, we used (1) the Scott-Knott (SK) statistical test to cluster and compare the results, and (2) the Borda count method to rank the SDEE techniques belonging to the best SK cluster.
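To make the five performance measures and the Borda aggregation concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, and several conventions are assumptions: SA computes the random-guessing baseline MAE_P0 as its exact expectation (each project predicted with another project's actual effort, chosen uniformly) rather than by sampling; LSD takes s² to be the sample variance of the log residuals; and the Borda scheme awards m - 1 points to the best of m techniques under each measure, breaking ties by input order. SA and Pred(0.25) are treated as higher-is-better, and MBRE, MIBRE and LSD as lower-is-better. All function names are hypothetical, and the Scott-Knott clustering step is not shown.

```python
import math
from statistics import variance


def sa(actual, predicted):
    """Standardized accuracy (%) = (1 - MAE / MAE_P0) * 100.

    Assumption: MAE_P0 (random guessing) is the exact expected MAE when each
    project is predicted with the actual effort of another project chosen
    uniformly at random, instead of an average over sampled runs.
    """
    n = len(actual)
    mae = sum(abs(y - p) for y, p in zip(actual, predicted)) / n
    mae_p0 = sum(
        sum(abs(actual[i] - actual[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ) / n
    return (1 - mae / mae_p0) * 100


def pred(actual, predicted, level=0.25):
    """Percentage of projects whose MRE = |y - y_hat| / y is at most `level`."""
    hits = sum(1 for y, p in zip(actual, predicted) if abs(y - p) / y <= level)
    return 100 * hits / len(actual)


def mbre(actual, predicted):
    """Mean balanced relative error: |y - y_hat| / min(y, y_hat), averaged."""
    return sum(abs(y - p) / min(y, p) for y, p in zip(actual, predicted)) / len(actual)


def mibre(actual, predicted):
    """Mean inverted balanced relative error: |y - y_hat| / max(y, y_hat), averaged."""
    return sum(abs(y - p) / max(y, p) for y, p in zip(actual, predicted)) / len(actual)


def lsd(actual, predicted):
    """Logarithmic standard deviation of e_i = ln y_i - ln y_hat_i.

    Assumption: s^2 is the sample variance of the e_i.
    """
    e = [math.log(y) - math.log(p) for y, p in zip(actual, predicted)]
    s2 = variance(e)
    return math.sqrt(sum((ei + s2 / 2) ** 2 for ei in e) / (len(e) - 1))


def borda_rank(scores, higher_is_better):
    """Aggregate per-measure rankings with a Borda count.

    scores: {technique: {measure: value}}; higher_is_better: {measure: bool}.
    With m techniques, the best under a measure earns m - 1 points and the
    worst earns 0; ties are broken by input order (a simplification).
    """
    techniques = list(scores)
    m = len(techniques)
    points = {t: 0 for t in techniques}
    for measure, hib in higher_is_better.items():
        ordered = sorted(techniques, key=lambda t: scores[t][measure], reverse=hib)
        for rank, t in enumerate(ordered):
            points[t] += m - 1 - rank
    return sorted(techniques, key=lambda t: points[t], reverse=True)
```

For instance, with actual efforts [100, 200] and estimates [125, 200], `pred` gives 100.0 (an MRE of 0.25 is within the threshold), `mbre` gives 0.125, `mibre` gives 0.1 and `sa` gives 87.5. MBRE and MIBRE divide by the smaller and larger of the two efforts respectively, so together they penalize under- and overestimation more evenly than MRE alone.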
The results showed that ensemble imputers significantly improved the performance of SDEE techniques compared to single imputation techniques. We also found that adding one or more imputers to an ensemble imputer generally led to a significant improvement in SDEE performance. When the improvement is not significant, it is better to use the ensemble imputer with fewer members, because it is less complex. No single ensemble imputer gave the best results in all contexts. Overall, SVR was the best single imputation technique for constructing ensemble imputers for SDEE. Among the SDEE techniques, the best results were obtained by the DT and SVR variants using ensemble imputation.
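The ensemble-imputation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the members' imputed values are combined by simple averaging (an assumed combination rule), only a KNN member is implemented, and column-mean imputation stands in for the EM, SVR and DT members, which are omitted for brevity. `MISSING`, `knn_impute`, `mean_impute`, `ensemble_impute` and `ensemble_configurations` are hypothetical names. The last function also checks the count of 11 ensembles: 6 pairs, 4 triples and 1 quadruple drawn from the four single imputers.

```python
import math
from itertools import combinations

MISSING = None  # missing cells are marked with None


def mean_impute(rows):
    """Stand-in member: fill each missing cell with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not MISSING]
        means.append(sum(observed) / len(observed))
    return [[means[c] if v is MISSING else v for c, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """KNN member: donors are complete rows; Euclidean distance is computed
    only over the features observed in the incomplete row."""
    complete = [r for r in rows if MISSING not in r]
    out = []
    for r in rows:
        if MISSING not in r:
            out.append(list(r))
            continue
        obs = [c for c, v in enumerate(r) if v is not MISSING]
        donors = sorted(
            complete,
            key=lambda d: math.sqrt(sum((r[c] - d[c]) ** 2 for c in obs)),
        )[:k]
        # Each missing cell gets the mean of the k nearest donors' values.
        out.append([
            sum(d[c] for d in donors) / len(donors) if v is MISSING else v
            for c, v in enumerate(r)
        ])
    return out


def ensemble_impute(rows, imputers):
    """Run every member imputer and average their values for each missing cell."""
    results = [imp(rows) for imp in imputers]
    return [
        [
            sum(res[i][c] for res in results) / len(results) if v is MISSING else v
            for c, v in enumerate(r)
        ]
        for i, r in enumerate(rows)
    ]


def ensemble_configurations(members=("KNN", "EM", "SVR", "DT")):
    """All ensembles of two, three or four distinct members: 6 + 4 + 1 = 11."""
    return [
        combo
        for size in range(2, len(members) + 1)
        for combo in combinations(members, size)
    ]
```

For example, with `data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.1, None]]`, the KNN member (k = 2) fills the gap with 25.0, the mean member with 20.0, and their averaged ensemble with 22.5. Averaging is the simplest combiner; a weighted or median rule would slot into `ensemble_impute` the same way.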

Metadata
Title
Evaluating ensemble imputation in software effort estimation
Authors
Ibtissam Abnane
Ali Idri
Imane Chlioui
Alain Abran
Publication date
01-03-2023
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2023
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-022-10260-0
