Published in: Innovations in Systems and Software Engineering 3/2018

02-05-2018 | Original Paper

Re-estimating software effort using prior phase efforts and data mining techniques

Authors: Pichai Jodpimai, Peraphon Sophatsathit, Chidchanok Lursinsap


Abstract

Software effort estimation has played an important role in software project management. Accurate estimation helps reduce cost overruns and the risk of eventual project failure. Unfortunately, many existing estimation techniques rely on the total project effort, which is often determined from the project life cycle. As the project moves on, the course of action deviates from what was originally planned, despite close monitoring and control. This makes it necessary to re-estimate software effort so as to improve project operating costs and budgeting. Recent research explores phase-level estimation, which uses known information from prior development phases to predict the effort of the next phase by means of different learning techniques. This study investigates how preprocessing of prior-phase data influences the learning techniques used to re-estimate the effort of the next phase. The proposed re-estimation approach preprocesses prior-phase effort by means of statistical techniques to select a set of input features for learning, which in turn are used to generate the estimation models. These models are then used to re-estimate next-phase effort through four processing steps, namely data transformation, outlier detection, feature selection, and learning. An empirical study is conducted on 440 estimation models generated from combinations of 5 data transformation, 5 outlier detection, 5 feature selection, and 5 learning techniques. The experimental results show that suitable preprocessing is significantly useful for building proper learning techniques that boost re-estimation accuracy. However, no single learning technique outperforms the others across all phases. The proposed re-estimation approach yields more accurate estimates than the proportion-based estimation approach. It is envisioned that the proposed re-estimation approach can help researchers and project managers re-estimate software effort so as to finish the project on time and within the allotted budget.
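
The four processing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which concrete techniques were evaluated, so every choice below is an assumption made for illustration only: a log transformation, interquartile-range fences for outlier detection, Pearson correlation for feature selection, and one-variable least squares for learning. The data set `HISTORY` and all function names are hypothetical and do not come from the paper.

```python
import math
from statistics import mean, quantiles

# Hypothetical phase efforts per project in person-hours: (plan, design, build).
# The last row is a deliberately planted outlier. None of this is the paper's data.
HISTORY = [
    (10, 20, 40), (12, 30, 60), (8, 25, 50), (15, 40, 80),
    (9, 35, 70), (11, 22, 44), (14, 28, 56), (10, 30, 900),
]

def log_transform(rows):
    """Step 1, data transformation: natural log of every (positive) effort value."""
    return [[math.log(v) for v in row] for row in rows]

def drop_outliers(rows, k=1.5):
    """Step 2, outlier detection: keep rows whose target (last column) lies within the IQR fences."""
    q1, _, q3 = quantiles([r[-1] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[-1] <= high]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def select_feature(rows):
    """Step 3, feature selection: index of the prior phase most correlated with the target."""
    ys = [r[-1] for r in rows]
    return max(range(len(rows[0]) - 1),
               key=lambda j: abs(pearson([r[j] for r in rows], ys)))

def fit_line(xs, ys):
    """Step 4, learning: ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def build_model(history):
    """Chain the four steps and return the selected feature index and a predictor."""
    rows = drop_outliers(log_transform(history))
    j = select_feature(rows)
    slope, intercept = fit_line([r[j] for r in rows], [r[-1] for r in rows])

    def predict(prior_efforts):
        # Undo the log transformation to report effort in person-hours.
        return math.exp(intercept + slope * math.log(prior_efforts[j]))

    return j, predict

feature, predict = build_model(HISTORY)
estimate = predict((13, 50))  # plan and design efforts of a new project
print(feature, round(estimate, 2))
```

Swapping any one of the four functions for a different technique yields a different estimation model, which is presumably how a study of this kind arrives at a large number of model combinations to compare.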

Metadata
Title
Re-estimating software effort using prior phase efforts and data mining techniques
Authors
Pichai Jodpimai
Peraphon Sophatsathit
Chidchanok Lursinsap
Publication date
02-05-2018
Publisher
Springer London
Published in
Innovations in Systems and Software Engineering / Issue 3/2018
Print ISSN: 1614-5046
Electronic ISSN: 1614-5054
DOI
https://doi.org/10.1007/s11334-018-0311-z