Published in: Empirical Software Engineering 5/2010

1 October 2010

LSEbA: least squares regression and estimation by analogy in a semi-parametric model for software cost estimation

Authors: Nikolaos Mittas, Lefteris Angelis

Abstract

The importance of software cost estimation at the early stages of the development life cycle is reflected in the many models and methods that have appeared in the literature. Research interest has focused on two well-known techniques: parametric regression analysis and non-parametric estimation by analogy. Despite several comparison studies, there is no agreement on which of the two predicts better. In this paper, we introduce a semi-parametric technique, called LSEbA, that combines the two methods while retaining the advantages of both. Furthermore, the proposed method is consistent with the mixed nature of software cost estimation data and uses all the information available in the dataset, even when a large number of values are missing. The paper illustrates in detail the process of building such a model and presents experiments on three representative datasets that verify the benefits of the proposed model in terms of accuracy, bias and spread. LSEbA is compared with linear regression, estimation by analogy, and a combination of the two based on the average of their outcomes, using accuracy metrics, statistical tests and a graphical tool, Regression Error Characteristic curves.
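
The abstract does not spell out the LSEbA procedure, so the following is only a minimal sketch of one generic way to combine a least squares part with an analogy part in a two-stage, residual-based scheme. It is not the authors' published algorithm. Everything concrete in it is an assumption made for illustration: a single numeric predictor (`size`), a tuple of categorical attributes per project, a simple mismatch count as the dissimilarity measure, the neighbourhood size `k`, and the hypothetical helper names `fit_line`, `mismatch` and `predict`.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one numeric predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def mismatch(p, q):
    """Assumed dissimilarity: number of categorical attributes that differ."""
    return sum(1 for u, v in zip(p, q) if u != v)


def predict(size, cats, train, k=2):
    """Two-stage estimate (illustrative only).

    train is a list of (size, categorical_tuple, effort) records.
    Stage 1: fit effort on size by least squares (parametric part).
    Stage 2: average the residuals of the k most similar projects,
             judged on the categorical attributes (analogy part).
    """
    a, b = fit_line([t[0] for t in train], [t[2] for t in train])
    residuals = [(t[1], t[2] - (a + b * t[0])) for t in train]
    nearest = sorted(residuals, key=lambda r: mismatch(cats, r[0]))[:k]
    correction = mean([r for _, r in nearest])
    return a + b * size + correction


# Made-up toy data: (size, (application type,), effort)
projects = [
    (10, ("web",), 30),
    (20, ("desktop",), 30),
    (30, ("web",), 70),
    (40, ("desktop",), 70),
]

web_estimate = predict(25, ("web",), projects)
desktop_estimate = predict(25, ("desktop",), projects)
```

In this toy setting the linear part captures the trend in size, while the analogy part shifts the estimate up or down according to how similar past projects deviated from that trend. Handling of missing values, which the abstract highlights, is deliberately left out of the sketch.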

Metadata
Title
LSEbA: least squares regression and estimation by analogy in a semi-parametric model for software cost estimation
Authors
Nikolaos Mittas
Lefteris Angelis
Publication date
1 October 2010
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 5/2010
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-010-9128-6
