Published in: Empirical Software Engineering, Issue 2/2011

1 April 2011

Investigating the use of Support Vector Regression for web effort estimation

Authors: Anna Corazza, Sergio Di Martino, Filomena Ferrucci, Carmine Gravino, Emilia Mendes


Abstract

Support Vector Regression (SVR) is a new-generation Machine Learning algorithm suited to predictive data modeling problems. The objective of this paper is twofold: first, to investigate the effectiveness of SVR for Web effort estimation using a cross-company dataset; second, to compare different SVR configurations in order to identify the one that performs best. In particular, we considered three variable preprocessing strategies (no preprocessing, normalization, and logarithmic transformation) in combination with two different dependent variables (effort and inverse effort). As a result, SVR was applied to six different data configurations. Moreover, to understand how suitable kernel functions are for handling non-linear problems, SVR was applied without a kernel and in combination with the Radial Basis Function (RBF) and Polynomial kernels, yielding 18 different SVR configurations. To identify the best value of each parameter for each configuration, we defined a procedure based on leave-one-out cross-validation. The dataset employed was the Tukutuku database, which has been adopted in many previous Web effort estimation studies. Three different training/test set splits were used, each consisting of 130 training projects and 65 test projects. The SVR-based predictions were also benchmarked against predictions obtained using Manual StepWise Regression and Case-Based Reasoning. Our results showed that the configuration combining logarithmic preprocessing of the features with the RBF kernel provided the best results for all three data splits. In addition, SVR provided significantly better prediction accuracy than all the considered benchmarking techniques.
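The 18 configurations described in the abstract form a simple grid: three preprocessing strategies, two dependent variables, and three kernel settings. The Python sketch below is not the authors' implementation; it only spells out the textbook forms of the kernels and transforms so the grid is concrete. Min-max scaling to [0, 1] as the normalization, log(1 + x) as the logarithmic transform, applying preprocessing column-wise to the inputs, and the default values of `gamma`, `degree`, and `coef0` are all assumptions made for illustration.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """'No kernel': the plain dot product, i.e. a linear SVR."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    """Radial Basis Function kernel: exp(-gamma * ||a - b||^2). gamma is illustrative."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def poly_kernel(a: Vector, b: Vector, degree: int = 2, coef0: float = 1.0) -> float:
    """Polynomial kernel: (a . b + coef0) ** degree. Defaults are illustrative."""
    return (linear_kernel(a, b) + coef0) ** degree


def no_preprocessing(column: List[float]) -> List[float]:
    return list(column)


def min_max_normalize(column: List[float]) -> List[float]:
    """Assumed normalization: rescale one column to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def log_transform(column: List[float]) -> List[float]:
    """Assumed logarithmic transform: log(1 + x), so zero counts stay defined."""
    return [math.log1p(v) for v in column]


PREPROCESSING: Dict[str, Callable[[List[float]], List[float]]] = {
    "none": no_preprocessing,
    "normalized": min_max_normalize,
    "log": log_transform,
}

# Dependent variable: raw effort, or its inverse (a prediction p maps back to 1 / p).
TARGETS: Dict[str, Callable[[float], float]] = {
    "effort": lambda e: e,
    "inverse_effort": lambda e: 1.0 / e,
}

KERNELS: Dict[str, Callable[..., float]] = {
    "none": linear_kernel,
    "rbf": rbf_kernel,
    "poly": poly_kernel,
}


def svr_configurations() -> List[Tuple[str, str, str]]:
    """3 preprocessing strategies x 2 targets x 3 kernel settings = 18 configurations."""
    return list(itertools.product(PREPROCESSING, TARGETS, KERNELS))
```

Each tuple names one configuration, for example `("log", "effort", "rbf")`, which corresponds to the combination the abstract reports as best; actually training an SVR for each tuple would be delegated to an SVR library.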

Footnotes
1
The norm was computed on all the input information available in the test set, i.e., on all the attributes except Effort.
 
2
SVM-light is freely available for scientific use at http://svmlight.joachims.org/.
 
3
In leave-one-out cross-validation, a single observation from the original sample is used to evaluate a model trained on the remaining observations. This is repeated until each observation in the sample has been used once as validation data. Applying leave-one-out cross-validation on the training set allowed us to guard against model overfitting, which could otherwise keep the model from achieving good prediction accuracy on unseen data.
 
4
Note that we removed one outlier from the boxplot of C6(L) and one outlier from the boxplot of C6(R) to improve the readability of the plots.
 
5
Note that we removed one outlier from the boxplot of C6(L) to improve the readability of the plots.
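The leave-one-out parameter selection described in footnote 3 can be sketched generically: for every candidate parameter set, train on n - 1 training observations, predict the held-out one, and keep the candidate with the lowest average error. The Python below is an illustration under stated assumptions, not the authors' procedure. Mean absolute error as the selection criterion is assumed, and `fit_knn_mean` is a hypothetical stand-in learner (k-nearest-neighbour mean, not SVR) used only so the sketch runs; in the study, the learner would be SVR and the grid would cover its parameters (for example the regularization constant C, the ε of the ε-insensitive loss, and kernel parameters), searched on the training projects only.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Sequence[float]
Params = Dict[str, float]
Predictor = Callable[[Row], float]
# A fit function trains on (rows, targets) with the given parameters and returns a predictor.
FitFn = Callable[[List[Row], List[float], Params], Predictor]


def loo_error(fit: FitFn, X: List[Row], y: List[float], params: Params) -> float:
    """Mean absolute error over the n leave-one-out rounds (criterion is an assumption)."""
    total = 0.0
    for i in range(len(X)):
        # Train on every observation except i, then predict the held-out observation.
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], params)
        total += abs(model(X[i]) - y[i])
    return total / len(X)


def select_parameters(
    fit: FitFn, X: List[Row], y: List[float], grid: Sequence[Params]
) -> Tuple[Optional[Params], float]:
    """Return the candidate parameter set with the lowest leave-one-out error."""
    best_params: Optional[Params] = None
    best_err = float("inf")
    for params in grid:
        err = loo_error(fit, X, y, params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def fit_knn_mean(X_train: List[Row], y_train: List[float], params: Params) -> Predictor:
    """Hypothetical stand-in learner (NOT SVR): mean target of the k nearest rows."""
    k = int(params["k"])

    def predict(row: Row) -> float:
        # Squared Euclidean distance to each training row, paired with its target.
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, x)), t)
            for x, t in zip(X_train, y_train)
        )
        nearest = ranked[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict
```

Replacing `fit_knn_mean` with a function that trains an SVR under the given parameters would turn this into a per-configuration search of the kind the abstract describes; separating `fit` from the search loop keeps the cross-validation logic independent of the learner.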
 
Metadata
Title
Investigating the use of Support Vector Regression for web effort estimation
Authors
Anna Corazza
Sergio Di Martino
Filomena Ferrucci
Carmine Gravino
Emilia Mendes
Publication date
1 April 2011
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2011
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-010-9138-4