Skip to main content
Top
Published in: Empirical Software Engineering 3/2015

01-06-2015

Transfer learning in effort estimation

Authors: Ekrem Kocaguneli, Tim Menzies, Emilia Mendes

Published in: Empirical Software Engineering | Issue 3/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

When projects lack sufficient local data to make predictions, they try to transfer information from other projects. How can we best support this process? In the field of software engineering, transfer learning has been shown to be effective for defect prediction. This paper checks whether it is possible to build transfer learners for software effort estimation. We use data on 154 projects from 2 sources to investigate transfer learning between different time intervals and 195 projects from 51 sources to provide evidence on the value of transfer learning for traditional cross-company learning problems. We find that the same transfer learning method can be useful for transfer effort estimation results for the cross-company learning problem and the cross-time learning problem. It is misguided to think that: (1) Old data of an organization is irrelevant to current context or (2) data of another organization cannot be used for local solutions. Transfer learning is a promising research direction that transfers relevant cross data between time intervals and domains.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Footnotes
1
In the literature, this is known as a negative transfer effect (Pan and Yang 2010) where transfer learning actually makes matters worse.
 
2
Terminology note: Kitchenham et al. called such transfers “cross-company” learning.
 
3
Examples of delphi localizations come from Boehm (1981) and Petersen and Wohlin (2009). Boehm divided software projects into one of the “embedded”, “semi-detached” or “organic” projects and offered different COCOMO-I effort models for each. Petersen & Wohlin offer a rich set of dimensions for contextualizing projects (processes, product, organization, etc).
 
6
Note that the literature contains numerous synonyms for data set shift including “concept shift” or “concept drift”, “changes of classification”, “changing environments”, “contrast mining in classification learning”,“fracture points” and “fractures between data”. We will use the term as defined in the above text.
 
Literature
go back to reference Alpaydin E (2010) Introduction to Machine Learning, 2nd edn. MIT Press Alpaydin E (2010) Introduction to Machine Learning, 2nd edn. MIT Press
go back to reference Arnold A, Nallapati R, Cohen W (2007) A comparative study of methods for transductive transfer learning. In: ICDM’07: 17th IEEE international conference on data mining workshops, pp 77 –82 Arnold A, Nallapati R, Cohen W (2007) A comparative study of methods for transductive transfer learning. In: ICDM’07: 17th IEEE international conference on data mining workshops, pp 77 –82
go back to reference Bettenburg N, Nagappan M, Hassan AE (2012) Think locally, act globally: improving defect and effort prediction models. In MSR’12 Bettenburg N, Nagappan M, Hassan AE (2012) Think locally, act globally: improving defect and effort prediction models. In MSR’12
go back to reference Boehm B (1981) Software engineering economics. Prentice hall Boehm B (1981) Software engineering economics. Prentice hall
go back to reference Chang C-l (1974) Finding prototypes for nearest classifiers. IEEE Trans Comput C3(11) Chang C-l (1974) Finding prototypes for nearest classifiers. IEEE Trans Comput C3(11)
go back to reference Corazza A, Di Martino S, Ferrucci F, Gravino C, Sarro F, Mendes E (2010) How effective is tabu search to configure support vector regression for effort estimation? In: Proceedings of the 6th international conference on predictive models in software engineering Corazza A, Di Martino S, Ferrucci F, Gravino C, Sarro F, Mendes E (2010) How effective is tabu search to configure support vector regression for effort estimation? In: Proceedings of the 6th international conference on predictive models in software engineering
go back to reference Rodriguez D, Herraiz I, Harrison R (2012) On software engineering repositories and their open problems. In: Proceedings RAISE’12 Rodriguez D, Herraiz I, Harrison R (2012) On software engineering repositories and their open problems. In: Proceedings RAISE’12
go back to reference Dai W, Xue G-R, Yang Q, Yong Y (2007) Transferring naive bayes classifiers for text classification. In: AAAI’07: Proceedings of the 22nd national conference on artificial intelligence, pp 540–545 Dai W, Xue G-R, Yang Q, Yong Y (2007) Transferring naive bayes classifiers for text classification. In: AAAI’07: Proceedings of the 22nd national conference on artificial intelligence, pp 540–545
go back to reference Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation criterion mmre. IEEE Trans Softw Eng 29(11):985–995CrossRef Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation criterion mmre. IEEE Trans Softw Eng 29(11):985–995CrossRef
go back to reference Foster G, Goutte C, Kuhn R (2010) Discriminative instance weighting for domain adaptation in statistical machine translation. In: EMNLP ’10: conference on empirical methods in natural language processing, pp 451–459 Foster G, Goutte C, Kuhn R (2010) Discriminative instance weighting for domain adaptation in statistical machine translation. In: EMNLP ’10: conference on empirical methods in natural language processing, pp 451–459
go back to reference Gao J, Fan W, Jiang J, Han J (2008) Knowledge transfer via multiple model local structure mapping. In: International conference on knowledge discovery and data mining. Las Vegas, NV Gao J, Fan W, Jiang J, Han J (2008) Knowledge transfer via multiple model local structure mapping. In: International conference on knowledge discovery and data mining. Las Vegas, NV
go back to reference Harman M, Jia Y, Zhang Y (2012) App store mining and analysis: Msr for app stores. In: MSR, pp 108–111 Harman M, Jia Y, Zhang Y (2012) App store mining and analysis: Msr for app stores. In: MSR, pp 108–111
go back to reference Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer
go back to reference Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans Softw Eng 32(1):4–19CrossRef Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans Softw Eng 32(1):4–19CrossRef
go back to reference He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19:167–199CrossRef He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19:167–199CrossRef
go back to reference Hihn J, Habib-agahi H (1991) Cost estimation of software intensive projects: a survey of current practices. In: 13th international conference on software engineering 1991, pp 276 –287 Hihn J, Habib-agahi H (1991) Cost estimation of software intensive projects: a survey of current practices. In: 13th international conference on software engineering 1991, pp 276 –287
go back to reference Hindle A (2012) Green mining: a methodology of relating software change to power consumption. In: Proceedings, MSR’12 Hindle A (2012) Green mining: a methodology of relating software change to power consumption. In: Proceedings, MSR’12
go back to reference Huang J, Smola A, Gretton A, Borgwardt K, Scholkopf B (2007) Correcting sample selection bias by unlabeled data. In: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp 601–608 Huang J, Smola A, Gretton A, Borgwardt K, Scholkopf B (2007) Correcting sample selection bias by unlabeled data. In: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp 601–608
go back to reference Jiang Y, Cukic B, Menzies T, Bartlow N (2008) Comparing design and code metrics for software quality prediction. In: Proceedings PROMISE 2008, pp 11–18 Jiang Y, Cukic B, Menzies T, Bartlow N (2008) Comparing design and code metrics for software quality prediction. In: Proceedings PROMISE 2008, pp 11–18
go back to reference Kadoda G, Cartwright M, Shepperd M (2000) On configuring a case-based reasoning software project prediction system. UK CBR Workshop, Cambridge, UK, pp 1–10 Kadoda G, Cartwright M, Shepperd M (2000) On configuring a case-based reasoning software project prediction system. UK CBR Workshop, Cambridge, UK, pp 1–10
go back to reference Keung J (2008) Empirical evaluation of analogy-x for software cost estimation. In: ESEM ’08: Proceedings of the second international symposium on empirical software engineering and measurement. ACM, New York, NY, pp 294–296 Keung J (2008) Empirical evaluation of analogy-x for software cost estimation. In: ESEM ’08: Proceedings of the second international symposium on empirical software engineering and measurement. ACM, New York, NY, pp 294–296
go back to reference Keung J, Kocaguneli E, Menzies T (2012) Finding conclusion stability for selecting the best effort predictor in software effort estimation. Automated Software Engineering, pp 1–25. doi:10.1007/s10515-012-0108-5 Keung J, Kocaguneli E, Menzies T (2012) Finding conclusion stability for selecting the best effort predictor in software effort estimation. Automated Software Engineering, pp 1–25. doi:10.​1007/​s10515-012-0108-5
go back to reference Kitchenham BA, Mendes E, Travassos GH (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33(5):316–329CrossRef Kitchenham BA, Mendes E, Travassos GH (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33(5):316–329CrossRef
go back to reference Kocaguneli E, Gay G, Yang Y, Menzies T, Keung J (2010) When to use data from other projects for effort estimation. In: ASE ’10: Proceedings of the international conference on automated software engineering (short paper). New York, NY Kocaguneli E, Gay G, Yang Y, Menzies T, Keung J (2010) When to use data from other projects for effort estimation. In: ASE ’10: Proceedings of the international conference on automated software engineering (short paper). New York, NY
go back to reference Kocaguneli E, Menzies T (2011) How to find relevant data for effort estimation. In: ESEM’11: international symposium on empirical software engineering and measurement Kocaguneli E, Menzies T (2011) How to find relevant data for effort estimation. In: ESEM’11: international symposium on empirical software engineering and measurement
go back to reference Kocaguneli E, Menzies T (2012) Software effort models should be assessed via leave-one-out validation. Under Review Kocaguneli E, Menzies T (2012) Software effort models should be assessed via leave-one-out validation. Under Review
go back to reference Kocaguneli E, Menzies T, Bener A, Keung JW (2012) Exploiting the essential assumptions of analogy-based effort estimation. IEEE Trans Softw Eng 38(2):425–438CrossRef Kocaguneli E, Menzies T, Bener A, Keung JW (2012) Exploiting the essential assumptions of analogy-based effort estimation. IEEE Trans Softw Eng 38(2):425–438CrossRef
go back to reference Lee S-I, Chatalbashev V, Vickrey D, Koller D (2007) Learning a meta-level prior for feature relevance from multiple related tasks. In: ICML ’07: Proceedings of the 24th international conference on machine learning, pp 489–496 Lee S-I, Chatalbashev V, Vickrey D, Koller D (2007) Learning a meta-level prior for feature relevance from multiple related tasks. In: ICML ’07: Proceedings of the 24th international conference on machine learning, pp 489–496
go back to reference Li Y, Xie M, Goh T (2009) A study of project selection and feature weighting for analogy based software cost estimation. J Syst Softw 82:241–252CrossRef Li Y, Xie M, Goh T (2009) A study of project selection and feature weighting for analogy based software cost estimation. J Syst Softw 82:241–252CrossRef
go back to reference Lokan C, Mendes E (2009a) Applying moving windows to software effort estimation. In: ESEM’09: Proceedings of the 3rd international symposium on empirical software engineering and measurement, pp 111–122 Lokan C, Mendes E (2009a) Applying moving windows to software effort estimation. In: ESEM’09: Proceedings of the 3rd international symposium on empirical software engineering and measurement, pp 111–122
go back to reference Lokan C, Mendes E (2009b) Using chronological splitting to compare cross- and single-company effort models: further investigation. In: Proceedings of the thirty-second Australasian conference on computer science, vol 91. ACSC ’09, pp 47–54 Lokan C, Mendes E (2009b) Using chronological splitting to compare cross- and single-company effort models: further investigation. In: Proceedings of the thirty-second Australasian conference on computer science, vol 91. ACSC ’09, pp 47–54
go back to reference Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256CrossRef Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256CrossRef
go back to reference Mendes E, Mosley N (2008) Bayesian network models for web effort prediction: a comparative study. IEEE Trans Softw Eng 34:723–737CrossRef Mendes E, Mosley N (2008) Bayesian network models for web effort prediction: a comparative study. IEEE Trans Softw Eng 34:723–737CrossRef
go back to reference Mendes E, Mosley N, Counsell S (2005) Investigating web size metrics for early web cost estimation. J Syst Softw 77:157–172CrossRef Mendes E, Mosley N, Counsell S (2005) Investigating web size metrics for early web cost estimation. J Syst Softw 77:157–172CrossRef
go back to reference Mendes E, Watson ID, Triggs C, Mosley N, Counsell S (2003) A comparative study of cost estimation models for web hypermedia applications. Empir Softw Eng 8(2):163–196CrossRef Mendes E, Watson ID, Triggs C, Mosley N, Counsell S (2003) A comparative study of cost estimation models for web hypermedia applications. Empir Softw Eng 8(2):163–196CrossRef
go back to reference Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2012) Local vs. global lessons for defect prediction and effort estimation. In: IEEE transactions on software engineering, p 1 Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2012) Local vs. global lessons for defect prediction and effort estimation. In: IEEE transactions on software engineering, p 1
go back to reference Mihalkova L, Huynh T, Mooney RJ (2007) Mapping and revising markov logic networks for transfer learning. In: AAAI’07: Proceedings of the 22nd national conference on Artificial intelligence, pp 608–614 Mihalkova L, Huynh T, Mooney RJ (2007) Mapping and revising markov logic networks for transfer learning. In: AAAI’07: Proceedings of the 22nd national conference on Artificial intelligence, pp 608–614
go back to reference Milicic D, Wohlin C (2004) Distribution patterns of effort estimations. In: Euromicro conference series on software engineering and advanced applications, pp 422–429 Milicic D, Wohlin C (2004) Distribution patterns of effort estimations. In: Euromicro conference series on software engineering and advanced applications, pp 422–429
go back to reference Minku LL, Yao X (2012) Can cross-company data improve performance in software effort estimation? In: PROMISE ’12: Proceedings of the 8th international conference on predictive models in software engineering Minku LL, Yao X (2012) Can cross-company data improve performance in software effort estimation? In: PROMISE ’12: Proceedings of the 8th international conference on predictive models in software engineering
go back to reference Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359CrossRef Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359CrossRef
go back to reference Petersen K, Wohlin C (2009) Context in industrial software engineering research. In: 3rd International Symposium on empirical software engineering and measurement, 2009. ESEM 2009, pp 401 –404 Petersen K, Wohlin C (2009) Context in industrial software engineering research. In: 3rd International Symposium on empirical software engineering and measurement, 2009. ESEM 2009, pp 401 –404
go back to reference Posnett D, Filkov V, Devanbu P (2011) Ecological inference in empirical software engineering. In: Proceedings of ASE’11 Posnett D, Filkov V, Devanbu P (2011) Ecological inference in empirical software engineering. In: Proceedings of ASE’11
go back to reference Reifer D, Boehm BW, Chulani S (1999) The Rosetta Stone: Making COCOMO 81 Estimates Work with COCOMO II. Crosstalk. The Journal of Defense Software Engineering, pp 11–15 Reifer D, Boehm BW, Chulani S (1999) The Rosetta Stone: Making COCOMO 81 Estimates Work with COCOMO II. Crosstalk. The Journal of Defense Software Engineering, pp 11–15
go back to reference Robson C (2002) Real world research: a resource for social scientists and practitioner-researchers. Blackwell Publisher Ltd Robson C (2002) Real world research: a resource for social scientists and practitioner-researchers. Blackwell Publisher Ltd
go back to reference Shepperd M, MacDonell S (2012) Evaluating prediction systems in software project estimation. Inf Softw Technol 54(8):820–827CrossRef Shepperd M, MacDonell S (2012) Evaluating prediction systems in software project estimation. Inf Softw Technol 54(8):820–827CrossRef
go back to reference Shepperd M, Schofield C (1997) Estimating software project effort using analogies. IEEE Trans Softw Eng 23(11):736–743CrossRef Shepperd M, Schofield C (1997) Estimating software project effort using analogies. IEEE Trans Softw Eng 23(11):736–743CrossRef
go back to reference Stensrud E, Foss T, Kitchenham B, Myrtveit I (2002) An empirical validation of the relationship between the magnitude of relative error and project size. In: Proceedings of the 8th IEEE symposium on software metrics, pp 3–12 Stensrud E, Foss T, Kitchenham B, Myrtveit I (2002) An empirical validation of the relationship between the magnitude of relative error and project size. In: Proceedings of the 8th IEEE symposium on software metrics, pp 3–12
go back to reference Storkey A (2009) When training and test sets are different: characterizing learning transfer. In: Candela J, Sugiyama M, Schwaighofer A, Lawrence, N (eds) Dataset shift in machine learning. MIT Press, Cambridge, pp 3–28 Storkey A (2009) When training and test sets are different: characterizing learning transfer. In: Candela J, Sugiyama M, Schwaighofer A, Lawrence, N (eds) Dataset shift in machine learning. MIT Press, Cambridge, pp 3–28
go back to reference Turhan B, Menzies T, Bener A, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14(5):540–578CrossRef Turhan B, Menzies T, Bener A, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14(5):540–578CrossRef
go back to reference Wu P, Dietterich TG (2004) Improving svm accuracy by training on auxiliary data sources. In: Proceedings of the twenty-first international conference on Machine learning, ICML ’04. ACM, New York, NY, p 110 Wu P, Dietterich TG (2004) Improving svm accuracy by training on auxiliary data sources. In: Proceedings of the twenty-first international conference on Machine learning, ICML ’04. ACM, New York, NY, p 110
go back to reference Yang Y, Xie L, He Z, Li Q, Nguyen V, Boehm BW, Valerdi R (2011) Local bias and its impacts on the performance of parametric estimation models. In: PROMISE Yang Y, Xie L, He Z, Li Q, Nguyen V, Boehm BW, Valerdi R (2011) Local bias and its impacts on the performance of parametric estimation models. In: PROMISE
go back to reference Zhang H, Sheng S (2004) Learning weighted naive bayes with accurate ranking. In: ICDM ’04 4th IEEE international conference on data mining, pp 567–570 Zhang H, Sheng S (2004) Learning weighted naive bayes with accurate ranking. In: ICDM ’04 4th IEEE international conference on data mining, pp 567–570
go back to reference Zhang X, Dai W, Xue G-R, Yu Y (2007) Adaptive email spam filtering based on information theory. In: Web information systems engineering WISE 2007, Lecture notes in computer science, vol 4831. Springer Berlin/Heidelberg, pp 159–170 Zhang X, Dai W, Xue G-R, Yu Y (2007) Adaptive email spam filtering based on information theory. In: Web information systems engineering WISE 2007, Lecture notes in computer science, vol 4831. Springer Berlin/Heidelberg, pp 159–170
go back to reference Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. ESEC/FSE, pp 91–100 Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. ESEC/FSE, pp 91–100
Metadata
Title
Transfer learning in effort estimation
Authors
Ekrem Kocaguneli
Tim Menzies
Emilia Mendes
Publication date
01-06-2015
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 3/2015
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-014-9300-5

Other articles of this Issue 3/2015

Empirical Software Engineering 3/2015 Go to the issue

Premium Partner