How to make best use of cross-company data in software effort estimation?

ABSTRACT
Previous work using Cross-Company (CC) data for Within-Company (WC) Software Effort Estimation (SEE) applies CC data or models directly to make predictions in the WC context. Such data or models are therefore only helpful when they match the WC context well. When they do not, a fair amount of WC training data, which are usually expensive to acquire, are still necessary to achieve good performance. We investigate how to make best use of CC data so that the amount of WC data can be reduced while maintaining or improving performance in comparison to WC SEE models. We do so by proposing a new framework that explicitly learns the relationship between CC and WC projects, allowing CC models to be mapped to the WC context. Such mapped models can be useful even when the CC models themselves do not match the WC context directly. Our study shows that a new approach instantiating this framework not only uses substantially less WC data than a corresponding WC model, but also achieves similar or better performance. The approach can also provide insight into the behaviour of a company in comparison to others.
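The mapping idea can be illustrated with a toy sketch. This is not the paper's actual algorithm or data: it merely assumes, for illustration, that a CC model is trained on CC projects and that a single multiplicative mapping factor is then learned from a handful of WC projects, adjusting CC predictions to the WC context. All function names and numbers below are hypothetical.

```python
# Toy illustration (not the paper's method): map a cross-company (CC)
# effort model to the within-company (WC) context using a multiplicative
# factor learned from only a few WC projects.

def train_cc_model(cc_projects):
    """Toy CC 'model': mean productivity (effort per size unit) on CC data."""
    productivity = sum(effort / size for size, effort in cc_projects)
    productivity /= len(cc_projects)
    return lambda size: productivity * size

def learn_mapping(cc_model, wc_projects):
    """Learn how CC predictions relate to WC actuals (here, a mean ratio)."""
    ratios = [actual / cc_model(size) for size, actual in wc_projects]
    factor = sum(ratios) / len(ratios)
    return lambda size: factor * cc_model(size)

cc_data = [(10, 50), (20, 90), (40, 210)]  # many cheap (size, effort) CC pairs
wc_data = [(15, 120), (30, 230)]           # few, expensive WC examples

cc_model = train_cc_model(cc_data)
mapped = learn_mapping(cc_model, wc_data)  # CC model adjusted to WC context
print(round(mapped(25), 1))
```

The point of the sketch is that the mapped model needs only two WC projects to estimate the mapping factor, whereas training a WC model from scratch would typically require far more WC data.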