DOI: 10.1145/2568225.2568228
Research article · Open Access

How to make best use of cross-company data in software effort estimation?

Published: 31 May 2014

ABSTRACT

Previous works using Cross-Company (CC) data for Within-Company (WC) Software Effort Estimation (SEE) try to use CC data or models directly to provide predictions in the WC context. Consequently, these data or models are helpful only when they match the WC context well. When they do not, a fair amount of WC training data, which are usually expensive to acquire, is still necessary to achieve good performance. We investigate how to make best use of CC data, so that we can reduce the amount of WC data needed while maintaining or improving performance in comparison to WC SEE models. We do so by proposing a new framework that learns the relationship between CC and WC projects explicitly, allowing CC models to be mapped to the WC context. Such mapped models can be useful even when the CC models themselves do not match the WC context directly. Our study shows that a new approach instantiating this framework not only uses substantially less WC data than a corresponding WC model, but also achieves similar or better performance. The approach can also provide insight into the behaviour of a company in comparison to others.
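The core idea of the framework — learning an explicit mapping from CC-model predictions to the WC context rather than applying CC models directly — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: it assumes a fixed CC model (here a toy linear regression) and learns a single scaling factor from only a handful of WC projects, whereas the paper proposes a general framework.

```python
import numpy as np

def train_cc_model(cc_features, cc_efforts):
    # Toy CC "model": least-squares linear regression fitted on CC data only.
    X = np.column_stack([cc_features, np.ones(len(cc_features))])
    w, *_ = np.linalg.lstsq(X, cc_efforts, rcond=None)
    return lambda x: np.column_stack([x, np.ones(len(x))]) @ w

def learn_mapping(cc_model, wc_features, wc_efforts):
    # Learn a scalar b so that b * cc_model(x) approximates WC effort.
    # b is the least-squares solution over the (few) WC training projects.
    preds = cc_model(wc_features)
    b = np.dot(preds, wc_efforts) / np.dot(preds, preds)
    return b, (lambda x: b * cc_model(x))

# Synthetic data (illustrative only): WC efforts follow the same trend as
# CC efforts but scaled by ~1.5, e.g. a less productive company.
rng = np.random.default_rng(0)
cc_x = rng.uniform(10, 100, size=(50, 1))
cc_y = 3.0 * cc_x[:, 0] + rng.normal(0, 5, 50)
wc_x = rng.uniform(10, 100, size=(5, 1))   # only 5 WC projects needed
wc_y = 4.5 * wc_x[:, 0] + rng.normal(0, 5, 5)

cc_model = train_cc_model(cc_x, cc_y)
b, wc_model = learn_mapping(cc_model, wc_x, wc_y)
```

In this sketch the learned factor `b` also carries the kind of insight the abstract mentions: `b > 1` suggests the company needs more effort than the CC trend for comparable projects, `b < 1` less.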


Published in

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
May 2014, 1139 pages
ISBN: 978-1-4503-2756-5
DOI: 10.1145/2568225

Copyright © 2014 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rate

Overall acceptance rate: 276 of 1,856 submissions, 15%
