How to make best use of cross-company data in software effort estimation?

ABSTRACT
Previous work using Cross-Company (CC) data for Within-Company (WC) Software Effort Estimation (SEE) applies CC data or models directly to make predictions in the WC context. Such data or models are therefore only helpful when they match the WC context well. When they do not, a fair amount of WC training data, which are usually expensive to acquire, are still necessary to achieve good performance. We investigate how to make best use of CC data so that the amount of WC data can be reduced while maintaining or improving performance in comparison to WC SEE models. We do so by proposing a new framework that explicitly learns the relationship between CC and WC projects, allowing CC models to be mapped to the WC context. Such mapped models can be useful even when the CC models themselves do not match the WC context directly. Our study shows that a new approach instantiating this framework not only uses substantially less WC data than a corresponding WC model, but also achieves similar or better performance. The approach can also provide insight into the behaviour of a company in comparison to others.
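The mapping idea can be illustrated with a toy sketch. This is not the paper's actual algorithm or data: it merely assumes, for illustration, that a CC model is trained on CC projects and that a single multiplicative mapping factor is then learned from a handful of WC projects, adjusting CC predictions to the WC context. All function names and numbers below are hypothetical.

```python
# Toy illustration (not the paper's method): map a cross-company (CC)
# effort model to the within-company (WC) context using a multiplicative
# factor learned from only a few WC projects.

def train_cc_model(cc_projects):
    """Toy CC 'model': mean productivity (effort per size unit) on CC data."""
    productivity = sum(effort / size for size, effort in cc_projects)
    productivity /= len(cc_projects)
    return lambda size: productivity * size

def learn_mapping(cc_model, wc_projects):
    """Learn how CC predictions relate to WC actuals (here, a mean ratio)."""
    ratios = [actual / cc_model(size) for size, actual in wc_projects]
    factor = sum(ratios) / len(ratios)
    return lambda size: factor * cc_model(size)

cc_data = [(10, 50), (20, 90), (40, 210)]  # many cheap (size, effort) CC pairs
wc_data = [(15, 120), (30, 230)]           # few, expensive WC examples

cc_model = train_cc_model(cc_data)
mapped = learn_mapping(cc_model, wc_data)  # CC model adjusted to WC context
print(round(mapped(25), 1))
```

The point of the sketch is that the mapped model needs only two WC projects to estimate the mapping factor, whereas training a WC model from scratch would typically require far more WC data.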