Can cross-company data improve performance in software effort estimation?

Authors:
Leandro L. Minku, The University of Birmingham, Birmingham, UK
Xin Yao, The University of Birmingham, Birmingham, UK

PROMISE '12: Proceedings of the 8th International Conference on Predictive Models in Software Engineering, September 2012, Pages 69–78
https://doi.org/10.1145/2365324.2365334

Published: 21 September 2012

ABSTRACT

Background: There has been a long debate in the software engineering literature concerning how useful cross-company (CC) data are for software effort estimation (SEE) in comparison to within-company (WC) data. Studies indicate that models trained on CC data obtain either similar or worse performance than models trained solely on WC data.

Aims: We aim to investigate whether CC data can help to increase performance and, if so, under what conditions.

Method: The work builds on the fact that SEE is a class of online learning tasks that operate in changing environments, something most work so far has neglected. We conduct an analysis based on the performance of different approaches considering CC and WC data. These are: (1) an approach not designed for changing environments, (2) approaches designed for changing environments, and (3) a new online learning approach able to identify when CC data are helpful or detrimental.

Results: Interesting features of data sets commonly used in the SEE literature are revealed, showing that different subsets of CC data can be beneficial or detrimental depending on the moment in time. The newly proposed approach is able to benefit from that, successfully using CC data to improve performance over WC models.

Conclusions: This work not only shows that CC data can help to increase performance for SEE tasks, but also demonstrates that the online nature of software prediction tasks should be exploited, an important issue for future work to consider.
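
The abstract does not describe how the new online approach decides when CC data are helpful, so the following is only a rough sketch of one way such a mechanism could work, not the authors' method. It assumes a pool of already trained effort models (for example one WC model and several models built on different CC subsets), keeps one weight per model, and shrinks the weight of every model that was not the most accurate on each newly completed WC project. The class name `WeightedModelPool`, the penalty factor `beta`, and the toy models are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# A model maps a project's feature vector to a predicted effort.
Model = Callable[[Sequence[float]], float]


@dataclass
class WeightedModelPool:
    """Hypothetical pool of effort models (e.g. one WC model, several CC models)."""

    models: List[Model]
    beta: float = 0.5  # assumed penalty factor, must satisfy 0 < beta < 1
    weights: List[float] = field(init=False)

    def __post_init__(self) -> None:
        if not self.models:
            raise ValueError("at least one model is required")
        if not 0.0 < self.beta < 1.0:
            raise ValueError("beta must lie strictly between 0 and 1")
        # Start with no preference between WC and CC models.
        self.weights = [1.0 / len(self.models)] * len(self.models)

    def predict(self, features: Sequence[float]) -> float:
        """Weighted average of the individual effort estimates."""
        return sum(w * m(features) for w, m in zip(self.weights, self.models))

    def update(self, features: Sequence[float], actual_effort: float) -> None:
        """Call once a WC project is completed and its true effort is known."""
        errors = [abs(m(features) - actual_effort) for m in self.models]
        best = min(errors)
        # Models tied for the lowest error keep their weight; the rest are penalised.
        penalised = [
            w if e == best else w * self.beta
            for w, e in zip(self.weights, errors)
        ]
        total = sum(penalised)
        self.weights = [w / total for w in penalised]


# Toy usage: two fixed linear models stand in for a WC model and a CC model.
pool = WeightedModelPool([lambda x: 10.0 * x[0], lambda x: 20.0 * x[0]])
pool.update([1.0], actual_effort=20.0)  # the second model was more accurate
estimate = pool.predict([1.0])
```

Because the weights are revised after every completed project, a CC model that is useful in one period can lose influence later if it stops matching the WC projects, which is the kind of time-dependent behaviour the Results paragraph reports.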

Index Terms

Can cross-company data improve performance in software effort estimation?
1. Computing methodologies
  1. Machine learning
2. Social and professional topics
  1. Professional topics
    1. Management of computing and information systems
      1. Implementation management
        Pricing and resource allocation

Published in
PROMISE '12: Proceedings of the 8th International Conference on Predictive Models in Software Engineering
September 2012
126 pages
ISBN: 9781450312417
DOI: 10.1145/2365324
Conference Chair: Stefan Wagner, U Stuttgart
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chronological split
concept drift
cross-company estimation models
ensembles of learning machines
online learning
software effort estimation
Acceptance Rates
PROMISE '12 paper acceptance rate: 12 of 24 submissions, 50%. Overall acceptance rate: 64 of 125 submissions, 51%.