
Can cross-company data improve performance in software effort estimation?

Published: 21 September 2012
DOI: 10.1145/2365324.2365334

ABSTRACT

Background: There has been a long debate in the software engineering literature about how useful cross-company (CC) data are for software effort estimation (SEE) compared to within-company (WC) data. Studies indicate that models trained on CC data perform similarly to, or worse than, models trained solely on WC data.

Aims: We investigate whether CC data can improve SEE performance and, if so, under what conditions.

Method: Our work builds on the fact that SEE is a class of online learning tasks operating in changing environments, a point most work so far has neglected. We analyse the performance of different approaches that use CC and WC data: (1) an approach not designed for changing environments; (2) approaches designed for changing environments; and (3) a new online learning approach able to identify when CC data are helpful or detrimental.
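
To make the setting concrete, below is a minimal Python sketch of how an online approach might weight CC models against a WC model, in the spirit of dynamic-weighted-majority ensembles: each member's weight decays whenever it mispredicts the latest completed project. The class name, the decay rule, and the MRE tolerance are illustrative assumptions for this sketch, not the paper's actual algorithm.

    import numpy as np

    class WeightedCrossCompanyEnsemble:
        """Illustrative sketch (not the paper's algorithm): one WC model plus
        several CC models, weighted online by recent predictive accuracy.
        Each member is assumed to expose predict(x) -> scalar effort estimate."""

        def __init__(self, wc_model, cc_models, beta=0.5):
            self.models = [wc_model] + list(cc_models)
            self.weights = np.ones(len(self.models))
            self.beta = beta  # multiplicative decay in (0, 1) for poor predictors

        def predict(self, x):
            # Weighted average of the members' effort estimates.
            preds = np.array([m.predict(x) for m in self.models])
            return float(np.dot(self.weights, preds) / self.weights.sum())

        def update(self, x, true_effort, tolerance=0.25):
            # Once a project completes and its true effort is known, decay the
            # weight of every member whose magnitude of relative error (MRE)
            # exceeded the tolerance, so CC subsets that are currently
            # detrimental lose influence relative to better members.
            for i, m in enumerate(self.models):
                mre = abs(m.predict(x) - true_effort) / true_effort
                if mre > tolerance:
                    self.weights[i] *= self.beta
            self.weights /= self.weights.max()  # keep weights bounded in (0, 1]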

Results: Our analysis reveals interesting features of data sets commonly used in the SEE literature: different subsets of CC data can be beneficial or detrimental depending on the moment in time. The newly proposed approach is able to exploit this, successfully using CC data to improve performance over WC models.

Conclusions: This work not only shows that CC data can help to increase performance in SEE tasks, but also demonstrates that the online nature of software prediction tasks can be exploited, an important issue to be considered in future work.


Published in

PROMISE '12: Proceedings of the 8th International Conference on Predictive Models in Software Engineering
September 2012
126 pages
ISBN: 9781450312417
DOI: 10.1145/2365324

Copyright © 2012 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

research-article

Acceptance Rates

PROMISE '12 Paper Acceptance Rate: 12 of 24 submissions, 50%
Overall Acceptance Rate: 64 of 125 submissions, 51%
