DOI: 10.1145/2491411.2491418

Sample size vs. bias in defect prediction

Published: 18 August 2013

ABSTRACT

Most empirical disciplines promote the reuse and sharing of datasets, as it increases the possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of studies based on biased datasets may be suspect. This issue has caused considerable consternation in the ESE literature in recent years. However, there is a confounding factor in these datasets that has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples can be smaller or larger. Smaller datasets in general provide a less reliable basis for estimating models, and thus could lead to inferior model performance. In this setting, we ask: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset that is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUCROC and F-score. This indicates that, at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further issues to be explored in the future.
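
To make the setup concrete, the following is a minimal sketch of the kind of size-vs-bias simulation the abstract describes, assuming Python with scikit-learn. The synthetic dataset, the logistic-regression learner, and the class-weighted sampling used to model "bias" here are illustrative stand-ins, not the paper's actual data, learners, or bias model.

    # Sketch: sweep sample size and sampling bias, measure AUCROC and F-score.
    # Assumptions (not from the paper): a synthetic stand-in for the
    # "high-quality" dataset, logistic regression as the defect predictor,
    # and bias modeled as over-weighting defective rows when sampling.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Stand-in for the relatively bias-free ground-truth dataset
    # (~20% defective entities).
    X, y = make_classification(n_samples=5000, n_features=20,
                               weights=[0.8, 0.2], random_state=0)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    def biased_sample(X, y, n, defect_weight):
        # Draw n rows without replacement; defect_weight > 1 biases the
        # sample toward defective rows, 1.0 is an unbiased draw.
        w = np.where(y == 1, defect_weight, 1.0)
        idx = rng.choice(len(y), size=n, replace=False, p=w / w.sum())
        return X[idx], y[idx]

    for n in (100, 500, 2000):          # the size axis
        for bias in (1.0, 3.0):         # the bias axis
            Xs, ys = biased_sample(X_pool, y_pool, n, bias)
            model = LogisticRegression(max_iter=1000).fit(Xs, ys)
            auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
            f = f1_score(y_test, model.predict(X_test))
            print(f"n={n:4d} bias={bias:.1f}  AUCROC={auc:.3f}  F={f:.3f}")

Comparing how AUCROC and F-score move along the size axis versus the bias axis is the shape of the paper's meta-analysis; any pattern this toy produces is a property of the sketch, not a result from the paper.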

Published in

ESEC/FSE 2013: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
August 2013, 738 pages
ISBN: 9781450322379
DOI: 10.1145/2491411
Copyright © 2013 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Overall Acceptance Rate: 112 of 543 submissions, 21%
