ABSTRACT
Most empirical disciplines promote the reuse and sharing of datasets, since doing so improves the prospects for replication. While this is increasingly the case in Empirical Software Engineering (ESE), some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of studies based on biased datasets may be suspect. The issue has caused considerable consternation in the ESE literature in recent years. However, one confounding factor in these datasets has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples can be smaller or larger. Smaller datasets generally provide a less reliable basis for estimating models, and thus could lead to inferior model performance. In this setting, we ask: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset that is relatively free of bias. Our results suggest that size always matters at least as much as bias direction, and in fact matters much more than bias direction when performance is judged by information-retrieval measures such as AUC-ROC and F-score. This indicates that, at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further questions to be explored in future work.
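To make the experimental setup concrete, the following is a minimal sketch, not the authors' actual pipeline, of the kind of simulation the abstract describes: draw training samples of varying size and bias from a dataset treated as clean, fit a simple defect predictor on each, and compare AUC-ROC and F-score on a fixed held-out set. The synthetic data, the logistic-regression model, and the `defect_weight` biasing scheme are illustrative assumptions; scikit-learn is assumed available.

```python
# Illustrative sketch (hypothetical, not the paper's actual pipeline):
# vary training-sample size and bias, hold the test set fixed, and
# compare AUC-ROC and F-score of the resulting defect predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for a high-quality, relatively bias-free dataset:
# X holds per-entity metrics, y marks defective entities.
n = 5000
X = rng.normal(size=(n, 4))
y = (X @ np.array([1.0, 0.5, -0.5, 0.2]) + rng.normal(size=n) > 1.0).astype(int)

# Fixed held-out test set, so only the training sample varies.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

def biased_sample(X, y, size, defect_weight):
    """Sample `size` rows; defect_weight > 1 over-represents defective
    entities, < 1 under-represents them, and 1.0 is an unbiased draw."""
    w = np.where(y == 1, defect_weight, 1.0)
    idx = rng.choice(len(y), size=size, replace=False, p=w / w.sum())
    return X[idx], y[idx]

for size in (100, 500, 2000):
    for weight in (0.25, 1.0, 4.0):
        Xs, ys = biased_sample(X_train, y_train, size, weight)
        if len(np.unique(ys)) < 2:
            continue  # degenerate sample (one class); skip it
        model = LogisticRegression(max_iter=1000).fit(Xs, ys)
        prob = model.predict_proba(X_test)[:, 1]
        print(f"size={size:4d}  weight={weight:4.2f}  "
              f"AUC-ROC={roc_auc_score(y_test, prob):.3f}  "
              f"F-score={f1_score(y_test, prob > 0.5):.3f}")
```

Holding the test set fixed while varying only the training sample is one way to isolate how sample size and bias direction affect the fitted model, mirroring the size-versus-bias comparison the abstract describes.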
Index Terms
- Sample size vs. bias in defect prediction