ABSTRACT
Software defect prediction is one of the most active research areas in software engineering. A prediction model can be built with defect data collected from a software project and used to predict defects in the same project, i.e., within-project defect prediction (WPDP). Researchers have also proposed cross-project defect prediction (CPDP), which predicts defects for new projects that lack defect data by reusing prediction models built from other projects. Recent studies have shown CPDP to be feasible. However, existing CPDP techniques require the source and target projects to share an identical metric set, so they are difficult to apply across projects with heterogeneous metric sets. To address this limitation, we propose heterogeneous defect prediction (HDP) to predict defects across projects with heterogeneous metric sets. Our HDP approach conducts metric selection and metric matching to build a prediction model between such projects. Our empirical study on 28 subjects shows that about 68% of predictions using our approach outperform or are comparable to WPDP with statistical significance.
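The metric-matching idea can be illustrated with a minimal, hypothetical sketch: pair each source metric with the target metric whose value distribution looks most similar, here measured by the two-sample Kolmogorov–Smirnov statistic with a greedy one-to-one assignment. The similarity measure, cutoff, and matching strategy below are simplifying assumptions for illustration; the paper's actual matching analyzers and thresholds may differ.

```python
def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_statistic(a, b):
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in list(a) + list(b))

def match_metrics(source, target, cutoff=0.5):
    """Greedily pair source metrics with target metrics by distribution
    similarity (1 - KS statistic), keeping pairs above `cutoff`.

    `source` and `target` map metric names to lists of observed values."""
    candidates = []
    for s_name, s_vals in source.items():
        for t_name, t_vals in target.items():
            sim = 1.0 - ks_statistic(s_vals, t_vals)
            if sim >= cutoff:
                candidates.append((sim, s_name, t_name))
    candidates.sort(reverse=True)  # best-matching pairs first
    used_s, used_t, matched = set(), set(), {}
    for sim, s_name, t_name in candidates:
        if s_name not in used_s and t_name not in used_t:
            matched[s_name] = t_name
            used_s.add(s_name)
            used_t.add(t_name)
    return matched

# Toy example with hypothetical metric names: distributions that coincide
# are matched even though the metric names differ across projects.
src = {"loc": [10, 20, 30, 40], "cc": [1, 1, 2, 3]}
tgt = {"size": [10, 20, 30, 40], "complexity": [1, 1, 2, 3]}
print(match_metrics(src, tgt))  # {'loc': 'size', 'cc': 'complexity'}
```

Once metrics are matched, the source project's data can be expressed over the matched metric pairs and a standard classifier trained on it can score the target project's instances.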
Index Terms
- Heterogeneous defect prediction