
Heterogeneous defect prediction

Published: 30 August 2015

ABSTRACT

Software defect prediction is one of the most active research areas in software engineering. We can build a prediction model with defect data collected from a software project and predict defects in the same project, i.e., within-project defect prediction (WPDP). Researchers have also proposed cross-project defect prediction (CPDP) to predict defects for new projects lacking defect data by using prediction models built from other projects. Recent studies have shown CPDP to be feasible. However, CPDP requires that the source and target projects have identical metric sets. As a result, current CPDP techniques are difficult to apply across projects with heterogeneous metric sets. To address this limitation, we propose heterogeneous defect prediction (HDP) to predict defects across projects with heterogeneous metric sets. Our HDP approach conducts metric selection and metric matching to build a prediction model between such projects. Our empirical study on 28 subjects shows that about 68% of predictions using our approach outperform or are comparable to WPDP with statistical significance.
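The abstract does not spell out how metric matching is performed. The sketch below is a minimal illustration of the general idea, assuming that source and target metric columns are paired by the similarity of their value distributions (here a two-sample Kolmogorov-Smirnov p-value with a hypothetical cutoff) and that a standard classifier is then trained on the matched source metrics. The function names, the greedy pairing, and the cutoff value are assumptions for illustration only, not the paper's actual procedure.

import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression

def match_metrics(source_X, target_X, cutoff=0.05):
    """Greedily pair source and target metric columns whose value
    distributions look most similar (highest KS p-value above `cutoff`).
    Hypothetical helper; the cutoff and greedy strategy are assumptions."""
    pairs, used_targets = [], set()
    for i in range(source_X.shape[1]):
        best_j, best_p = None, cutoff
        for j in range(target_X.shape[1]):
            if j in used_targets:
                continue
            p = ks_2samp(source_X[:, i], target_X[:, j]).pvalue
            if p > best_p:
                best_j, best_p = j, p
        if best_j is not None:
            pairs.append((i, best_j))
            used_targets.add(best_j)
    return pairs

def hdp_predict(source_X, source_y, target_X, cutoff=0.05):
    """Train on the matched source metrics and score defect-proneness
    of the target instances using the matched target metrics."""
    pairs = match_metrics(source_X, target_X, cutoff)
    if not pairs:
        raise ValueError("no metric pair exceeded the matching cutoff")
    src_idx = [i for i, _ in pairs]
    tgt_idx = [j for _, j in pairs]
    model = LogisticRegression(max_iter=1000).fit(source_X[:, src_idx], source_y)
    return model.predict_proba(target_X[:, tgt_idx])[:, 1]

# Toy usage with synthetic data: a source project with 4 metrics and a
# target project with 3 differently defined metrics.
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(200, 4)), rng.integers(0, 2, 200)
Xt = rng.normal(size=(150, 3))
scores = hdp_predict(Xs, ys, Xt)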


Published in

ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering
August 2015, 1068 pages
ISBN: 9781450336758
DOI: 10.1145/2786805
Copyright © 2015 ACM


Publisher

Association for Computing Machinery, New York, NY, United States
