DOI: 10.1145/1868328.1868342
research-article

Towards identifying software project clusters with regard to defect prediction

Published: 12 September 2010

ABSTRACT

Background: This paper describes an analysis conducted on a newly collected repository with 92 versions of 38 proprietary, open-source and academic projects. A preliminary study performed earlier showed the need for a further in-depth analysis in order to identify project clusters.

Aims: The goal of this research is to cluster software projects in order to identify groups of projects with similar characteristics from the defect prediction point of view. One defect prediction model should work well for all projects that belong to such a group. The existence of those groups was investigated with statistical tests and by comparing the mean value of prediction efficiency.

Method: Hierarchical and k-means clustering, as well as Kohonen's neural network, were used to find groups of similar projects. The obtained clusters were investigated with discriminant analysis. For each of the identified groups, a statistical analysis was conducted in order to determine whether the group really exists. Two defect prediction models were created for each of the identified groups. The first one was based on the projects that belong to a given group, and the second one on all the projects. Then, both models were applied to all versions of projects from the investigated group. If the predictions from the model based on projects that belong to the identified group are significantly better than those from the all-projects model (the mean values were compared and statistical tests were used), we conclude that the group really exists.
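As an illustration of the clustering step only, here is a minimal k-means sketch in Python. The abstract does not say which project features, distance measure, number of clusters or tooling were used, so the Euclidean distance, the `kmeans` function and the two-dimensional toy vectors below are all assumptions made for the example, not the authors' setup.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster emptied out
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids


# Hypothetical, made-up feature vectors: one tuple per project version.
toy_versions = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.10),  # one tight group
    (0.90, 0.80), (0.85, 0.95), (0.95, 0.90),  # another tight group
]
labels, centers = kmeans(toy_versions, k=2)
```

Versions that end up with the same label would form one candidate group, whose existence still has to be checked by comparing prediction models as described above.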

Results: Six different clusters were identified, and the existence of two of them was statistically confirmed: 1) cluster proprietary B: T=19, p=0.035, r=0.40; 2) cluster proprietary/open: t(17)=3.18, p=0.05, r=0.59. The obtained effect sizes (r) represent large effects according to Cohen's benchmark, which is a substantial finding.
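A statistic of the form t(17) usually comes from comparing matched scores. The Python sketch below assumes a paired t-test over per-version scores of the group model and the all-projects model, plus a common t-to-r conversion, r = sqrt(t^2 / (t^2 + df)). The abstract does not state which test variant or conversion was used, and the function names and toy scores are hypothetical. Applied to the reported t(17)=3.18, this conversion gives r of about 0.61, close to but not identical to the reported 0.59, so the authors may have used a different formula or unrounded inputs.

```python
import math
from statistics import mean, stdev


def paired_t(group_scores, all_scores):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [g - a for g, a in zip(group_scores, all_scores)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def effect_size_r(t, df):
    """Convert a t statistic to the correlation-type effect size r."""
    return math.sqrt(t * t / (t * t + df))


# Made-up scores for three project versions (assumption: higher is better).
toy_group_model = [3, 5, 7]
toy_all_model = [2, 3, 4]
t_toy, df_toy = paired_t(toy_group_model, toy_all_model)

# Conversion applied to the t statistic reported in the abstract.
r_reported_t = effect_size_r(3.18, 17)
```

The sign of t depends on the order of subtraction, so the direction of "better" has to be fixed before interpreting it.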

Conclusions: The two identified clusters were described and compared with results obtained by other researchers. The results of this work are a step towards defining formal methods for reusing defect prediction models, by identifying groups of projects within which the same defect prediction model may be used. Furthermore, a method of clustering was suggested and applied.


Published in: PROMISE '10: Proceedings of the 6th International Conference on Predictive Models in Software Engineering
September 2010, 195 pages
ISBN: 9781450304047
DOI: 10.1145/1868328
General Chair: Tim Menzies
Program Chair: Gunes Koru

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 64 of 125 submissions, 51%
