ABSTRACT
Background: This paper describes an analysis conducted on a newly collected repository of 92 versions of 38 proprietary, open-source, and academic projects. A preliminary study had shown the need for a further in-depth analysis in order to identify project clusters.
Aims: The goal of this research is to cluster software projects in order to identify groups of projects with similar characteristics from the defect prediction point of view. One defect prediction model should work well for all projects belonging to such a group. The existence of those groups was investigated with statistical tests and by comparing the mean values of prediction efficiency.
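The evaluation idea above (a model trained only on a candidate group should beat a model trained on all projects, when both are judged on that group) can be sketched as follows. The defect counts and the trivial mean-based "model" are invented for illustration only; they are not the paper's actual data or prediction models:

```python
# Hedged sketch of the group-vs-all comparison; all numbers are invented.

def mean_predictor(train):
    """Trivial 'model': always predict the mean defect count of the training set."""
    mu = sum(train) / len(train)
    return lambda: mu

group = [3, 4, 5, 4]        # hypothetical defect counts in the candidate group
others = [20, 25, 18, 22]   # hypothetical defect counts in all other projects

group_model = mean_predictor(group)
all_model = mean_predictor(group + others)

# Mean absolute error of each model, evaluated on the group's own versions.
mae_group = sum(abs(group_model() - d) for d in group) / len(group)
mae_all = sum(abs(all_model() - d) for d in group) / len(group)
# If mae_group is clearly lower, the group plausibly "exists" as a cluster.
```

In the paper the comparison is additionally backed by statistical tests; here the point is only the experimental design: train twice, evaluate both models on the same group.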
Method: Hierarchical clustering, k-means clustering, and Kohonen's neural network were used to find groups of similar projects. The obtained clusters were examined with discriminant analysis. For each identified group, a statistical analysis was conducted to determine whether the group really exists. Two defect prediction models were created for each group: the first based on the projects belonging to that group, and the second based on all projects. Both models were then applied to all versions of the projects from the investigated group. If the predictions from the group-based model were significantly better than those from the all-projects model (mean values were compared and statistical tests were used), we concluded that the group really exists.
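A minimal sketch of one of the clustering steps named above, k-means, applied to per-project feature vectors. The feature vectors below are hypothetical (e.g., averaged design metrics per project), not the paper's data, and this plain-Python loop stands in for whatever statistical package the authors used:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute centroids as cluster means, for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical per-project feature vectors forming two well-separated groups.
projects = [(1.0, 1.2), (0.9, 1.1), (1.1, 0.9),
            (5.0, 5.1), (4.8, 5.3), (5.2, 4.9)]
clusters, centroids = kmeans(projects, k=2)
```

On separated data like this, the two groups are recovered regardless of which points seed the centroids; on real, noisier project data the paper complements k-means with hierarchical clustering and Kohonen's network.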
Results: Six clusters were identified, and the existence of two of them was statistically confirmed: 1) cluster proprietary B: T=19, p=0.035, r=0.40; 2) cluster proprietary/open: t(17)=3.18, p=0.05, r=0.59. The obtained effect sizes (r) represent large effects according to Cohen's benchmark.
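A common way to turn a t statistic into the correlation effect size r reported above is r = sqrt(t² / (t² + df)); whether the authors used exactly this conversion is an assumption, but plugging in the reported t(17) = 3.18 gives a value close to the reported r = 0.59:

```python
import math

def effect_size_r(t, df):
    """Convert a t statistic with df degrees of freedom into the
    correlation effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))

# Reported statistics for the proprietary/open cluster: t(17) = 3.18.
r = effect_size_r(3.18, 17)   # roughly 0.61, a large effect (r >= 0.5) per Cohen
```

Cohen's conventional benchmarks for r are 0.1 (small), 0.3 (medium), and 0.5 (large).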
Conclusions: The two identified clusters were described and compared with results obtained by other researchers. This work takes the next step towards defining formal methods for reusing defect prediction models by identifying groups of projects within which the same defect prediction model may be used. Furthermore, a clustering method was suggested and applied.
Index Terms
- Towards identifying software project clusters with regard to defect prediction