ABSTRACT
Background: This paper describes an analysis conducted on a newly collected repository of 92 versions of 38 proprietary, open-source, and academic projects. A preliminary study had shown the need for a further in-depth analysis in order to identify project clusters.
Aims: The goal of this research is to cluster software projects in order to identify groups of projects with similar characteristics from the defect prediction point of view. One defect prediction model should work well for all projects belonging to such a group. The existence of those groups was investigated with statistical tests and by comparing the mean values of prediction efficiency.
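The evaluation idea above (a model trained only on a candidate group should beat a model trained on all projects, when both are judged on that group) can be sketched as follows. The defect counts and the trivial mean-based "model" are invented for illustration only; they are not the paper's actual data or prediction models:

```python
# Hedged sketch of the group-vs-all comparison; all numbers are invented.

def mean_predictor(train):
    """Trivial 'model': always predict the mean defect count of the training set."""
    mu = sum(train) / len(train)
    return lambda: mu

group = [3, 4, 5, 4]        # hypothetical defect counts in the candidate group
others = [20, 25, 18, 22]   # hypothetical defect counts in all other projects

group_model = mean_predictor(group)
all_model = mean_predictor(group + others)

# Mean absolute error of each model, evaluated on the group's own versions.
mae_group = sum(abs(group_model() - d) for d in group) / len(group)
mae_all = sum(abs(all_model() - d) for d in group) / len(group)
# If mae_group is clearly lower, the group plausibly "exists" as a cluster.
```

In the paper the comparison is additionally backed by statistical tests; here the point is only the experimental design: train twice, evaluate both models on the same group.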
Method: Hierarchical clustering, k-means clustering, and Kohonen's neural network were used to find groups of similar projects. The obtained clusters were examined with discriminant analysis. For each identified group, a statistical analysis was conducted to determine whether the group really exists. Two defect prediction models were created for each group: the first based on the projects belonging to that group, and the second based on all projects. Both models were then applied to all versions of the projects from the investigated group. If the predictions from the group-based model were significantly better than those from the all-projects model (mean values were compared and statistical tests were used), we concluded that the group really exists.
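A minimal sketch of one of the clustering steps named above, k-means, applied to per-project feature vectors. The feature vectors below are hypothetical (e.g., averaged design metrics per project), not the paper's data, and this plain-Python loop stands in for whatever statistical package the authors used:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute centroids as cluster means, for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical per-project feature vectors forming two well-separated groups.
projects = [(1.0, 1.2), (0.9, 1.1), (1.1, 0.9),
            (5.0, 5.1), (4.8, 5.3), (5.2, 4.9)]
clusters, centroids = kmeans(projects, k=2)
```

On separated data like this, the two groups are recovered regardless of which points seed the centroids; on real, noisier project data the paper complements k-means with hierarchical clustering and Kohonen's network.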
Results: Six clusters were identified, and the existence of two of them was statistically confirmed: 1) cluster proprietary B: T=19, p=0.035, r=0.40; 2) cluster proprietary/open: t(17)=3.18, p=0.05, r=0.59. The obtained effect sizes (r) represent large effects according to Cohen's benchmark.
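A common way to turn a t statistic into the correlation effect size r reported above is r = sqrt(t² / (t² + df)); whether the authors used exactly this conversion is an assumption, but plugging in the reported t(17) = 3.18 gives a value close to the reported r = 0.59:

```python
import math

def effect_size_r(t, df):
    """Convert a t statistic with df degrees of freedom into the
    correlation effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))

# Reported statistics for the proprietary/open cluster: t(17) = 3.18.
r = effect_size_r(3.18, 17)   # roughly 0.61, a large effect (r >= 0.5) per Cohen
```

Cohen's conventional benchmarks for r are 0.1 (small), 0.3 (medium), and 0.5 (large).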
Conclusions: The two identified clusters were described and compared with results obtained by other researchers. This work takes the next step towards defining formal methods for reusing defect prediction models by identifying groups of projects within which the same defect prediction model may be used. Furthermore, a clustering method was suggested and applied.
Index Terms
- Towards identifying software project clusters with regard to defect prediction