Abstract
Software defect prediction is the process of identifying defects early in the software life cycle in order to optimize testing resources and reduce maintenance effort. Defect prediction works well when a sufficient amount of data is available to train the prediction model. However, this is not always the case, for example, when the software is in its first release or when the company has not maintained sufficient historical data. In such cases, cross project defect prediction, in which a model trained on data from other projects is applied to the target project, may be used to identify the defective classes. In this work, we have studied the feasibility of cross project defect prediction and validated it empirically. We conducted our experiments on 12 open source datasets, and the prediction model is built using 12 software metrics. Of the 132 train-test combinations studied, cross project defect prediction was feasible in 35. The success of a prediction is determined by the precision, recall, and AUC of the prediction model. We have also analyzed 14 descriptive characteristics to construct a decision tree. The decision tree learnt from this data has 15 rules that describe when cross project defect prediction is likely to succeed.
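The 132 combinations are consistent with every ordered pair of distinct datasets (12 × 11 = 132), where one dataset trains the model and the other tests it. The sketch below is a minimal illustration, not the authors' procedure: it enumerates such pairs and applies a success check based on precision, recall, and AUC. The abstract does not state the classifier or the thresholds used, so the 0.7 success threshold, the 0.5 classification cut-off, the project names, and all function names are hypothetical; the scores are assumed to come from whatever model was trained on the source project.

```python
from itertools import permutations

SUCCESS_THRESHOLD = 0.7  # hypothetical cut-off, not taken from the paper
CLASSIFY_CUTOFF = 0.5    # hypothetical score above which a class is predicted defective


def train_test_pairs(projects):
    """All ordered (train, test) pairs with train != test: n * (n - 1) pairs."""
    return list(permutations(projects, 2))


def precision_recall(y_true, y_pred):
    """Precision and recall for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores):
    """AUC as the Mann-Whitney statistic: the fraction of (defective, clean)
    pairs in which the defective class gets the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both defective and non-defective classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def is_successful(y_true, scores, threshold=SUCCESS_THRESHOLD, cutoff=CLASSIFY_CUTOFF):
    """Hypothetical criterion: precision, recall and AUC must all reach the threshold."""
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    precision, recall = precision_recall(y_true, y_pred)
    return min(precision, recall, auc(y_true, scores)) >= threshold


if __name__ == "__main__":
    projects = [f"project_{i}" for i in range(1, 13)]  # hypothetical names
    pairs = train_test_pairs(projects)
    print(len(pairs))  # 132

    # Toy labels (1 = defective) and model scores for one test project.
    y_true = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.2, 0.8, 0.6, 0.1, 0.4]
    y_pred = [1 if s >= CLASSIFY_CUTOFF else 0 for s in scores]
    p, r = precision_recall(y_true, y_pred)
    print(round(p, 3), round(r, 3), round(auc(y_true, scores), 3))  # 0.667 0.667 0.889
    print(is_successful(y_true, scores))  # False: precision and recall are below 0.7
```

Requiring all three measures to pass is one plausible reading of "determined via precision, recall and AUC"; a study could equally weight or combine them differently.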
Cite this article
Agrawal, A., Malhotra, R. Cross project defect prediction for open source software. Int. j. inf. tecnol. 14, 587–601 (2022). https://doi.org/10.1007/s41870-019-00299-6