Skip to main content
Erschienen in: Soft Computing 10/2018

08.03.2018 | Methodologies and Application

Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning

verfasst von: Xiao Yu, Man Wu, Yiheng Jian, Kwabena Ebo Bennin, Mandi Fu, Chuanxiang Ma

Erschienen in: Soft Computing | Ausgabe 10/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Cross-company defect prediction (CCDP) is a practical way that trains a prediction model by exploiting one or multiple projects of a source company and then applies the model to a target company. Unfortunately, larger irrelevant cross-company (CC) data usually make it difficult to build a prediction model with high performance. On the other hand, brute force leveraging of CC data poorly related to within-company data may decrease the prediction model performance. To address such issues, we aim to provide an effective solution for CCDP. First, we propose a novel semi-supervised clustering-based data filtering method (i.e., SSDBSCAN filter) to filter out irrelevant CC data. Second, based on the filtered CC data, we for the first time introduce multi-source TrAdaBoost algorithm, an effective transfer learning method, into CCDP to import knowledge not from one but from multiple sources to avoid negative transfer. Experiments on 15 public datasets indicate that: (1) our proposed SSDBSCAN filter achieves better overall performance than compared data filtering methods; (2) our proposed CCDP approach achieves the best overall performance among all tested CCDP approaches; and (3) our proposed CCDP approach performs significantly better than with-company defect prediction models.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Arar Ömer Faruk, Ayan Kürşat (2015) Software defect prediction using cost-sensitive neural network. Appl Soft Comput 33:263–277CrossRef Arar Ömer Faruk, Ayan Kürşat (2015) Software defect prediction using cost-sensitive neural network. Appl Soft Comput 33:263–277CrossRef
Zurück zum Zitat Bennin K, Keung J, Monden A, et al (2017) The significant effects of data sampling approaches on software defect prioritization and classification. In:11th International symposium on empirical software engineering and measurement, ESEM 2017 Bennin K, Keung J, Monden A, et al (2017) The significant effects of data sampling approaches on software defect prioritization and classification. In:11th International symposium on empirical software engineering and measurement, ESEM 2017
Zurück zum Zitat Briand LC, Melo WL, Wust J (2002) Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng 28(7):706–720CrossRef Briand LC, Melo WL, Wust J (2002) Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng 28(7):706–720CrossRef
Zurück zum Zitat Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357MATH Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357MATH
Zurück zum Zitat Chen L, Fang B, Shang Z et al (2015) Negative samples reduction in cross-company software defects prediction. Inf Softw Technol 62:67–77CrossRef Chen L, Fang B, Shang Z et al (2015) Negative samples reduction in cross-company software defects prediction. Inf Softw Technol 62:67–77CrossRef
Zurück zum Zitat Dai W et al (2007) Boosting for transfer learning. In: 24th International conference on Machine learning, pp 193–200 Dai W et al (2007) Boosting for transfer learning. In: 24th International conference on Machine learning, pp 193–200
Zurück zum Zitat Dhanajayan RCG, Pillai SA (2016) SLMBC: spiral life cycle model-based Bayesian classification technique for efficient software fault prediction and classification, Soft Computing, 1-13 Dhanajayan RCG, Pillai SA (2016) SLMBC: spiral life cycle model-based Bayesian classification technique for efficient software fault prediction and classification, Soft Computing, 1-13
Zurück zum Zitat Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. Softw J Syst Softw 81(5):649–660CrossRef Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. Softw J Syst Softw 81(5):649–660CrossRef
Zurück zum Zitat Erturk Ezgi, Sezer Ebru Akcapinar (2016) Iterative software fault prediction with a hybrid approach. Appl Soft Comput 49:1020–1033CrossRef Erturk Ezgi, Sezer Ebru Akcapinar (2016) Iterative software fault prediction with a hybrid approach. Appl Soft Comput 49:1020–1033CrossRef
Zurück zum Zitat Field AP (2001) Discovering statistics using SPSS for windows: advanced techniques for beginners, pp 551–552 Field AP (2001) Discovering statistics using SPSS for windows: advanced techniques for beginners, pp 551–552
Zurück zum Zitat Fukunaga K, Narendra PM (1975) A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans Comput 100(7):750–753CrossRefMATH Fukunaga K, Narendra PM (1975) A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans Comput 100(7):750–753CrossRefMATH
Zurück zum Zitat Gray D, Bowes D, Davey N, et al (2009) Using the support vector machine as a classification method for software defect prediction with static code metrics. In: International conference on engineering applications of neural networks. Springer, Berlin, pp 223–234 Gray D, Bowes D, Davey N, et al (2009) Using the support vector machine as a classification method for software defect prediction with static code metrics. In: International conference on engineering applications of neural networks. Springer, Berlin, pp 223–234
Zurück zum Zitat Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304CrossRef Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304CrossRef
Zurück zum Zitat Hosmer DW, Lemeshow S (2000) Introduction to the logistic regression model. Appl Logist Regres 1–30 Hosmer DW, Lemeshow S (2000) Introduction to the logistic regression model. Appl Logist Regres 1–30
Zurück zum Zitat Jain K (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666CrossRef Jain K (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666CrossRef
Zurück zum Zitat Jing X et al (2015) Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In Proceedings of the 10th joint meeting on foundations of software engineering, pp 496–507 Jing X et al (2015) Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In Proceedings of the 10th joint meeting on foundations of software engineering, pp 496–507
Zurück zum Zitat Jing XY, Ying S, Zhang ZW, Wu SS, Liu J (2014) Dictionary learning based software defect prediction. In: Proceedings of the 36th International Conference on Software Engineering, pp 414–423 Jing XY, Ying S, Zhang ZW, Wu SS, Liu J (2014) Dictionary learning based software defect prediction. In: Proceedings of the 36th International Conference on Software Engineering, pp 414–423
Zurück zum Zitat Kampenes V By et al (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11):1073–1086CrossRef Kampenes V By et al (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11):1073–1086CrossRef
Zurück zum Zitat Kawata K, Amasaki S, Yokogawa T (2016) Improving relevancy filter methods for cross-project defect prediction, applied computing & information technology, pp 1–12 Kawata K, Amasaki S, Yokogawa T (2016) Improving relevancy filter methods for cross-project defect prediction, applied computing & information technology, pp 1–12
Zurück zum Zitat Laradji IH, Alshayeb M, Ghouti L (2015) Software defect prediction using ensemble learning on selected features. Inf. Softw. Technol. 58:388–402CrossRef Laradji IH, Alshayeb M, Ghouti L (2015) Software defect prediction using ensemble learning on selected features. Inf. Softw. Technol. 58:388–402CrossRef
Zurück zum Zitat Lelis L, Sander J (2009) Semi-supervised density-based clustering. In: 9th IEEE international conference on data mining, pp 842–847 Lelis L, Sander J (2009) Semi-supervised density-based clustering. In: 9th IEEE international conference on data mining, pp 842–847
Zurück zum Zitat Lewis DD (1998) Naive (Bayes) at forty the independence assumption in information retrieval. In: European conference on machine learning, pp 4–15 Lewis DD (1998) Naive (Bayes) at forty the independence assumption in information retrieval. In: European conference on machine learning, pp 4–15
Zurück zum Zitat Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256CrossRef Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256CrossRef
Zurück zum Zitat Malhotra Ruchika (2015) A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518CrossRef Malhotra Ruchika (2015) A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518CrossRef
Zurück zum Zitat Mesquita PP Diego et al (2016) Classification with reject option for software defect prediction. Appl Soft Comput 49:1085–1093CrossRef Mesquita PP Diego et al (2016) Classification with reject option for software defect prediction. Appl Soft Comput 49:1085–1093CrossRef
Zurück zum Zitat Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 382–391 Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 382–391
Zurück zum Zitat Peng L, Yang B, Chen Y, Abraham A (2009) Data gravitation based classification. Inf Sci 179(6):809–819CrossRefMATH Peng L, Yang B, Chen Y, Abraham A (2009) Data gravitation based classification. Inf Sci 179(6):809–819CrossRefMATH
Zurück zum Zitat Peters F, Menzies T, Marcus A (2013) Better cross company defect prediction. In: Proceedings of the 10th international workshop on mining software repositories, pp 409–418 Peters F, Menzies T, Marcus A (2013) Better cross company defect prediction. In: Proceedings of the 10th international workshop on mining software repositories, pp 409–418
Zurück zum Zitat Ryu D, Jang JI, Baik J (2015) A hybrid instance selection using nearest-neighbor for cross-project defect prediction. J Comput Sci Technol 30(5):969–980CrossRef Ryu D, Jang JI, Baik J (2015) A hybrid instance selection using nearest-neighbor for cross-project defect prediction. J Comput Sci Technol 30(5):969–980CrossRef
Zurück zum Zitat Seliya N, Khoshgoftaar TM (2011) The use of decision trees for cost- sensitive classification: an empirical study in software quality prediction. Wiley Interdiscip Rev Data Min Knowl Discov 1(5):448–459CrossRef Seliya N, Khoshgoftaar TM (2011) The use of decision trees for cost- sensitive classification: an empirical study in software quality prediction. Wiley Interdiscip Rev Data Min Knowl Discov 1(5):448–459CrossRef
Zurück zum Zitat Shepperd M, Bowes D, Hall T (2014) Researcher bias The use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616CrossRef Shepperd M, Bowes D, Hall T (2014) Researcher bias The use of machine learning in software defect prediction. IEEE Trans Softw Eng 40(6):603–616CrossRef
Zurück zum Zitat Shukla S, Radhakrishnan T, Muthukumaran K, et al (2016) Multi-objective cross-version defect prediction, Soft Computing 1-22 Shukla S, Radhakrishnan T, Muthukumaran K, et al (2016) Multi-objective cross-version defect prediction, Soft Computing 1-22
Zurück zum Zitat Siers MJ, Islam MZ (2015) Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inf. Syst. 51:62–71CrossRef Siers MJ, Islam MZ (2015) Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inf. Syst. 51:62–71CrossRef
Zurück zum Zitat Song Q, Jia Z, Shepperd M et al (2011) A general software defect-proneness prediction framework. IEEE Trans Softw Eng 37(3):356–370CrossRef Song Q, Jia Z, Shepperd M et al (2011) A general software defect-proneness prediction framework. IEEE Trans Softw Eng 37(3):356–370CrossRef
Zurück zum Zitat Sun Z, Song Q, Zhu X (2012) Using coding-based ensemble learning to improve software defect prediction. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(6):1806–1817CrossRef Sun Z, Song Q, Zhu X (2012) Using coding-based ensemble learning to improve software defect prediction. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(6):1806–1817CrossRef
Zurück zum Zitat Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14(5):540–578CrossRef Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14(5):540–578CrossRef
Zurück zum Zitat Turhan B, Tosun Mısırlı A, Bener A (2013) Empirical evaluation of the effects of mixed project data on learning defect predictors. Inf Softw Technol 55(6):1101–1118CrossRef Turhan B, Tosun Mısırlı A, Bener A (2013) Empirical evaluation of the effects of mixed project data on learning defect predictors. Inf Softw Technol 55(6):1101–1118CrossRef
Zurück zum Zitat Vashisht V, Lal M, Sureshchandar GS et al (2015) A framework for software defect prediction using neural networks. J Softw Eng Appl 8(8):384CrossRef Vashisht V, Lal M, Sureshchandar GS et al (2015) A framework for software defect prediction using neural networks. J Softw Eng Appl 8(8):384CrossRef
Zurück zum Zitat Wang J, Shen B, Chen Y (2012) Compressed C4. 5 models for software defect prediction. In: 12th international conference on quality software, pp 13–16 Wang J, Shen B, Chen Y (2012) Compressed C4. 5 models for software defect prediction. In: 12th international conference on quality software, pp 13–16
Zurück zum Zitat Wilcoxon Frank (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83CrossRef Wilcoxon Frank (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83CrossRef
Zurück zum Zitat Yan Z, Chen X, Guo P (2010) Software defect prediction using fuzzy support vector regression. In: International Symposium on Neural Networks. Springer, Berlin Heidelberg, pp 17–24 Yan Z, Chen X, Guo P (2010) Software defect prediction using fuzzy support vector regression. In: International Symposium on Neural Networks. Springer, Berlin Heidelberg, pp 17–24
Zurück zum Zitat Yao Y, Doretto G (2010) Boosting for transfer learning with multiple sources. In: IEEE conference on computer vision and pattern recognition, pp 1855–1862 Yao Y, Doretto G (2010) Boosting for transfer learning with multiple sources. In: IEEE conference on computer vision and pattern recognition, pp 1855–1862
Zurück zum Zitat Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, pp 91–100 Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, pp 91–100
Metadaten
Titel
Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning
verfasst von
Xiao Yu
Man Wu
Yiheng Jian
Kwabena Ebo Bennin
Mandi Fu
Chuanxiang Ma
Publikationsdatum
08.03.2018
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 10/2018
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3093-1

Weitere Artikel der Ausgabe 10/2018

Soft Computing 10/2018 Zur Ausgabe

Premium Partner