nach oben

Automated Software Engineering

Erschienen in:

22.03.2016

Label propagation based semi-supervised learning for software defect prediction

verfasst von: Zhi-Wu Zhang, Xiao-Yuan Jing, Tie-Jian Wang

Erschienen in: Automated Software Engineering | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Software defect prediction can automatically predict defect-prone software modules for efficient software test in software engineering. When the previous defect labels of modules are limited, predicting the defect-prone modules becomes a challenging problem. In static software defect prediction, there exist the similarity among software modules, a software module can be approximated by a sparse representation of the other part of the software modules, and class-imbalance problem, the number of defect-free modules is much larger than that of defective ones. In this paper, we propose to use graph based semi-supervised learning technique to predict software defect. By using Laplacian score sampling strategy for the labeled defect-free modules, we construct a class-balance labeled training dataset firstly. And then, we use a nonnegative sparse algorithm to compute the nonnegative sparse weights of a relationship graph which serve as clustering indicators. Lastly, on the nonnegative sparse graph, we use a label propagation algorithm to iteratively predict the labels of unlabeled software modules. We thus propose a nonnegative sparse graph based label propagation approach for software defect classification and prediction, which uses not only few labeled data but also abundant unlabeled ones to improve the generalization capability. We vary the size of labeled software modules from 10 to 30 % of all the datasets in the widely used NASA projects. Experimental results show that the NSGLP outperforms several representative state-of-the-art semi-supervised software defect prediction methods, and it can fully exploit the characteristics of static code metrics and improve the generalization capability of the software defect prediction model.

Vorheriger Artikel MaramaAIC: tool support for consistency management and validation of requirements

Nächster Artikel Guest Editorial: Automation in Software Performance Engineering

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM Sigkdd Explorations Newsletter. 6(1), 20–29 (2004)CrossRef

Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7(11), 2399–2434 (2006)MathSciNetMATH

Catal, C., Diri, B.: A systematic review of software fault prediction studies. Expert Syst. Appl. 36(4), 7346–7354 (2009a)CrossRef

Catal, C., Diri, B.: Unlabelled extra data do not always mean extra performance for semi-supervised fault prediction. Expert Syst. 26(5), 458–471 (2009b)CrossRef

Catal, C.: A comparison of semi-supervised classification approaches for software defect prediction. J. Intell. Syst. 23(1), 75–82 (2014)

Chan, Y., Walmsley, R.P.: Learning and understanding the Kruskal-Wallis one-way analysis-of-variance-by-ranks test for differences among three or more independent groups. Phys. Ther. 77(12), 1755–1761 (1997)

Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pp. 57–64 (2005)

Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artifici. Intell. Res. 16, 321–357 (2002)MATH

Culp, M., Michailidis, G.: Graph-based semisupervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 30(1), 174–179 (2008)CrossRef

Fenton, N., Ohlsson, N.: Quantitative analysis of faults and failures in a complex software system. IEEE Trans. Softw. Eng. 26(8), 797–814 (2000)CrossRef

Gao, K., Khoshgoftaar, T. M.: Software defect prediction for high-dimensional and class-imbalanced data. In: Proceedings of the 23rd International Conference on Software Engineering and Knowledge Engineering, pp. 89–94 (2011)

Gao, K., Khoshgoftaar, T.M., Wald, R.: The use of under- and oversampling within ensemble feature selection and classification for software quality prediction. Int. J. Reliab. Qual. Saf. Eng. 21(1), 145004 (2014)CrossRef

Goldman, S., Zhou, Y.: Enhancing supervised learning with unlabeled data. In: Proceedings of the 17th International Conference on Machine Learning, pp. 327–334 (2000)

Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in neural information processing systems, pp. 529–536 (2004)

Gray, D., Bowes, D., Davey, N., Sun, Y., Christianson, B.: The misuse of the NASA metrics data program data sets for automated software defect prediction. In: Proceedings of 15th Annual Conference on Evaluation and Assessment in Software Engineering, pp. 96–103 (2011)

Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38(6), 1276–1304 (2012)CrossRef

He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Advances in Neural Information Processing Systems, pp. 507–514 (2005)

Jiang, Y., Li, M., Zhou, Z.H.: Software defect detection with ROCUS. J. Comput. Sci. Technol. 26(2), 328–342 (2011)CrossRef

Jing, X. Y., Ying, S., Zhang, Z. W., Wu, S. S., Liu, J.: Dictionary learning based software defect prediction. In: Proceedings of the 36th International Conference on Software Engineering, pp. 414-423 (2014a)

Jing, X. Y., Zhang, Z. W., Ying, S., Wang, F., Zhu, Y. P.: Software defect prediction based on collaborative representation classification. In: Companion Proceedings of the 36th International Conference on Software Engineering, pp. 632–633 (2014b)

Joachims, T.: Transductive inference for text classification using support vector machines. In: Proceedings of the 16th International Conference on Machine Learning, pp 200–209 (1999)

Khoshgoftaar, T. M., Gao, K., Seliya, N.: Attribute selection and imbalanced data: problems in software defect prediction. In: Proceedings of the 22nd IEEE International Conference on Tools with Artificial Intelligence, pp. 137–144 (2010)

Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the 14th International Conference on Machine Learning, pp 179–186 (1997)

Laradji, I.H., Alshayeb, M., Ghouti, L.: Software defect prediction using ensemble learning on selected features. Inf. Softw. Technol. 58, 388–402 (2015)CrossRef

Li, M., Zhang, H., Wu, R., Zhou, Z.H.: Sample-based software defect prediction with active and semi-supervised learning. Autom. Softw. Eng. 19(2), 201–230 (2012)CrossRef

Li, S., Fu, Y.: Low-rank coding with b-matching constraint for semi-supervised classification. In: Proceedings of the 23th International Joint Conference on Artificial Intelligence, pp. 1472–1478 (2013)

Lu, H., Cukic, B., Culp, M.: An iterative semi-supervised approach to software fault prediction. In: Proceedings of the 7th International Conference on Predictive Models in Software Engineering (Article 15) (2011)

Lu, H., Cukic, B., Culp, M.: Software defect prediction using semi-supervised learning with dimension reduction. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp. 314–317 (2012)

Lyu, M. R.: Software reliability engineering: a roadmap. In: 2007 Future of Software Engineering, pp. 153–170 (2007)

McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. 4, 308–320 (1976)CrossRefMATH

Menzies, T., Greenwald, J., Frank, A.: Data mining static code attributes to learn defect predictors. IEEE Trans. Softw. Eng. 33(1), 2–13 (2007)CrossRef

Miller, D. J., Uyar, H. S.: A mixture of experts classifier with learning based on both labelled and unlabelled data. In: Advances in neural information processing systems, pp. 571–577 (1997)

Nam, J., Pan, S. J., Kim, S.: Transfer defect learning. In: Proceedings of the 35th International Conference on Software Engineering, pp. 382–391 (2013)

Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Mach. Learn. 39(2–3), 103–134 (2000)CrossRefMATH

Pelayo, L, Dick, S.: Applying novel resampling strategies to software defect prediction. In: Proceedings of the 2007 Annual Meeting of the North American Fuzzy Information Processing Society, pp. 69–72 (2007)

Seliya, N., Khoshgoftaar, T.M.: Software quality estimation with limited fault data: a semi-supervised learning perspective. Softw. Qual. J. 15(3), 327–344 (2007a)CrossRef

Seliya, N., Khoshgoftaar, T.M.: Software quality analysis of unlabeled program modules with semisupervised clustering. IEEE Trans. Syst. Man. Cyber. 37(2), 201–211 (2007b)CrossRef

Shahshahani, B.M., Landgrebe, D.: The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Trans. Geosci. Remote Sens. 32(5), 1087–1095 (1994)CrossRef

Shepperd, M., Song, Q., Sun, Z., Mair, C.: Data quality: some comments on the NASA software defect datasets. IEEE Trans. Softw. Eng. 39(9), 1208–1215 (2013)CrossRef

Sun, Z.B., Song, Q.B., Zhu, X.Y.: Using coding based ensemble learning to improve software defect prediction. IEEE Trans. Syst. Man Cyber. C 42(6), 1806–1817 (2012)CrossRef

Turhan, B., Menzies, T., Bener, A.: On the relative value of cross-company and within-company data for defect prediction. Empirical Softw. Eng. 14(5), 540–578 (2009)CrossRef

Wang, F., Zhang, C.: Label propagation through linear neighborhoods. IEEE Trans. Knowl. Data Eng. 20(1), 55–67 (2008)CrossRef

Wang, S., Yao, X.: Using class imbalance learning for software defect prediction. IEEE Trans. Reliab. 62(2), 434–443 (2013)CrossRef

Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)CrossRef

Xu, J., Man, H.: Dictionary learning based on laplacian score in sparse coding. In: Machine Learning and Data Mining in Pattern Recognition, pp.253–264 (2011)

Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. Adv. Neural Inf. Process. Syst. 16(16), 321–328 (2004)

Zhou, Z.-H., Li, M.: Tri-training: Exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 17(11), 1529–1541 (2005)CrossRef

Zhou, Z.-H., Li, M.: Semi-supervised regression with co-training style algorithms. IEEE Trans. Knowl. Data Eng. 19(11), 1479–1493 (2007)CrossRef

Zhu, X.: Semi-supervised learning with graphs. PhD thesis, Carnegie Mellon University (2005)

Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002)

Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning, pp. 912–919 (2003)

Titel: Label propagation based semi-supervised learning for software defect prediction
verfasst von: Zhi-Wu Zhang
Xiao-Yuan Jing
Tie-Jian Wang
Publikationsdatum: 22.03.2016
Verlag: Springer US
Erschienen in: Automated Software Engineering / Ausgabe 1/2017
Print ISSN: 0928-8910
Elektronische ISSN: 1573-7535
DOI: https://doi.org/10.1007/s10515-016-0194-x

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 1/2017

MaramaAIC: tool support for consistency management and validation of requirements

Continuous validation of performance test workloads

Unit testing performance with Stochastic Performance Logic

A Petri net tool for software performance estimation based on upper throughput bounds

Guest Editorial: Automation in Software Performance Engineering

Automated QoS-oriented cloud resource optimization using containers

Premium Partner