Software Quality Journal

Published: 1 September 2007

Software quality estimation with limited fault data: a semi-supervised learning perspective

Authors: Naeem Seliya, Taghi M. Khoshgoftaar

Published in: Software Quality Journal | Issue 3/2007

Abstract

We address the important problem of software quality analysis when only limited software fault or fault-proneness data are available. A software quality model is typically trained using software measurement and fault data obtained from a previous release or a similar project. Such an approach assumes that fault data are available for all the training modules. However, various issues in software development may limit the availability of fault-proneness data for all the training modules. Consequently, the available labeled training dataset may be too small for the trained software quality model to provide accurate predictions. More specifically, the small set of modules with known fault-proneness labels is not sufficient for capturing the software quality trends of the project. We investigate semi-supervised learning with the Expectation-Maximization (EM) algorithm for software quality estimation with limited fault-proneness data. The hypothesis is that knowledge stored in the software attributes of the unlabeled program modules will help improve software quality estimation. Software data collected from a large NASA software project are used during the semi-supervised learning process. The software quality model is evaluated with multiple test datasets collected from other NASA software projects. Compared to software quality models trained only with the available set of labeled program modules, the EM-based semi-supervised learning scheme improves the generalization performance of the software quality models.
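The abstract only outlines the approach. As a rough illustration of how EM lets unlabeled modules contribute to a classifier, the sketch below trains a two-class Gaussian naive Bayes model: labeled modules keep fixed class memberships, the E-step assigns soft fault-proneness labels to the unlabeled modules, and the M-step refits class priors, means, and variances on both. This is a generic textbook EM scheme assumed for illustration, not the authors' procedure; the Gaussian naive Bayes model, the function names (`semi_supervised_em`, `posterior`, `predict`), the fixed iteration count, and the toy metric values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical model parameters: (class priors, per-class means, per-class variances)
Params = Tuple[List[float], List[List[float]], List[List[float]]]


def _gauss_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _fit(X: Sequence[Sequence[float]], resp: Sequence[Sequence[float]],
         n_classes: int, min_var: float = 1e-6) -> Params:
    """M-step: weighted priors, means and variances.

    resp[i][c] is the hard (labeled) or soft (unlabeled) membership of module i in class c.
    """
    n, d = len(X), len(X[0])
    priors, means, variances = [], [], []
    for c in range(n_classes):
        w = [r[c] for r in resp]
        total = sum(w)
        # Laplace-smoothed prior so no class probability collapses to zero
        priors.append((total + 1.0) / (n + n_classes))
        total = max(total, 1e-12)
        mu = [sum(wi * x[j] for wi, x in zip(w, X)) / total for j in range(d)]
        # Variance floor avoids degenerate zero-variance features
        var = [max(min_var, sum(wi * (x[j] - mu[j]) ** 2 for wi, x in zip(w, X)) / total)
               for j in range(d)]
        means.append(mu)
        variances.append(var)
    return priors, means, variances


def posterior(x: Sequence[float], params: Params) -> List[float]:
    """E-step for one module: class membership probabilities."""
    priors, means, variances = params
    logs = [math.log(p) + sum(_gauss_logpdf(xj, m, v) for xj, m, v in zip(x, mu, var))
            for p, mu, var in zip(priors, means, variances)]
    top = max(logs)  # subtract the max log-score for numerical stability
    exps = [math.exp(s - top) for s in logs]
    total = sum(exps)
    return [e / total for e in exps]


def semi_supervised_em(X_lab, y_lab, X_unl, n_classes: int = 2, n_iter: int = 20) -> Params:
    # Labeled modules: fixed one-hot memberships that EM never changes
    lab_resp = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in y_lab]
    # Initialize from the labeled modules alone
    params = _fit(X_lab, lab_resp, n_classes)
    X_all = list(X_lab) + list(X_unl)
    for _ in range(n_iter):
        # E-step: soft fault-proneness labels for the unlabeled modules
        unl_resp = [posterior(x, params) for x in X_unl]
        # M-step: refit on labeled plus soft-labeled modules
        params = _fit(X_all, lab_resp + unl_resp, n_classes)
    return params


def predict(x: Sequence[float], params: Params) -> int:
    """Most probable class: 0 = not fault-prone, 1 = fault-prone (hypothetical coding)."""
    post = posterior(x, params)
    return max(range(len(post)), key=post.__getitem__)


if __name__ == "__main__":
    # Hypothetical metrics per module: [lines of code / 100, cyclomatic complexity]
    X_lab = [[1.0, 2.0], [1.5, 1.8], [8.0, 9.0], [7.5, 8.5]]
    y_lab = [0, 0, 1, 1]
    X_unl = [[1.2, 2.2], [0.9, 1.7], [1.4, 2.4], [7.8, 9.2], [8.3, 8.7], [7.2, 8.9]]
    model = semi_supervised_em(X_lab, y_lab, X_unl)
    print(predict([1.1, 2.0], model), predict([8.1, 8.8], model))  # prints: 0 1
```

Keeping the labeled modules' memberships fixed lets them anchor each class while the unlabeled modules refine the class-conditional estimates. A practical version would also need a convergence test in place of the fixed `n_iter`, and a check that EM does not drift when the assumed model does not fit the measurement data.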

Title: Software quality estimation with limited fault data: a semi-supervised learning perspective
Authors: Naeem Seliya, Taghi M. Khoshgoftaar
Publication date: 1 September 2007
Publisher: Birkhäuser-Verlag
Published in: Software Quality Journal | Issue 3/2007
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI: https://doi.org/10.1007/s11219-007-9013-8

Other articles in issue 3/2007

Validating neural network-based online adaptive systems: a case study

Supporting high interoperability of components by adopting an agent-based approach

Deployment and dynamic reconfiguration planning for distributed software systems

Rapid goal-oriented automated software testing using MEA-graph planning

Introduction to the special issue on: “Software Quality Improvements and Estimations with Intelligence-based Methods”
