Published in: Software Quality Journal 1/2009

01.03.2009

Software quality analysis by combining multiple projects and learners

Authors: Taghi M. Khoshgoftaar, Pierre Rebours, Naeem Seliya

Abstract

When building software quality models, the approach often consists of training data mining learners on a single fit dataset. Typically, this fit dataset contains software metrics collected during a past release of the software project whose quality we want to predict. To improve the predictive accuracy of such quality models, it is common practice to combine the predictive results of multiple learners to take advantage of their respective biases. Although multi-learner classifiers have proven successful in some cases, the improvement is not always significant because the information in the fit dataset can sometimes be insufficient. We present an innovative method to build software quality models using majority voting to combine the predictions of multiple learners induced on multiple training datasets. To our knowledge, no previous study in software quality has attempted to take advantage of the multiple software project data repositories that are generally spread across an organization. In a large-scale empirical study involving seven real-world datasets and seventeen learners, we show that, on average, combining the predictions of one learner trained on multiple datasets significantly improves the predictive performance compared to one learner induced on a single fit dataset. We also demonstrate empirically that combining multiple learners trained on a single training dataset does not significantly improve the average predictive accuracy compared to the use of a single learner induced on a single fit dataset.
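The combination scheme described in the abstract can be sketched as simple majority voting over classifiers induced on several fit datasets. The sketch below is illustrative only: the learners, the synthetic "software metrics" data, and the 0.5 voting threshold are assumptions for demonstration, not the paper's seventeen learners, seven project datasets, or cost-sensitive modeling setup.

```python
# Minimal sketch of majority voting over learners trained on multiple
# fit datasets (illustrative assumptions, not the authors' exact setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def majority_vote_predict(fit_datasets, learners, X_test):
    """Train each learner on each fit dataset and combine the binary
    fault-proneness predictions (0 = not fault-prone, 1 = fault-prone)
    by simple majority vote."""
    votes = []
    for X_fit, y_fit in fit_datasets:
        for make_learner in learners:
            model = make_learner()
            model.fit(X_fit, y_fit)
            votes.append(model.predict(X_test))
    votes = np.array(votes)               # shape: (n_models, n_test_modules)
    # A module is labeled fault-prone if more than half of the models say so.
    return (votes.mean(axis=0) > 0.5).astype(int)

# Hypothetical usage with randomly generated "software metrics":
rng = np.random.default_rng(0)
fit_datasets = [(rng.normal(size=(100, 8)), rng.integers(0, 2, 100))
                for _ in range(3)]        # three past-project fit datasets
X_test = rng.normal(size=(20, 8))
learners = [DecisionTreeClassifier, LogisticRegression, GaussianNB]
print(majority_vote_predict(fit_datasets, learners, X_test))
```

With an odd number of combined models (here 3 datasets times 3 learners), the majority vote is always decisive; tie-breaking rules would only be needed for an even count.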


Footnotes
1
\(\mu_{.j}\) and \(\mu_{.j}^{*}\) are used interchangeably throughout the paper.
 
Metadata
Title
Software quality analysis by combining multiple projects and learners
Authors
Taghi M. Khoshgoftaar
Pierre Rebours
Naeem Seliya
Publication date
01.03.2009
Publisher
Springer US
Published in
Software Quality Journal / Issue 1/2009
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI
https://doi.org/10.1007/s11219-008-9058-3
