Published in: Software Quality Journal 1/2009

01.03.2009

Software quality analysis by combining multiple projects and learners

Authors: Taghi M. Khoshgoftaar, Pierre Rebours, Naeem Seliya

Abstract

When building software quality models, the approach often consists of training data mining learners on a single fit dataset. Typically, this fit dataset contains software metrics collected during a past release of the software project whose quality we want to predict. To improve the predictive accuracy of such quality models, it is common practice to combine the predictive results of multiple learners to take advantage of their respective biases. Although multi-learner classifiers have proven successful in some cases, the improvement is not always significant because the information in the fit dataset can sometimes be insufficient. We present an innovative method to build software quality models using majority voting to combine the predictions of multiple learners induced on multiple training datasets. To our knowledge, no previous study in software quality has attempted to take advantage of the multiple software project data repositories that are generally spread across an organization. In a large-scale empirical study involving seven real-world datasets and seventeen learners, we show that, on average, combining the predictions of one learner trained on multiple datasets significantly improves the predictive performance compared to one learner induced on a single fit dataset. We also demonstrate empirically that combining multiple learners trained on a single training dataset does not significantly improve the average predictive accuracy compared to the use of a single learner induced on a single fit dataset.
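The combination scheme described in the abstract can be sketched as simple majority voting over classifiers induced on several fit datasets. The sketch below is illustrative only: the learners, the synthetic "software metrics" data, and the 0.5 voting threshold are assumptions for demonstration, not the paper's seventeen learners, seven project datasets, or cost-sensitive modeling setup.

```python
# Minimal sketch of majority voting over learners trained on multiple
# fit datasets (illustrative assumptions, not the authors' exact setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def majority_vote_predict(fit_datasets, learners, X_test):
    """Train each learner on each fit dataset and combine the binary
    fault-proneness predictions (0 = not fault-prone, 1 = fault-prone)
    by simple majority vote."""
    votes = []
    for X_fit, y_fit in fit_datasets:
        for make_learner in learners:
            model = make_learner()
            model.fit(X_fit, y_fit)
            votes.append(model.predict(X_test))
    votes = np.array(votes)               # shape: (n_models, n_test_modules)
    # A module is labeled fault-prone if more than half of the models say so.
    return (votes.mean(axis=0) > 0.5).astype(int)

# Hypothetical usage with randomly generated "software metrics":
rng = np.random.default_rng(0)
fit_datasets = [(rng.normal(size=(100, 8)), rng.integers(0, 2, 100))
                for _ in range(3)]        # three past-project fit datasets
X_test = rng.normal(size=(20, 8))
learners = [DecisionTreeClassifier, LogisticRegression, GaussianNB]
print(majority_vote_predict(fit_datasets, learners, X_test))
```

With an odd number of combined models (here 3 datasets times 3 learners), the majority vote is always decisive; tie-breaking rules would only be needed for an even count.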


Footnotes
1
\(\mu_{.j}\) and \(\mu_{.j}^{*}\) are used interchangeably throughout the paper.
 
Metadata
Title
Software quality analysis by combining multiple projects and learners
Authors
Taghi M. Khoshgoftaar
Pierre Rebours
Naeem Seliya
Publication date
01.03.2009
Publisher
Springer US
Published in
Software Quality Journal / Issue 1/2009
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI
https://doi.org/10.1007/s11219-008-9058-3
