2012 | Original Paper | Book Chapter

6. Making Early Predictions of the Accuracy of Machine Learning Classifiers

Authors: James Edward Smith, Muhammad Atif Tahir, Davy Sannen, Hendrik Van Brussel

Published in: Learning in Non-Stationary Environments

Publisher: Springer New York


Abstract

The accuracy of machine learning systems is a widely studied research topic. Established techniques such as cross-validation predict the accuracy on unseen data of the classifier produced by applying a given learning method to a given training data set. However, they do not predict whether incurring the cost of obtaining more data and undergoing further training will lead to higher accuracy. In this chapter, we investigate techniques for making such early predictions. We note that when a machine learning algorithm is presented with a training set, the classifier produced, and hence its error, will depend on the characteristics of the algorithm, on the size of the training set, and also on its specific composition. In particular, we hypothesize that if a number of classifiers are produced, and their observed error is decomposed into bias and variance terms, then although these components may behave differently, their behavior may be predictable. Experimental results confirm this hypothesis and show that our predictions are very highly correlated with the values observed after undertaking the extra training. This has particular relevance to learning in non-stationary environments, since our characterization of bias and variance can be used to detect whether perceived changes in the data stream arise from sampling variability or because the underlying data distributions have changed, the latter being perceived as changes in bias.
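The approach described in the abstract rests on decomposing observed classifier error into bias and variance components and tracking each component as the training set grows. As a rough illustration only (this is not the authors' code), the following Python sketch estimates a Domingos-style bias/variance decomposition of 0/1 loss by training many classifiers on random training subsets of a fixed size; repeating this for increasing sizes produces the per-component learning curves whose extrapolation the chapter investigates. The dataset, base classifier, subset sizes, and number of resampling rounds are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a Domingos-style bias/variance
# decomposition of 0/1 loss via resampling. All concrete choices below
# (synthetic data, decision tree, sizes) are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def bias_variance_01(clf_factory, X_pool, y_pool, X_test, y_test,
                     n_train, n_rounds=50, seed=0):
    """Train n_rounds classifiers on random size-n_train subsets of the pool
    and decompose the mean 0/1 test error into bias and variance terms."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_rounds, len(y_test)), dtype=int)
    for r in range(n_rounds):
        idx = rng.choice(len(y_pool), size=n_train, replace=False)
        preds[r] = clf_factory().fit(X_pool[idx], y_pool[idx]).predict(X_test)
    # "Main" prediction at each test point: the majority vote over rounds.
    main = np.array([np.bincount(col).argmax() for col in preds.T])
    bias = np.mean(main != y_test)              # systematic part of the error
    variance = np.mean(preds != main[None, :])  # run-to-run disagreement
    error = np.mean(preds != y_test[None, :])   # average 0/1 loss over rounds
    # Note: for 0/1 loss these two terms need not sum exactly to the error;
    # the chapter discusses the competing decompositions in detail.
    return error, bias, variance

if __name__ == "__main__":
    X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=1000, random_state=0)
    # Growing the training subset traces out one learning curve per component,
    # the raw material for the early predictions studied in the chapter.
    for n in (50, 100, 200, 400, 800):
        e, b, v = bias_variance_01(DecisionTreeClassifier, X_pool, y_pool,
                                   X_test, y_test, n_train=n)
        print(f"n={n:4d}  error={e:.3f}  bias={b:.3f}  variance={v:.3f}")
```

In this formulation the variance term captures scatter caused by the specific training sample drawn, while a shift in the bias term signals a change in the underlying distribution; this is the distinction the abstract invokes for detecting genuine change in non-stationary data streams.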

Metadata
Title
Making Early Predictions of the Accuracy of Machine Learning Classifiers
Authors
James Edward Smith
Muhammad Atif Tahir
Davy Sannen
Hendrik Van Brussel
Copyright Year
2012
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4419-8020-5_6