
2021 | Original Paper | Book Chapter

What Is Important About the No Free Lunch Theorems?


Abstract

The No Free Lunch theorems prove that under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. As I discuss in this chapter, the importance of the theorems arises by using them to analyze scenarios involving nonuniform distributions, and to compare different algorithms without any assumption about the distribution over problems at all. In particular, the theorems prove that anti-cross-validation (choosing among a set of candidate algorithms based on which has worst out-of-sample behavior) performs as well as cross-validation, unless one makes an assumption—which has never been formalized—about how the distribution over induction problems, on the one hand, is related to the set of algorithms one is choosing among using (anti-)cross-validation, on the other. In addition, they impose strong caveats on the significance of the many results in the literature that establish the strength of a particular algorithm without assuming a particular distribution. They also motivate a “dictionary” between supervised learning and blackbox optimization, which allows one to “translate” techniques from supervised learning into the domain of blackbox optimization, thereby strengthening blackbox optimization algorithms. In addition to these topics, I also briefly discuss their implications for philosophy of science.
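The (anti-)cross-validation claim can be checked by brute force on a toy problem. The following sketch is not from the chapter; the 5-point domain and the two "majority"/"minority" learners are invented for illustration. It enumerates a uniform distribution over all Boolean target functions, selects a learner by best or by worst leave-one-out error on the training set, and confirms that both selection rules achieve the same average off-training-set accuracy:

```python
from itertools import product

# Domain of 5 points: training set = first 3, off-training-set (OTS) = last 2.
TRAIN, OTS = [0, 1, 2], [3, 4]

def majority(train_labels):
    # Predict the most common training label at every point.
    return 1 if 2 * sum(train_labels) >= len(train_labels) else 0

def minority(train_labels):
    # Predict the least common training label at every point.
    return 1 - majority(train_labels)

def loo_error(learner, train_labels):
    # Leave-one-out error of a learner on the training set.
    errs = 0
    for i in range(len(train_labels)):
        rest = train_labels[:i] + train_labels[i + 1:]
        errs += int(learner(rest) != train_labels[i])
    return errs / len(train_labels)

def ots_accuracy(pred, target):
    # Accuracy measured on the off-training-set points only.
    return sum(pred == target[i] for i in OTS) / len(OTS)

targets = list(product([0, 1], repeat=5))   # uniform over all 2^5 targets
cv_total = anti_total = 0.0
for f in targets:
    train_labels = [f[i] for i in TRAIN]
    errs = {m: loo_error(m, train_labels) for m in (majority, minority)}
    cv_pick = min(errs, key=errs.get)       # cross-validation: best LOO error
    anti_pick = max(errs, key=errs.get)     # anti-cross-validation: worst
    cv_total += ots_accuracy(cv_pick(train_labels), f)
    anti_total += ots_accuracy(anti_pick(train_labels), f)

print(cv_total / len(targets), anti_total / len(targets))  # → 0.5 0.5
```

Under a uniform prior the off-training-set labels are independent of the training data, so any selection rule—however sensible or perverse—averages to chance-level accuracy.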


Appendices
(Available only with authorized access)
Footnotes
1
Note that as a special case, we could have each of the two professors always produce the exact same search algorithm for any objective function they are presented. In this case comparing the performance of the two professors just amounts to comparing the performance of the two associated search algorithms.
 
2
The choice to use an off-training-set cost function for the analysis of supervised learning is the analog of the choice in the analysis of search to use a search algorithm that only searches over points not yet sampled. In both cases, the goal is to “mod out” aspects of the problem that are typically not of interest and can yield misleading results: the ability of the learning algorithm to reproduce a training set in the case of supervised learning, and the ability to revisit points already sampled with a good objective value in the case of search.
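The “modding out” described here can be made concrete in a few lines. The sketch below is a hypothetical illustration (the parity target and the memorizing learner are invented for it): off-training-set error counts mistakes only on points absent from the training set, so a learner that merely memorizes its training data gets no credit for that memorization.

```python
def ots_error(predict, target, domain, train_points):
    # Error rate restricted to points outside the training set.
    ots = [x for x in domain if x not in train_points]
    return sum(predict(x) != target(x) for x in ots) / len(ots)

# A learner that memorizes its training set is perfect on the training
# points, so ordinary error understates generalization; OTS error ignores
# those points entirely.
table = {0: 1, 1: 0, 2: 1}              # memorized training labels
predict = lambda x: table.get(x, 0)     # default guess 0 off the training set
target = lambda x: x % 2                # true labels: parity

print(ots_error(predict, target, range(5), set(table)))  # error on {3, 4} → 0.5
```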
 
Metadata
Title
What Is Important About the No Free Lunch Theorems?
Author
David H. Wolpert
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-66515-9_13
