Published in: KI - Künstliche Intelligenz 4/2015

01.11.2015 | Technical Contribution

Beyond Manual Tuning of Hyperparameters

Authors: Frank Hutter, Jörg Lücke, Lars Schmidt-Thieme


Abstract

The success of hand-crafted machine learning systems in many applications raises the question of making machine learning algorithms more autonomous, i.e., reducing the required expert input to a minimum. We discuss two strategies towards this goal: (1) automated optimization of hyperparameters (including mechanisms for feature selection, preprocessing, model selection, etc.) and (2) the development of algorithms with reduced sets of hyperparameters. Since many research directions (e.g., deep learning) show a tendency towards increasingly complex algorithms with more and more hyperparameters, the demand for both of these strategies continuously increases. We review recent hyperparameter optimization methods and discuss data-driven approaches that use unsupervised learning to avoid introducing hyperparameters. We conclude by discussing how these complementary strategies can work hand-in-hand, representing a very promising approach towards autonomous machine learning.
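To make strategy (1) concrete, below is a minimal sketch of one of the simplest automated approaches, plain random search over a hyperparameter space. It is illustrative only: the scikit-learn dependency, the random-forest model, the search space, and the budget of 20 evaluations are assumptions for the example, not the paper's experimental setup.

```python
# Minimal random-search sketch for hyperparameter optimization (strategy 1).
# Assumptions: scikit-learn is available; model, search space, and budget
# are illustrative choices, not taken from the paper.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X, y = load_digits(return_X_y=True)

best_score, best_config = -np.inf, None
for _ in range(20):  # evaluation budget: 20 sampled configurations
    # Sample one hyperparameter configuration uniformly from the space.
    config = {
        "n_estimators": int(rng.randint(10, 200)),
        "max_depth": int(rng.randint(2, 20)),
        "max_features": float(rng.uniform(0.1, 1.0)),
    }
    # Score the configuration by cross-validated accuracy.
    model = RandomForestClassifier(random_state=0, **config)
    score = cross_val_score(model, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_config = score, config

print(f"best CV accuracy: {best_score:.3f} with {best_config}")
```

More sophisticated methods of the kind the paper reviews, such as Bayesian optimization, replace the blind sampling loop with a model of the validation-score surface that is used to select promising configurations to evaluate next.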

Metadata
Title
Beyond Manual Tuning of Hyperparameters
Authors
Frank Hutter
Jörg Lücke
Lars Schmidt-Thieme
Publication date
01.11.2015
Publisher
Springer Berlin Heidelberg
Published in
KI - Künstliche Intelligenz / Issue 4/2015
Print ISSN: 0933-1875
Electronic ISSN: 1610-1987
DOI
https://doi.org/10.1007/s13218-015-0381-0
