Published in: The Journal of Supercomputing 5/2018

09-01-2018

An LP-based hyperparameter optimization model for language modeling

Authors: Amir Hossein Akhavan Rahnama, Mehdi Toloo, Nezer Jacob Zaidenberg


Abstract

To find hyperparameters for a machine learning model, algorithms such as grid search or random search are run over the space of possible values of the model's hyperparameters. These search algorithms select the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt in the language modeling literature to use optimization techniques to find perplexity values. We apply our model to find the hyperparameters of a language model and compare it with the grid search algorithm, showing that it yields lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate the proposed approach.
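To make the grid search baseline mentioned in the abstract concrete, the following is a minimal sketch of choosing one hyperparameter by minimizing held-out perplexity. It is not the authors' model or experimental setup. The toy corpus, the interpolated unigram/bigram language model, the add-one smoothing, and all function names (`train_counts`, `interpolated_prob`, `perplexity`, `grid_search`) are assumptions made for illustration only. Perplexity is computed here as the exponential of the negative average log-probability of the held-out tokens.

```python
import math
from collections import Counter


def train_counts(tokens):
    """Collect unigram, context, and bigram counts from a token list."""
    unigrams = Counter(tokens)
    contexts = Counter(tokens[:-1])          # tokens that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams


def interpolated_prob(prev, word, counts, lam, vocab_size):
    """P(word | prev) = lam * P_bigram + (1 - lam) * P_unigram.

    The unigram term uses add-one smoothing so it is never zero
    (an illustrative choice, not taken from the paper).
    """
    unigrams, contexts, bigrams = counts
    total = sum(unigrams.values())
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    p_bi = bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni


def perplexity(tokens, counts, lam, vocab_size):
    """exp of the negative mean log-probability over consecutive token pairs."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(
        math.log(interpolated_prob(prev, word, counts, lam, vocab_size))
        for prev, word in pairs
    )
    return math.exp(-log_sum / len(pairs))


def grid_search(train, heldout, grid):
    """Evaluate every candidate weight and return the one with lowest perplexity."""
    counts = train_counts(train)
    vocab_size = len(set(train) | set(heldout))
    scores = {lam: perplexity(heldout, counts, lam, vocab_size) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy data; a real experiment would use a large corpus.
train = "the cat sat on the mat the dog sat on the rug".split()
heldout = "the cat sat on the rug".split()

# lam = 1.0 is left out of the grid: an unseen bigram would get probability 0.
grid = [i / 10 for i in range(10)]

best_lam, scores = grid_search(train, heldout, grid)
print(best_lam, round(scores[best_lam], 3))
```

Grid search can only return one of the candidate values it is given, so the quality of its result depends on how finely the grid is spaced. That limitation is the motivation for treating the choice of hyperparameters as an optimization problem over a continuous range, as the abstract proposes.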


Footnotes
1
SwiftKey is an input method that employs various artificial intelligence technologies to predict the next word the user intends to type. It is available as a free download at https://swiftkey.com/en.
 
Metadata
Title
An LP-based hyperparameter optimization model for language modeling
Authors
Amir Hossein Akhavan Rahnama
Mehdi Toloo
Nezer Jacob Zaidenberg
Publication date
09-01-2018
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 5/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2236-6
