
The Journal of Supercomputing

Publication date: 09-01-2018

An LP-based hyperparameter optimization model for language modeling

Authors: Amir Hossein Akhavan Rahnama, Mehdi Toloo, Nezer Jacob Zaidenberg

Published in: The Journal of Supercomputing | Issue 5/2018


Abstract

To find hyperparameters for a machine learning model, algorithms such as grid search or random search are used over the space of possible values of the model's hyperparameters. These search algorithms select the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model, compare it to the grid search algorithm, and show that it yields lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.
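As a rough illustration of the grid search baseline mentioned in the abstract, the following sketch tunes a single hyperparameter by minimizing perplexity on held-out text. It is not the LP-based model proposed in the article. The interpolated bigram model, the toy corpus, and all function names (`train_counts`, `interpolated_prob`, `perplexity`, `grid_search`) are assumptions made for this example only.

```python
import math
from collections import Counter


def train_counts(tokens):
    """Count unigrams, bigram contexts, and bigrams in a token list."""
    unigrams = Counter(tokens)
    contexts = Counter(tokens[:-1])  # tokens that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams


def interpolated_prob(prev, word, counts, lam):
    """lam * P_bigram(word | prev) + (1 - lam) * add-one-smoothed P_unigram(word)."""
    unigrams, contexts, bigrams = counts
    total = sum(unigrams.values())
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words (assumption)
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    p_bi = bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni


def perplexity(tokens, counts, lam):
    """exp of the negative mean log-probability over consecutive word pairs."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(interpolated_prob(p, w, counts, lam)) for p, w in pairs)
    return math.exp(-log_sum / len(pairs))


def grid_search(heldout, counts, grid):
    """Return the (lam, perplexity) pair with the lowest held-out perplexity."""
    scored = ((lam, perplexity(heldout, counts, lam)) for lam in grid)
    return min(scored, key=lambda pair: pair[1])


# Toy data, purely hypothetical (not the SwiftKey dataset).
train = "the cat sat on the mat the cat ate the fish".split()
heldout = "the cat sat on the fish".split()

# lam = 1.0 is excluded: a pure bigram model can assign zero probability.
grid = [i / 10 for i in range(10)]
best_lam, best_ppl = grid_search(heldout, train_counts(train), grid)
print(f"best lambda = {best_lam}, perplexity = {best_ppl:.3f}")
```

Grid search evaluates the cost function only at the listed candidate values, so the result can be no better than the best grid point. That limitation is the motivation for treating perplexity minimization as an optimization problem, as the abstract describes.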


SwiftKey is an input method that employs various artificial intelligence technologies to predict the next word the user intends to type. It is available as a free download at https://swiftkey.com/en.


Title: An LP-based hyperparameter optimization model for language modeling
Authors: Amir Hossein Akhavan Rahnama, Mehdi Toloo, Nezer Jacob Zaidenberg
Publication date: 09-01-2018
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 5/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-018-2236-6
