
10.01.2018 | Original Article

Character-level recurrent neural networks in practice: comparing training and sampling schemes

Authors: Cedric De Boom, Thomas Demeester, Bart Dhoedt

Published in: Neural Computing and Applications | Issue 8/2019


Abstract

Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks provide their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other possibilities to choose from and other parameters to tune. In the existing literature, this is very often overlooked or ignored. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these different schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training behavior, depending on the dataset.
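
As a concrete illustration of the schemes compared in the paper, the sketch below trains a character-level GRU on next-character prediction and switches between two ways of initializing the hidden state on consecutive subsequences: resetting it to zero, or transferring the detached state from the previous subsequence (the scheme the abstract flags as a source of unstable training). This is a minimal PyTorch sketch under assumed hyperparameters, not the authors' implementation; the CharRNN and train names are placeholders.

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        e = self.embed(x)              # (batch, time, hidden)
        y, h = self.rnn(e, h)          # y: hidden state at every time step
        return self.out(y), h          # logits over the next character, final state

def train(model, batches, transfer_state=False, lr=1e-3):
    # batches: iterable of (inputs, targets) LongTensor pairs of shape (batch, time),
    # where batch t+1 holds the subsequences that directly follow those in batch t.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    h = None
    for inputs, targets in batches:
        if not transfer_state:
            h = None                   # scheme A: start every subsequence from a zero state
        logits, h = model(inputs, h)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        if transfer_state:
            h = h.detach()             # scheme B: carry the state over, truncating BPTT here
```

Transferring the state is only meaningful when batch t+1 indeed continues the subsequences of batch t; detaching it truncates backpropagation through time at every subsequence boundary.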


Footnotes
3. github.com/torvalds/linux/tree/master/kernel.
 
Metadata
Title
Character-level recurrent neural networks in practice: comparing training and sampling schemes
Authors
Cedric De Boom
Thomas Demeester
Bart Dhoedt
Publication date
10.01.2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3322-z
