10-01-2018 | Original Article

Character-level recurrent neural networks in practice: comparing training and sampling schemes

Authors: Cedric De Boom, Thomas Demeester, Bart Dhoedt

Published in: Neural Computing and Applications | Issue 8/2019

Abstract

Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks have their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other possibilities to choose from and other parameters to tune. In the existing literature, this is very often overlooked or ignored. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these different schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training behavior, depending on the dataset.
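
The abstract's central setup can be made concrete with a short sketch: next-character prediction trained with truncated backpropagation through time, where the hidden state at the start of each subsequence is either reset to zero or carried over (detached) from the previous subsequence. The following is a minimal illustrative sketch in PyTorch, not the authors' implementation; all names (CharRNN, train_epoch, carry_hidden, seq_len) and hyperparameters are assumptions chosen for illustration.

    # Minimal sketch (assumed names, not the authors' code): character-level
    # next-token prediction with truncated BPTT, with and without carrying the
    # hidden state across consecutive subsequences.
    import torch
    import torch.nn as nn

    class CharRNN(nn.Module):
        def __init__(self, vocab_size, hidden_size=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size)

        def forward(self, x, state=None):
            h, state = self.rnn(self.embed(x), state)
            return self.out(h), state

    def train_epoch(model, data, seq_len=50, carry_hidden=False, lr=1e-3):
        # data: 1-D LongTensor of character ids forming one contiguous stream.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        state = None
        for start in range(0, data.numel() - seq_len - 1, seq_len):
            x = data[start:start + seq_len].unsqueeze(0)           # input characters
            y = data[start + 1:start + seq_len + 1].unsqueeze(0)   # next-character targets
            if carry_hidden and state is not None:
                # Initialize from the previous subsequence's final state,
                # detached so gradients do not flow past the window boundary.
                state = tuple(s.detach() for s in state)
            else:
                state = None  # reset: every subsequence starts from a zero state
            logits, state = model(x, state)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

Toggling carry_hidden switches between the two schemes the abstract contrasts: a zero-initialized hidden state for every subsequence versus transferring the (detached) final state of the previous subsequence. The sketch assumes batch size 1 over a single contiguous character stream, so that consecutive windows are actually adjacent in the data and the carried state is a meaningful initialization.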


Footnotes
3. github.com/torvalds/linux/tree/master/kernel.
Metadata
Title
Character-level recurrent neural networks in practice: comparing training and sampling schemes
Authors
Cedric De Boom
Thomas Demeester
Bart Dhoedt
Publication date
10-01-2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3322-z
