
10.01.2018 | Original Article

Character-level recurrent neural networks in practice: comparing training and sampling schemes

Authors: Cedric De Boom, Thomas Demeester, Bart Dhoedt

Published in: Neural Computing and Applications | Issue 8/2019


Abstract

Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks provide their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other possibilities to choose from and other parameters to tune. In the existing literature, this is very often overlooked or ignored. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these different schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training behavior, depending on the dataset.
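
As a concrete illustration of the schemes compared in the paper, the sketch below trains a character-level GRU on next-character prediction and switches between two ways of initializing the hidden state on consecutive subsequences: resetting it to zero, or transferring the detached state from the previous subsequence (the scheme the abstract flags as a source of unstable training). This is a minimal PyTorch sketch under assumed hyperparameters, not the authors' implementation; the CharRNN and train names are placeholders.

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        e = self.embed(x)              # (batch, time, hidden)
        y, h = self.rnn(e, h)          # y: hidden state at every time step
        return self.out(y), h          # logits over the next character, final state

def train(model, batches, transfer_state=False, lr=1e-3):
    # batches: iterable of (inputs, targets) LongTensor pairs of shape (batch, time),
    # where batch t+1 holds the subsequences that directly follow those in batch t.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    h = None
    for inputs, targets in batches:
        if not transfer_state:
            h = None                   # scheme A: start every subsequence from a zero state
        logits, h = model(inputs, h)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        if transfer_state:
            h = h.detach()             # scheme B: carry the state over, truncating BPTT here
```

Transferring the state is only meaningful when batch t+1 indeed continues the subsequences of batch t; detaching it truncates backpropagation through time at every subsequence boundary.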


Footnotes
3. github.com/torvalds/linux/tree/master/kernel.
 
Metadata
Title
Character-level recurrent neural networks in practice: comparing training and sampling schemes
Authors
Cedric De Boom
Thomas Demeester
Bart Dhoedt
Publication date
10.01.2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3322-z
