2015 | Original Paper | Book Chapter

6. Deep Dynamic Models for Learning Hidden Representations of Speech Features

Authors: Li Deng, Roberto Togneri

Published in: Speech and Audio Processing for Coding, Enhancement and Recognition

Publisher: Springer New York

Abstract

Deep hierarchical structure with multiple layers of hidden space in human speech is intrinsically connected to its dynamic characteristics, which manifest at all levels of speech production and perception. The desire to capitalize on even a superficial understanding of this deep speech structure helped ignite the recent surge of interest in the deep learning approach to speech recognition and related applications, and a more thorough understanding of the deep structure of speech dynamics and the related computational representations is expected to further advance research in speech technology. In this chapter, we first survey a series of studies on representing speech in a hidden space using dynamic systems and recurrent neural networks, emphasizing the different ways of learning the model parameters and, subsequently, the hidden feature representations of time-varying speech data. We analyze and group this rich set of deep, dynamic speech models into two major categories: (1) top-down, generative models adopting localist representations of speech classes and features in the hidden space; and (2) bottom-up, discriminative models adopting distributed representations. Through detailed examinations of and comparisons between these two types of models, we focus on the localist versus distributed representations as their respective hallmarks and defining characteristics. Finally, we discuss future directions and potential strategies for leveraging the strengths of both localist and distributed representations while overcoming their respective weaknesses, going beyond the blind integration of the two in which the generative model merely pre-trains the discriminative one, as in the popular method of training deep neural networks.
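
For illustration only (this sketch does not appear in the chapter), the short Python example below contrasts the two hidden-representation styles summarized above: a target-directed hidden dynamic model that generates speech-feature frames top-down from a localist phonetic target, and a simple recurrent network that computes a distributed hidden state bottom-up from those frames. All dimensions, parameter values, and variable names are assumptions made for the example.

import numpy as np

# Illustrative sketch only; all dimensions and parameters are assumed,
# not taken from the chapter.
rng = np.random.default_rng(0)
T, obs_dim, hid_dim = 100, 13, 4        # frames, feature dimension (e.g. MFCCs), hidden size

# (1) Top-down, generative view: a target-directed hidden dynamic model.
#     The hidden state z_t moves toward a per-phone "target" (the localist element)
#     and emits an observation y_t = C z_t + noise.
target = rng.standard_normal(hid_dim)   # hidden target for one phone-like unit
C = rng.standard_normal((obs_dim, hid_dim))
rate = 0.2                              # speed of movement toward the target
z = np.zeros(hid_dim)
hidden_traj, frames = [], []
for t in range(T):
    z = z + rate * (target - z) + 0.01 * rng.standard_normal(hid_dim)
    hidden_traj.append(z.copy())
    frames.append(C @ z + 0.05 * rng.standard_normal(obs_dim))

# (2) Bottom-up, discriminative view: a plain recurrent network whose hidden vector
#     h_t = tanh(W_h h_{t-1} + W_x y_t) is a distributed representation of the input.
W_h = 0.1 * rng.standard_normal((hid_dim, hid_dim))
W_x = 0.1 * rng.standard_normal((hid_dim, obs_dim))
h = np.zeros(hid_dim)
rnn_states = []
for y_t in frames:
    h = np.tanh(W_h @ h + W_x @ y_t)
    rnn_states.append(h.copy())

print(len(hidden_traj), len(rnn_states))  # both trace a hidden trajectory over the T frames

In the generative model the hidden trajectory would in practice be recovered from the observations by inference (for example Kalman filtering or variational methods), whereas the recurrent network computes its distributed hidden representation directly from the data.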

Metadata
Title
Deep Dynamic Models for Learning Hidden Representations of Speech Features
Authors
Li Deng
Roberto Togneri
Copyright Year
2015
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4939-1456-2_6
