Computational Models for Speech Production

Deng, Li

doi:10.1007/978-3-642-60087-6_20

Part of the book series: NATO ASI Series (NATO ASI F, volume 169)

Summary

Major speech production models from speech science literature and a number of popular statistical “generative” models of speech used in speech technology are surveyed. Strengths and weaknesses of these two styles of speech models are analyzed, pointing to the need to integrate the respective strengths while eliminating the respective weaknesses. As an example, a statistical task-dynamic model of speech production is described, motivated by the original deterministic version of the model and targeted for integrated-multilingual speech recognition applications. Methods for model parameter learning (training) and for likelihood computation (recognition) are described based on statistical optimization principles integrated in neural network and dynamic system theories.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1
Li Deng

Editor information

Editors and Affiliations

Speech Research Unit, DERA Malvern, St. Andrew’s Road, WR14 4DT, Great Malvern, Worcs, UK
Keith Ponting

About this chapter

Cite this chapter

Deng, L. (1999). Computational Models for Speech Production. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_20

DOI: https://doi.org/10.1007/978-3-642-60087-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6
eBook Packages: Springer Book Archive
