Summary
Major speech production models from the speech science literature and a number of popular statistical “generative” models of speech used in speech technology are surveyed. The strengths and weaknesses of these two styles of speech model are analyzed, pointing to the need to integrate their respective strengths while eliminating their respective weaknesses. As an example, a statistical task-dynamic model of speech production is described, motivated by the original deterministic version of the model and targeted at integrated-multilingual speech recognition applications. Methods for model parameter learning (training) and for likelihood computation (recognition) are described, based on statistical optimization principles integrated in neural network and dynamic system theories.
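To make the idea of likelihood computation with a statistical dynamic model concrete, the sketch below scores an observation sequence under a scalar, first-order, target-directed linear-Gaussian state-space model using a standard Kalman filter. This is an illustrative simplification and not the chapter's statistical task-dynamic model: the first-order dynamics, the function names `kalman_log_likelihood` and `classify`, and all parameter values are assumptions made for the example.

```python
import math


def kalman_log_likelihood(obs, a, target, h, q, r, x0=0.0, p0=1.0):
    """Log-likelihood of a scalar observation sequence (illustrative only).

    Assumed model (not taken from the chapter):
        x[t] = a * x[t-1] + (1 - a) * target + w[t],  w ~ N(0, q)
        y[t] = h * x[t] + v[t],                       v ~ N(0, r)
    """
    x, p = x0, p0
    log_lik = 0.0
    for y in obs:
        # Predict: the hidden state relaxes toward the target.
        x_pred = a * x + (1.0 - a) * target
        p_pred = a * a * p + q
        # Innovation (one-step prediction error) and its variance.
        innov = y - h * x_pred
        s = h * h * p_pred + r
        log_lik += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        # Update the state estimate with the Kalman gain.
        k = p_pred * h / s
        x = x_pred + k * innov
        p = (1.0 - k * h) * p_pred
    return log_lik


def classify(obs, targets, **model_params):
    """Return the name of the candidate target with the highest likelihood."""
    return max(
        targets,
        key=lambda name: kalman_log_likelihood(
            obs, target=targets[name], **model_params
        ),
    )


if __name__ == "__main__":
    # A noiseless trajectory that relaxes from 0 toward a target of 1.0.
    obs = [1.0 - 0.8 ** t for t in range(1, 11)]
    candidates = {"low": -1.0, "high": 1.0}  # hypothetical unit targets
    print(classify(obs, candidates, a=0.8, h=1.0, q=0.01, r=0.01))
```

In this toy setting, recognition amounts to choosing the candidate target whose model assigns the observations the highest likelihood. Training would instead adjust `a`, `q`, `r` and the targets to maximize the same quantity over labeled data.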
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Deng, L. (1999). Computational Models for Speech Production. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_20
DOI: https://doi.org/10.1007/978-3-642-60087-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6
eBook Packages: Springer Book Archive