
2017 | OriginalPaper | Book chapter

2. Speech Production and Modelling

Written by: Tom Bäckström

Published in: Speech Coding

Publisher: Springer International Publishing


Abstract

Humans produce speech sounds by pushing air out of the lungs. The airflow sets the vocal folds into oscillation and becomes turbulent at constrictions in the vocal tract. The flow waveform created in this way is then shaped by the resonances of the vocal tract. Together, these features form the characteristic properties of phones. For efficient coding, we must model them with as few parameters as possible without altering the perceptual impression.
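
The idea of an excitation shaped by vocal tract resonances can be illustrated with a small source-filter sketch. This is not the chapter's own model: it assumes an idealised impulse train as the voiced excitation and a cascade of two-pole resonators as the vocal tract. The function names, the sampling rate, and the formant frequencies and bandwidths below are hypothetical, illustrative choices.

```python
import math

def impulse_train(f0, fs, n_samples):
    """Idealised voiced excitation: one unit impulse per pitch period (assumption)."""
    period = round(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, freq, bandwidth, fs):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / fs)          # pole radius set by the bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y1 = y2 = 0.0                                    # filter state: y[n-1], y[n-2]
    out = []
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

def synthesize(f0, formants, fs=8000, n_samples=800):
    """Pass the excitation through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0, fs, n_samples)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal

# Illustrative values only: 100 Hz pitch and two resonances at 700 Hz and 1200 Hz.
speech_like = synthesize(100.0, [(700.0, 100.0), (1200.0, 100.0)])
```

In this sketch, five numbers (one pitch value and two frequency and bandwidth pairs) determine all 800 output samples, which hints at why a parametric model of this kind is attractive for coding.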


Footnotes
1
This is a representative list of vowels, but in no way complete. For example, diphthongs have been omitted, since for our purposes they can be modelled as a transition between two vowels.
 
Metadata
Title
Speech Production and Modelling
Written by
Tom Bäckström
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-50204-5_2
