
2019 | OriginalPaper | Book chapter

5. The Voice Signal and Its Information Content—2

Author: Rita Singh

Published in: Profiling Humans from their Voice

Publisher: Springer Singapore


Abstract

Information in the voice signal is embedded both in its progression over time and in its spectral content, that is, in the time domain and the spectrographic domain, respectively. Within these domains, information relevant to profiling may be present in the patterns exhibited by specific characteristics of the voice signal. The signal may also have characteristics that are not evident in either domain; these must be sought in other (derived) mathematical domains, where the relevant patterns become more tangible for measurement and analysis. Finding such domains is the subject of feature discovery, which is discussed in Part II of this book. A third domain that reflects the information in the voice signal consists of the physical or abstract models that simulate or explain the voice signal and the processes that generate it. We refer to this as the model domain.
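To make the distinction between the time domain and the spectrographic domain concrete, here is a minimal Python sketch (standard library only) that turns a sampled waveform into a magnitude spectrogram by windowing overlapping frames and taking a discrete Fourier transform of each. The function names, the Hann window, the 8 kHz sampling rate, and the frame and hop sizes are illustrative assumptions rather than settings taken from the chapter, and a direct DFT is used for readability where a real system would use an FFT.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def spectrogram(x: list[float], frame_len: int, hop: int) -> list[list[float]]:
    """Magnitude short-time Fourier transform of a real signal.

    Returns one list per frame holding |X[k]| for k = 0 .. frame_len // 2.
    A direct O(N^2) DFT is used for clarity; use an FFT in practice.
    """
    win = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * win[n] for n in range(frame_len)]  # windowed frame
        mags = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    x = [math.sin(2 * math.pi * 250 * n / fs) for n in range(1024)]  # time domain
    S = spectrogram(x, frame_len=256, hop=128)  # spectrographic domain
    peak_bin = max(range(len(S[0])), key=S[0].__getitem__)
    print(len(S), peak_bin, peak_bin * fs / 256)  # 7 8 250.0
```

Each row of the result is one analysis frame and each column one frequency bin spaced fs / frame_len apart, so the 250 Hz test tone appears as a ridge at bin 8 in every frame. Patterns of this kind (ridges, their spacing, their movement across frames) are what the spectrographic domain exposes that the raw waveform does not.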

Footnotes
1
More sophisticated methods for excitation extraction such as IAIF can be expected to produce improved results.
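The excitation extraction mentioned in the footnote can be illustrated with the simplest model-domain analysis: fit an all-pole (linear prediction) model to the signal with the Levinson-Durbin recursion, then pass the signal through the inverse filter A(z) to obtain a residual that approximates the excitation. The sketch below is a plain single-pass LPC inverse filter, the kind of baseline that IAIF refines by iterating LPC-based inverse filtering; it is not presented as the method used in the chapter. The helper names, the two-pole "vocal tract", and the impulse-train excitation in the demo are hypothetical choices for illustration.

```python
def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n - k] for k = 0 .. max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Solve the Yule-Walker equations for A(z) = 1 + sum_k a[k] z^-k.

    Returns the coefficients [1, a1, ..., ap] and the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x: list[float], a: list[float]) -> list[float]:
    """Apply the FIR inverse filter A(z) to x; the output is the LP residual."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0)
            for n in range(len(x))]


def all_pole_filter(e: list[float], a: list[float]) -> list[float]:
    """Synthesize x from excitation e through the all-pole filter 1 / A(z)."""
    p = len(a) - 1
    x: list[float] = []
    for n, en in enumerate(e):
        x.append(en - sum(a[k] * x[n - k] for k in range(1, p + 1) if n - k >= 0))
    return x


if __name__ == "__main__":
    true_a = [1.0, -1.3, 0.9]  # hypothetical two-pole "vocal tract"
    pulses = [1.0 if n % 200 == 0 else 0.0 for n in range(1000)]  # crude excitation
    speech = all_pole_filter(pulses, true_a)
    est_a, _ = levinson_durbin(autocorrelation(speech, 2), 2)
    residual = inverse_filter(speech, est_a)
    print([round(c, 3) for c in est_a])  # [1.0, -1.3, 0.9]
    print([n for n, v in enumerate(residual) if abs(v) > 0.5])  # [0, 200, 400, 600, 800]
```

Because the synthetic excitation is a sparse impulse train and the model order matches the synthetic filter, the estimated coefficients recover the true ones and the residual is near zero except at the pulses. On real voiced speech the residual is only a rough estimate of the glottal excitation, which is why iterative methods such as IAIF tend to give better results.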
 
Literatur
1.
Zurück zum Zitat Kodera, K., De Villedary, C., & Gendrin, R. (1976). A new method for the numerical analysis of non-stationary signals. Physics of the Earth and Planetary Interiors, 12(2–3), 142–150. Kodera, K., De Villedary, C., & Gendrin, R. (1976). A new method for the numerical analysis of non-stationary signals. Physics of the Earth and Planetary Interiors, 12(2–3), 142–150.
2.
Zurück zum Zitat Kodera, K., Gendrin, R., & Villedary, C. D. (1978). Analysis of time-varying signals with small BT values. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 64–76. Kodera, K., Gendrin, R., & Villedary, C. D. (1978). Analysis of time-varying signals with small BT values. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 64–76.
3.
Zurück zum Zitat Flandrin, P., Auger, F., & Chassande-Mottin, E. (2003). Time-frequency reassignment: From principles to algorithms. Applications in Time-Frequency Signal Processing, 5(179–203), 102.MATH Flandrin, P., Auger, F., & Chassande-Mottin, E. (2003). Time-frequency reassignment: From principles to algorithms. Applications in Time-Frequency Signal Processing, 5(179–203), 102.MATH
4.
Zurück zum Zitat Auger, F., & Flandrin, P. (1995). Improving the readability of time-frequency and time-scale representations by the reassignment method. IEEE Transactions on Signal Processing, 43(5), 1068–1089. Auger, F., & Flandrin, P. (1995). Improving the readability of time-frequency and time-scale representations by the reassignment method. IEEE Transactions on Signal Processing, 43(5), 1068–1089.
5.
Zurück zum Zitat Nelson, D. J. (2001). Cross-spectral methods for processing speech. The Journal of the Acoustical Society of America, 110(5), 2575–2592. Nelson, D. J. (2001). Cross-spectral methods for processing speech. The Journal of the Acoustical Society of America, 110(5), 2575–2592.
6.
Zurück zum Zitat Nelson, D. (1993). Special purpose correlation functions for improved signal detection and parameter estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Minneapolis, Minnesota, USA (Vol. 4, pp. 73–76). Nelson, D. (1993). Special purpose correlation functions for improved signal detection and parameter estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Minneapolis, Minnesota, USA (Vol. 4, pp. 73–76).
7.
Zurück zum Zitat Hermansky, H., Hanson, B., & Wakita, H. (1985). Perceptually based linear predictive analysis of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tampa, Florida, USA (Vol. 10, pp. 509–512). Hermansky, H., Hanson, B., & Wakita, H. (1985). Perceptually based linear predictive analysis of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tampa, Florida, USA (Vol. 10, pp. 509–512).
8.
Zurück zum Zitat Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America,87(4), 1738–1752. Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America,87(4), 1738–1752.
9.
Zurück zum Zitat Kim, C., & Stern, R. M. (2016). Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP),24(7), 1315–1329. Kim, C., & Stern, R. M. (2016). Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP),24(7), 1315–1329.
10.
Zurück zum Zitat Darling, A. M. (1991). Properties and implementation of the gammatone filter: A tutorial. A report. Department of Phonetics and Linguistics, University College London (pp. 43–61). Darling, A. M. (1991). Properties and implementation of the gammatone filter: A tutorial. A report. Department of Phonetics and Linguistics, University College London (pp. 43–61).
11.
Zurück zum Zitat Teager, H. (1980). Some observations on oral air flow during phonation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(5), 599–601. Teager, H. (1980). Some observations on oral air flow during phonation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(5), 599–601.
12.
Zurück zum Zitat Kaiser, J. F. (1993). Some useful properties of Teager’s energy operators. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Minneapolis, Minnesota, USA (Vol. 3, pp. 149–152). Kaiser, J. F. (1993). Some useful properties of Teager’s energy operators. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Minneapolis, Minnesota, USA (Vol. 3, pp. 149–152).
13.
Zurück zum Zitat Kvedalen, E. (2003). Signal processing using the Teager Energy Operator and other nonlinear operators. Norway, Department of Informatics: Thesis for the Candidatus Scientiarum degree at the University of Oslo. Kvedalen, E. (2003). Signal processing using the Teager Energy Operator and other nonlinear operators. Norway, Department of Informatics: Thesis for the Candidatus Scientiarum degree at the University of Oslo.
14.
Zurück zum Zitat Maragos, P., Kaiser, J. F., & Quatieri, T. F. (1993). Energy separation in signal modulations with application to speech analysis. IEEE Transactions on Signal Processing, 41(10), 3024–3051.MATH Maragos, P., Kaiser, J. F., & Quatieri, T. F. (1993). Energy separation in signal modulations with application to speech analysis. IEEE Transactions on Signal Processing, 41(10), 3024–3051.MATH
15.
Zurück zum Zitat Jabloun, F., Cetin, A. E., & Erzin, E. (1999). Teager energy based feature parameters for speech recognition in car noise. IEEE Signal Processing Letters, 6(10), 259–261. Jabloun, F., Cetin, A. E., & Erzin, E. (1999). Teager energy based feature parameters for speech recognition in car noise. IEEE Signal Processing Letters, 6(10), 259–261.
16.
Zurück zum Zitat Oppenheim, A. V., & Schafer, R. W. (1975). Digital signal processing. Englewood Cliffs, New Jersey: Prentice-Hall Inc.MATH Oppenheim, A. V., & Schafer, R. W. (1975). Digital signal processing. Englewood Cliffs, New Jersey: Prentice-Hall Inc.MATH
17.
Zurück zum Zitat Kumaresan, R., & Rao, A. (1999). Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications. The Journal of the Acoustical Society of America, 105(3), 1912–1924. Kumaresan, R., & Rao, A. (1999). Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications. The Journal of the Acoustical Society of America, 105(3), 1912–1924.
18.
Zurück zum Zitat Kingsbury, B. E., Morgan, N., & Greenberg, S. (1998). Robust speech recognition using the modulation spectrogram. Speech Communication, 25(1–3), 117–132. Kingsbury, B. E., Morgan, N., & Greenberg, S. (1998). Robust speech recognition using the modulation spectrogram. Speech Communication, 25(1–3), 117–132.
19.
Zurück zum Zitat Gallun, F., & Souza, P. (2008). Exploring the role of the modulation spectrum in phoneme recognition. Ear and Hearing, 29(5), 800. Gallun, F., & Souza, P. (2008). Exploring the role of the modulation spectrum in phoneme recognition. Ear and Hearing, 29(5), 800.
20.
Zurück zum Zitat Meyer, B. T., Ravuri, S. V., Schädler, M. R., & Morgan, N. (2011). Comparing different flavors of spectro-temporal features for ASR. Twelfth Annual Conference of the International Speech Communication Association (INTERSPEECH) (pp. 1269–1272). Italy: Florence. Meyer, B. T., Ravuri, S. V., Schädler, M. R., & Morgan, N. (2011). Comparing different flavors of spectro-temporal features for ASR. Twelfth Annual Conference of the International Speech Communication Association (INTERSPEECH) (pp. 1269–1272). Italy: Florence.
21.
Zurück zum Zitat Tchorz, J., & Kollmeier, B. (1999). A model of auditory perception as front end for automatic speech recognition. The Journal of the Acoustical Society of America, 106(4), 2040–2050. Tchorz, J., & Kollmeier, B. (1999). A model of auditory perception as front end for automatic speech recognition. The Journal of the Acoustical Society of America, 106(4), 2040–2050.
22.
Zurück zum Zitat Viemeister, N. F. (1979). Temporal modulation transfer functions based upon modulation thresholds. The Journal of the Acoustical Society of America, 66(5), 1364–1380. Viemeister, N. F. (1979). Temporal modulation transfer functions based upon modulation thresholds. The Journal of the Acoustical Society of America, 66(5), 1364–1380.
23.
Zurück zum Zitat Yost, W. A., & Moore, M. J. (1987). Temporal changes in a complex spectral profile. The Journal of the Acoustical Society of America, 81(6), 1896–1905. Yost, W. A., & Moore, M. J. (1987). Temporal changes in a complex spectral profile. The Journal of the Acoustical Society of America, 81(6), 1896–1905.
24.
Zurück zum Zitat Joris, P. X., Schreiner, C. E., & Rees, A. (2004). Neural processing of amplitude-modulated sounds. Physiological Reviews, 84(2), 541–577. Joris, P. X., Schreiner, C. E., & Rees, A. (2004). Neural processing of amplitude-modulated sounds. Physiological Reviews, 84(2), 541–577.
25.
Zurück zum Zitat Kollmeier, B., & Koch, R. (1994). Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. The Journal of the Acoustical Society of America, 95(3), 1593–1602. Kollmeier, B., & Koch, R. (1994). Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. The Journal of the Acoustical Society of America, 95(3), 1593–1602.
26.
Zurück zum Zitat Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394–3411. Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394–3411.
27.
Zurück zum Zitat Tyagi, V. (2011). Fepstrum features: Design and application to conversational speech recognition. IBM Research Report (p. 11009). Tyagi, V. (2011). Fepstrum features: Design and application to conversational speech recognition. IBM Research Report (p. 11009).
28.
Zurück zum Zitat Atal, B. S., & Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. The Journal of the Acoustical Society of America, 50(2B), 637–655. Atal, B. S., & Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. The Journal of the Acoustical Society of America, 50(2B), 637–655.
29.
Zurück zum Zitat Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Prentice Hall, New Jersey: Englewood Cliffs. Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Prentice Hall, New Jersey: Englewood Cliffs.
30.
Zurück zum Zitat Levinson, N. (1946). The Wiener (root mean square) error criterion in filter design and prediction. Journal of Mathematics and Physics, 25(1–4), 261–278.MathSciNet Levinson, N. (1946). The Wiener (root mean square) error criterion in filter design and prediction. Journal of Mathematics and Physics, 25(1–4), 261–278.MathSciNet
31.
Zurück zum Zitat Durbin, J. (1960). The fitting of time-series models. Revue de l’Institut International de Statistique 233–244.MATH Durbin, J. (1960). The fitting of time-series models. Revue de l’Institut International de Statistique 233–244.MATH
32.
Zurück zum Zitat El-Jaroudi, A., & Makhoul, J. (1991). Discrete all-pole modeling. IEEE Transactions on Signal Processing, 39(2), 411–423. El-Jaroudi, A., & Makhoul, J. (1991). Discrete all-pole modeling. IEEE Transactions on Signal Processing, 39(2), 411–423.
33.
Zurück zum Zitat Gray, R., Buzo, A., Gray, A., & Matsuyama, Y. (1980). Distortion measures for speech processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 367–376.MATH Gray, R., Buzo, A., Gray, A., & Matsuyama, Y. (1980). Distortion measures for speech processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 367–376.MATH
34.
Zurück zum Zitat Liu, M., & Lacroix, A. (1996). Improved vocal tract model for the analysis of nasal speech sounds. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings (ICASSP), Atlanta, Georgia, USA (Vol. 2, pp. 801–804). Liu, M., & Lacroix, A. (1996). Improved vocal tract model for the analysis of nasal speech sounds. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings (ICASSP), Atlanta, Georgia, USA (Vol. 2, pp. 801–804).
35.
Zurück zum Zitat Alku, P. (1992). An automatic method to estimate the time-based parameters of the glottal pulseform. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Francisco, California, USA (Vol. 2, pp. 29–32). Alku, P. (1992). An automatic method to estimate the time-based parameters of the glottal pulseform. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Francisco, California, USA (Vol. 2, pp. 29–32).
36.
Zurück zum Zitat Vincent, D., Rosec, O., & Chonavel, T. (2005). Estimation of LF glottal source parameters based on an ARX model. In Proceedings of the Ninth European Conference on Speech Communication and Technology (INTERSPEECH/EUROSPEECH), Lisboa, Portugal. Vincent, D., Rosec, O., & Chonavel, T. (2005). Estimation of LF glottal source parameters based on an ARX model. In Proceedings of the Ninth European Conference on Speech Communication and Technology (INTERSPEECH/EUROSPEECH), Lisboa, Portugal.
37.
Zurück zum Zitat Milenkovic, P. (1986). Glottal inverse filtering by joint estimation of an AR system with a linear input model. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(1), 28–42. Milenkovic, P. (1986). Glottal inverse filtering by joint estimation of an AR system with a linear input model. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(1), 28–42.
38.
Zurück zum Zitat Veeneman, D., & BeMent, S. (1985). Automatic glottal inverse filtering from speech and electroglottographic signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 369–377. Veeneman, D., & BeMent, S. (1985). Automatic glottal inverse filtering from speech and electroglottographic signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 369–377.
39.
Zurück zum Zitat Childers, D. G., Hicks, D. M., Moore, G. P., & Alsaka, Y. A. (1986). A model for vocal fold vibratory motion, contact area, and the electroglottogram. The Journal of the Acoustical Society of America, 80(5), 1309–1320. Childers, D. G., Hicks, D. M., Moore, G. P., & Alsaka, Y. A. (1986). A model for vocal fold vibratory motion, contact area, and the electroglottogram. The Journal of the Acoustical Society of America, 80(5), 1309–1320.
40.
Zurück zum Zitat Alku, P. (2011). Glottal inverse filtering analysis of human voice production—a review of estimation and parameterization methods of the glottal excitation and their applications. Sadhana, 36(5), 623–650. Alku, P. (2011). Glottal inverse filtering analysis of human voice production—a review of estimation and parameterization methods of the glottal excitation and their applications. Sadhana, 36(5), 623–650.
41.
Zurück zum Zitat Rothenberg, M. (1977). Measurement of airflow in speech. Journal of Speech and Hearing Research, 20(1), 155–176. Rothenberg, M. (1977). Measurement of airflow in speech. Journal of Speech and Hearing Research, 20(1), 155–176.
42.
Zurück zum Zitat Alku, P. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11(2–3), 109–118. Alku, P. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11(2–3), 109–118.
43.
Zurück zum Zitat Drugman, T., Bozkurt, B., & Dutoit, T. (2011). Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation. Speech Communication, 53(6), 855–866. Drugman, T., Bozkurt, B., & Dutoit, T. (2011). Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation. Speech Communication, 53(6), 855–866.
44.
Zurück zum Zitat Childers, D. G., Skinner, D. P., & Kemerait, R. C. (1977). The cepstrum: A guide to processing. Proceedings of the IEEE, 65(10), 1428–1443. Childers, D. G., Skinner, D. P., & Kemerait, R. C. (1977). The cepstrum: A guide to processing. Proceedings of the IEEE, 65(10), 1428–1443.
45.
Zurück zum Zitat Tribolet, J. (1977). A new phase unwrapping algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(2), 170–177.MATH Tribolet, J. (1977). A new phase unwrapping algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(2), 170–177.MATH
46.
Zurück zum Zitat Drugman, T., Thomas, M., Gudnason, J., Naylor, P., & Dutoit, T. (2012). Detection of glottal closure instants from speech signals: A quantitative review. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 994–1006. Drugman, T., Thomas, M., Gudnason, J., Naylor, P., & Dutoit, T. (2012). Detection of glottal closure instants from speech signals: A quantitative review. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 994–1006.
47.
Zurück zum Zitat Drugman, T., & Dutoit, T. (2009). Glottal closure and opening instant detection from speech signals. In Proceedings of the Tenth Annual Conference of the International Speech Communication Association (INTERSPEECH), Brighton, UK (pp. 2891–2894). Drugman, T., & Dutoit, T. (2009). Glottal closure and opening instant detection from speech signals. In Proceedings of the Tenth Annual Conference of the International Speech Communication Association (INTERSPEECH), Brighton, UK (pp. 2891–2894).
48.
Zurück zum Zitat Cheng, Y. M., & O’Shaughnessy, D. (1989). Automatic and reliable estimation of glottal closure instant and period. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(12), 1805–1815. Cheng, Y. M., & O’Shaughnessy, D. (1989). Automatic and reliable estimation of glottal closure instant and period. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(12), 1805–1815.
49.
Zurück zum Zitat Wong, D., Markel, J., & Gray, A. (1979). Least squares glottal inverse filtering from the acoustic speech waveform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(4), 350–355. Wong, D., Markel, J., & Gray, A. (1979). Least squares glottal inverse filtering from the acoustic speech waveform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(4), 350–355.
50.
Zurück zum Zitat Brookes, M., Naylor, P. A., & Gudnason, J. (2006). A quantitative assessment of group delay methods for identifying glottal closures in voiced speech. IEEE Transactions on Audio, Speech, and Language Processing, 14(2), 456–466. Brookes, M., Naylor, P. A., & Gudnason, J. (2006). A quantitative assessment of group delay methods for identifying glottal closures in voiced speech. IEEE Transactions on Audio, Speech, and Language Processing, 14(2), 456–466.
51.
Zurück zum Zitat Naylor, P. A., Kounoudes, A., Gudnason, J., & Brookes, M. (2007). Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 34–43. Naylor, P. A., Kounoudes, A., Gudnason, J., & Brookes, M. (2007). Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 34–43.
52.
Zurück zum Zitat Gerhard, D. (2003). Pitch extraction and fundamental frequency: History and current techniques. Technical Report TR-CS 2003-06. Department of Computer Science, University of Regina, Canada (pp. 0–22). Gerhard, D. (2003). Pitch extraction and fundamental frequency: History and current techniques. Technical Report TR-CS 2003-06. Department of Computer Science, University of Regina, Canada (pp. 0–22).
53.
Zurück zum Zitat Seltzer, M. L., & Michael, D. (2000). Automatic detection of corrupt spectrographic features for robust speech recognition. Master of Science Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA. Seltzer, M. L., & Michael, D. (2000). Automatic detection of corrupt spectrographic features for robust speech recognition. Master of Science Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA.
54.
Zurück zum Zitat Scordilis, M. S., & Gowdy, J. N. (1989). Neural network based generation of fundamental frequency contours. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Glasgow, Scotland (pp. 219–222). Scordilis, M. S., & Gowdy, J. N. (1989). Neural network based generation of fundamental frequency contours. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Glasgow, Scotland (pp. 219–222).
55.
Zurück zum Zitat Han, K., & Wang, D. (2014). Neural network based pitch tracking in very noisy speech. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP),22(12), 2158–2168. Han, K., & Wang, D. (2014). Neural network based pitch tracking in very noisy speech. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP),22(12), 2158–2168.
56.
Zurück zum Zitat Su, H., Zhang, H., Zhang, X., & Gao, G. (2016). Convolutional neural network for robust pitch determination. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China (pp. 579–583). Su, H., Zhang, H., Zhang, X., & Gao, G. (2016). Convolutional neural network for robust pitch determination. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China (pp. 579–583).
57.
Zurück zum Zitat Ananthapadmanabha, T. V., & Fant, G. (1982). Calculation of true glottal flow and its components. Speech Communication, 1(3–4), 167–184. Ananthapadmanabha, T. V., & Fant, G. (1982). Calculation of true glottal flow and its components. Speech Communication, 1(3–4), 167–184.
58.
Zurück zum Zitat Lucero, J. C., & Koenig, L. L. (2005). Phonation thresholds as a function of laryngeal size in a two-mass model of the vocal folds. The Journal of the Acoustical Society of America, 118(5), 2798–2801. Lucero, J. C., & Koenig, L. L. (2005). Phonation thresholds as a function of laryngeal size in a two-mass model of the vocal folds. The Journal of the Acoustical Society of America, 118(5), 2798–2801.
59.
Zurück zum Zitat Titze, I. R. (1992). Phonation threshold pressure: A missing link in glottal aerodynamics. The Journal of the Acoustical Society of America, 91(5), 2926–2935. Titze, I. R. (1992). Phonation threshold pressure: A missing link in glottal aerodynamics. The Journal of the Acoustical Society of America, 91(5), 2926–2935.
60.
Zurück zum Zitat Plant, R. L., Freed, G. L., & Plant, R. E. (2004). Direct measurement of onset and offset phonation threshold pressure in normal subjects. The Journal of the Acoustical Society of America, 116(6), 3640–3646. Plant, R. L., Freed, G. L., & Plant, R. E. (2004). Direct measurement of onset and offset phonation threshold pressure in normal subjects. The Journal of the Acoustical Society of America, 116(6), 3640–3646.
61.
Zurück zum Zitat Isshiki, N. (1981). Vocal efficiency index. In K. N. Steven & M. Hirano (Eds.), Vocal fold physiology (pp. 193–203). Press: University of Tokyo. Isshiki, N. (1981). Vocal efficiency index. In K. N. Steven & M. Hirano (Eds.), Vocal fold physiology (pp. 193–203). Press: University of Tokyo.
62.
Zurück zum Zitat Klatt, D. H. (1987). Review of text-to-speech conversion for English. The Journal of the Acoustical Society of America, 82(3), 737–793. Klatt, D. H. (1987). Review of text-to-speech conversion for English. The Journal of the Acoustical Society of America, 82(3), 737–793.
63.
Zurück zum Zitat Rosenberg, A. E. (1971). Effect of glottal pulse shape on the quality of natural vowels. The Journal of the Acoustical Society of America, 49(2B), 583–590. Rosenberg, A. E. (1971). Effect of glottal pulse shape on the quality of natural vowels. The Journal of the Acoustical Society of America, 49(2B), 583–590.
64.
Zurück zum Zitat Hedelin, P. (1984). A glottal LPC-vocoder. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Diego, California, USA (Vol. 9, pp. 21–24). Hedelin, P. (1984). A glottal LPC-vocoder. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Diego, California, USA (Vol. 9, pp. 21–24).
65.
Zurück zum Zitat Hedelin, P. (1986). High quality glottal LPC-vocoding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tokyo, Japan (Vol. 11, pp. 465–468). Hedelin, P. (1986). High quality glottal LPC-vocoding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tokyo, Japan (Vol. 11, pp. 465–468).
66.
Zurück zum Zitat Fujisaki, H., & Ljungqvist, M. (1986). Proposal and evaluation of models for the glottal source waveform. In IEEE Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tokyo, Japan (Vol. 11, pp. 1605–1608). Fujisaki, H., & Ljungqvist, M. (1986). Proposal and evaluation of models for the glottal source waveform. In IEEE Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tokyo, Japan (Vol. 11, pp. 1605–1608).
67.
Zurück zum Zitat Fant, G., Liljencrants, J., & Lin, Q. G. (1985). A four-parameter model of glottal flow. Speech Transmission Laboratory Quarterly Progress and Status Report (STL-QPSR),4(1985), 1–13. Fant, G., Liljencrants, J., & Lin, Q. G. (1985). A four-parameter model of glottal flow. Speech Transmission Laboratory Quarterly Progress and Status Report (STL-QPSR),4(1985), 1–13.
68.
Zurück zum Zitat Gobl, C. (2003). The voice source in speech communication-production and perception experiments involving inverse filtering and synthesis. Doctoral dissertation, Institutionen för talöverföring och musikakustik, Royal Institute of Technology, Stockholm, Sweden. Gobl, C. (2003). The voice source in speech communication-production and perception experiments involving inverse filtering and synthesis. Doctoral dissertation, Institutionen för talöverföring och musikakustik, Royal Institute of Technology, Stockholm, Sweden.
69.
Zurück zum Zitat Drioli, C. (2005). A flow waveform-matched low-dimensional glottal model based on physical knowledge. The Journal of the Acoustical Society of America, 117(5), 3184–3195. Drioli, C. (2005). A flow waveform-matched low-dimensional glottal model based on physical knowledge. The Journal of the Acoustical Society of America, 117(5), 3184–3195.
70.
Zurück zum Zitat Avanzini, F. (2008). Simulation of vocal fold oscillation with a pseudo-one-mass physical model. Speech Communication, 50(2), 95–108.MathSciNet Avanzini, F. (2008). Simulation of vocal fold oscillation with a pseudo-one-mass physical model. Speech Communication, 50(2), 95–108.MathSciNet
71.
Zurück zum Zitat Frøkjaer-Jensen, B., & Prytz, S. (1976). Registration of voice quality. Brüel and Kjaer Technical Review, 3, 3–17. Frøkjaer-Jensen, B., & Prytz, S. (1976). Registration of voice quality. Brüel and Kjaer Technical Review, 3, 3–17.
72.
Zurück zum Zitat Childers, D. G., & Lee, C. K. (1991). Vocal quality factors: analysis, synthesis, and perception. The Journal of the Acoustical Society of America, 90(5), 2394–2410. Childers, D. G., & Lee, C. K. (1991). Vocal quality factors: analysis, synthesis, and perception. The Journal of the Acoustical Society of America, 90(5), 2394–2410.
73.
Zurück zum Zitat Titze, I. R., & Sundberg, J. (1992). Vocal intensity in speakers and singers. The Journal of the Acoustical Society of America,91(5), 2936–2946. Titze, I. R., & Sundberg, J. (1992). Vocal intensity in speakers and singers. The Journal of the Acoustical Society of America,91(5), 2936–2946.
74.
Zurück zum Zitat Alku, P., Strik, H., & Vilkman, E. (1997). Parabolic spectral parameter—a new method for quantification of the glottal flow. Speech Communication, 22(1), 67–79. Alku, P., Strik, H., & Vilkman, E. (1997). Parabolic spectral parameter—a new method for quantification of the glottal flow. Speech Communication, 22(1), 67–79.
75.
Zurück zum Zitat Murphy, P. J. (1999). Perturbation-free measurement of the harmonics-to-noise ratio in voice signals using pitch synchronous harmonic analysis. The Journal of the Acoustical Society of America, 105(5), 2866–2881. Murphy, P. J. (1999). Perturbation-free measurement of the harmonics-to-noise ratio in voice signals using pitch synchronous harmonic analysis. The Journal of the Acoustical Society of America, 105(5), 2866–2881.
76.
Cummings, K. E., & Clements, M. A. (1995). Analysis of the glottal excitation of emotionally styled and stressed speech. The Journal of the Acoustical Society of America, 98(1), 88–98.
77.
Laukkanen, A. M., Vilkman, E., Alku, P., & Oksanen, H. (1996). Physical variations related to stress and emotional state: A preliminary study. Journal of Phonetics, 24(3), 313–335.
78.
Laukkanen, A. M., Vilkman, E., Alku, P., & Oksanen, H. (1997). On the perception of emotions in speech: The role of voice quality. Logopedics Phoniatrics Vocology, 22(4), 157–168.
79.
Gobl, C., & Chasaide, A. N. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Communication, 40(1–2), 189–212.
80.
Airas, M., & Alku, P. (2006). Emotions in vowel segments of continuous speech: Analysis of the glottal flow using the normalised amplitude quotient. Phonetica, 63(1), 26–46.
81.
Waaramaa, T., Laukkanen, A. M., Airas, M., & Alku, P. (2010). Perception of emotional valences and activity levels from vowel segments of continuous speech. Journal of Voice, 24(1), 30–38.
82.
Higgins, M. B., & Saxman, J. H. (1991). A comparison of selected phonatory behaviors of healthy aged and young adults. Journal of Speech, Language, and Hearing Research, 34(5), 1000–1010.
83.
Sapienza, C. M., & Stathopoulos, E. T. (1994). Comparison of maximum flow declination rate: Children versus adults. Journal of Voice, 8(3), 240–247.
84.
Sapienza, C. M., & Dutka, J. (1996). Glottal airflow characteristics of women’s voice production along an aging continuum. Journal of Speech, Language, and Hearing Research, 39(2), 322–328.
85.
Hodge, F. S., Colton, R. H., & Kelley, R. T. (2001). Vocal intensity characteristics in normal and elderly speakers. Journal of Voice, 15(4), 503–511.
86.
Welham, N. V., & Maclagan, M. A. (2003). Vocal fatigue: Current knowledge and future directions. Journal of Voice, 17(1), 21–30.
87.
Ozdas, A., Shiavi, R. G., Silverman, S. E., Silverman, M. K., & Wilkes, D. M. (2004). Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk. IEEE Transactions on Biomedical Engineering, 51(9), 1530–1540.
88.
Stanek, M., & Sigmund, M. (2015). Psychological stress detection in speech using return-to-opening phase ratios in glottis. Elektronika ir Elektrotechnika, 21(5), 59–63.
89.
Sigmund, M., Prokes, A., & Zelinka, P. (2010). Detection of alcohol in speech signal using LF model. In Proceedings of the International Conference on Artificial Intelligence and Applications. Innsbruck, Austria (pp. 193–196).
90.
Koike, Y., & Markel, J. (1975). Application of inverse filtering for detecting laryngeal pathology. Annals of Otology, Rhinology & Laryngology, 84(1), 117–124.
91.
Deller, J. (1982). Evaluation of laryngeal dysfunction based on features of an accurate estimate of the glottal waveform. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Paris, France (Vol. 7, pp. 759–762).
92.
Hillman, R. E., Holmberg, E. B., Perkell, J. S., Walsh, M., & Vaughan, C. (1990). Phonatory function associated with hyperfunctionally related vocal fold lesions. Journal of Voice, 4(1), 52–63.
93.
Howell, P., & Williams, M. (1988). The contribution of the excitatory source to the perception of neutral vowels in stuttered speech. The Journal of the Acoustical Society of America, 84(1), 80–89.
94.
Howell, P., & Williams, M. (1992). Acoustic analysis and perception of vowels in children’s and teenagers’ stuttered speech. The Journal of the Acoustical Society of America, 91(3), 1697–1706.
95.
Björkner, E., Sundberg, J., Cleveland, T., & Stone, E. (2006). Voice source differences between registers in female musical theater singers. Journal of Voice, 20(2), 187–197.
96.
Sundberg, J., Fahlstedt, E., & Morell, A. (2005). Effects on the glottal voice source of vocal loudness variation in untrained female and male voices. The Journal of the Acoustical Society of America, 117(2), 879–885.
97.
Schafer, R. W., & Rabiner, L. R. (1970). System for automatic formant analysis of voiced speech. The Journal of the Acoustical Society of America, 47(2B), 634–648.
98.
Kammoun, M. A., Gargouri, D., Frikha, M., & Hamida, A. B. (2004). Cepstral method evaluation in speech formant frequencies estimation. In Proceedings of the IEEE International Conference on Industrial Technology (ICIT), Hammamet, Tunisia (Vol. 3, pp. 1612–1616).
99.
Kammoun, M. A., Gargouri, D., Frikha, M., & Hamida, A. B. (2006). Cepstrum vs. LPC: A comparative study for speech formant frequencies estimation. GESTS International Transactions on Communication and Signal Processing, 9(1), 87–102.
100.
Hunt, M. J. (1987). Delayed decisions in speech recognition—the case of formants. Pattern Recognition Letters, 6(2), 121–137.
101.
Lee, C. H. (1989). Applications of dynamic programming to speech and language processing. AT&T Technical Journal, 68(3), 114–130.
102.
Snell, R. C., & Milinazzo, F. (1993). Formant location from LPC analysis data. IEEE Transactions on Speech and Audio Processing, 1(2), 129–134.
103.
Sandler, M. (1991). Algorithm for high precision root finding from high order LPC models. IEE Proceedings I-Communications, Speech and Vision, 138(6), 596–602.
104.
Fant, G. (1962). Descriptive analysis of the acoustic aspects of speech. Logos, 5, 3–17.
105.
Fitch, W. T. (1997). Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. The Journal of the Acoustical Society of America, 102(2), 1213–1222.
106.
Laan, G. P. (1997). The contribution of intonation, segmental durations, and spectral features to the perception of a spontaneous and a read speaking style. Speech Communication, 22(1), 43–65.
107.
Zhou, G., Hansen, J. H., & Kaiser, J. F. (2001). Nonlinear feature based classification of speech under stress. IEEE Transactions on Speech and Audio Processing, 9(3), 201–216.
108.
El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3), 572–587.
Metadata
Title
The Voice Signal and Its Information Content—2
Author
Rita Singh
Copyright year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-8403-5_5