5. The Voice Signal and Its Information Content—2

Author: Rita Singh

Published in: Profiling Humans from their Voice

Publisher: Springer Singapore


Abstract

Information in the voice signal is embedded both in its time progression and in its spectral content, i.e., in its time domain and its spectrographic domain respectively. Within these domains, information relevant to profiling may be present in the patterns exhibited by specific characteristics of the voice signal. The signal may, however, also have characteristics that are not evident in these domains and must instead be sought in other (derivative) mathematical domains, where the relevant patterns become more tangible for measurement and analysis. This is the subject of feature discovery, which is discussed in Part II of this book. A third domain that reflects the information in the voice signal is that of physical or abstract models that simulate or explain the voice signal and the processes that generate it. We will refer to this as the model domain.
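
To make the two domains concrete, the short sketch below (illustrative only, and not taken from the chapter) loads a voice recording and computes a standard short-time Fourier transform: the raw sample sequence is the time-domain view, and the resulting log-magnitude matrix is the spectrographic view. The file name and the analysis parameters (25 ms windows, 10 ms hops) are assumptions chosen as common defaults for speech.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

fs, x = wavfile.read("voice.wav")            # time-domain view: amplitude vs. sample index
x = x.astype(np.float64)
if x.ndim > 1:                               # fold multi-channel audio to mono
    x = x.mean(axis=1)

# Spectrographic view: 25 ms windows with 10 ms hops, a common analysis setting for speech.
nperseg = int(0.025 * fs)
f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - int(0.010 * fs))
spec_db = 20 * np.log10(np.abs(Z) + 1e-10)   # log-magnitude spectrogram in dB

print(f"{len(x) / fs:.2f} s of audio -> {len(f)} frequency bins x {len(t)} frames")
```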


Footnotes
1. More sophisticated methods for excitation extraction, such as iterative adaptive inverse filtering (IAIF), can be expected to produce improved results.
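
For readers unfamiliar with IAIF, the sketch below outlines the idea in simplified form: the glottal and vocal-tract contributions are estimated alternately by linear prediction and removed from the speech signal by inverse filtering, with a leaky integrator cancelling the lip-radiation derivative. It is an illustrative approximation under assumed model orders, not the implementation the footnote refers to.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order):
    """Autocorrelation-method linear prediction; returns [1, a1, ..., a_order]."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    r[0] += 1e-9                                   # tiny bias keeps the system well-posed
    a = solve_toeplitz(r[:-1], -r[1:])             # solve the Yule-Walker equations
    return np.concatenate(([1.0], a))

def integrate(x, leak=0.99):
    """Leaky integrator; approximately cancels the lip-radiation derivative."""
    return lfilter([1.0], [1.0, -leak], x)

def iaif(frame, fs, vt_order=None, glottis_order=4):
    """Rough glottal-flow estimate for one voiced, windowed frame (simplified IAIF)."""
    if vt_order is None:
        vt_order = 2 + fs // 1000                  # rule-of-thumb vocal-tract order
    # 1. Cancel the coarse glottal spectral tilt with a 1st-order model,
    #    then estimate a preliminary vocal-tract filter.
    g1 = lpc(frame, 1)
    vt1 = lpc(lfilter(g1, [1.0], frame), vt_order)
    # 2. First glottal estimate: inverse-filter the speech by vt1 and integrate.
    glottal1 = integrate(lfilter(vt1, [1.0], frame))
    # 3. Model the glottal contribution more finely and remove it from the speech.
    g2 = lpc(glottal1, glottis_order)
    # 4. Re-estimate the vocal tract on the glottis-compensated signal.
    vt2 = lpc(integrate(lfilter(g2, [1.0], frame)), vt_order)
    # 5. Final excitation estimate: inverse-filter by vt2 and integrate once more.
    return integrate(lfilter(vt2, [1.0], frame))
```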
 
Metadata
Title: The Voice Signal and Its Information Content—2
Author: Rita Singh
Copyright Year: 2019
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-13-8403-5_5