
Characterization and recognition of emotions from speech using excitation source information

Authors: Sreenivasa Rao Krothapalli, Shashidhar G. Koolagudi

Published in: International Journal of Speech Technology | Issue 2/2013

Abstract

This paper explores excitation source features of the speech production mechanism for characterizing and recognizing emotions from the speech signal. The excitation source signal, also known as the LP residual, is obtained from the speech signal using linear prediction (LP) analysis. The glottal volume velocity (GVV) signal, derived from the LP residual, is also used to represent the excitation source. The speech signal has a high signal-to-noise ratio around the instants of glottal closure (GC), which are also known as epochs. The following excitation source features are proposed for characterizing and recognizing emotions: the sequence of LP residual samples and their phase information, epoch parameters and their dynamics at the syllable and utterance levels, and samples of the GVV signal and its parameters. Auto-associative neural networks (AANN) and support vector machines (SVM) are used to develop the emotion recognition models, which are evaluated on Telugu and Berlin emotional speech corpora. Anger, disgust, fear, happiness, neutral and sadness are the six emotions considered in this study. Average emotion recognition performance of about 42% to 63% is observed using the different excitation source features. Combining excitation source and spectral features improves the emotion recognition performance to up to 84%.
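
As a rough illustration of the first step described in the abstract, the sketch below estimates the LP residual of a speech signal by frame-wise LP analysis followed by inverse filtering, using only NumPy and SciPy. The frame length, hop, LP order and implied sampling rate are illustrative assumptions, not the settings reported in the paper.

```python
# Minimal sketch of LP residual extraction by inverse filtering.
# Frame length (320 samples, i.e. 20 ms at an assumed 16 kHz), hop,
# and LP order are illustrative choices, not the paper's settings.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_coefficients(frame, order=16):
    """Autocorrelation-method inverse-filter polynomial [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Solve the Yule-Walker normal equations R a = r for the predictor a.
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return np.concatenate(([1.0], -a))

def lp_residual(signal, frame_len=320, hop=160, order=16):
    """Frame-wise LP residual (excitation source estimate) of a speech signal."""
    signal = np.asarray(signal, dtype=float)
    residual = np.zeros_like(signal)
    window = np.hamming(frame_len)
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        if np.dot(frame, frame) < 1e-8:        # skip silent frames
            continue
        a = lp_coefficients(frame * window, order)
        # Inverse filtering: e[n] = s[n] - sum_k a_k s[n-k].
        residual[start:start + hop] = lfilter(a, [1.0], frame)[:hop]
    return residual

# Usage (hypothetical input): residual = lp_residual(speech_samples, order=16)
```

Epoch locations and the GVV signal would then be derived from this residual; the paper's own procedures for those later steps are not reproduced here.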


Metadata
Title
Characterization and recognition of emotions from speech using excitation source information
Authors
Sreenivasa Rao Krothapalli
Shashidhar G. Koolagudi
Publication date
01-06-2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9175-z
