Emotion recognition from speech using source, system, and prosodic features

Authors: Shashidhar G. Koolagudi, K. Sreenivasa Rao

Published in: International Journal of Speech Technology | Issue 2/2012 | 01-06-2012

Abstract

In this work, source, system, and prosodic features of speech are explored for characterizing and classifying the underlying emotions. Owing to their complementary nature, these different speech features contribute to the expression of emotion in different ways. Linear prediction residual samples chosen around glottal closure regions, together with glottal pulse parameters, represent the excitation source information. Linear prediction cepstral coefficients extracted through simple block processing and through pitch-synchronous analysis represent the vocal tract (system) information. Global and local prosodic features, derived from the gross statistics and temporal dynamics of the sequences of duration, pitch, and energy values, represent the prosodic information. Emotion recognition models are developed using the above features separately and in combination. A simulated Telugu emotion database (IITKGP-SESC) is used to evaluate the proposed features, and the recognition results are compared with those obtained on the internationally known Berlin emotional speech database (Emo-DB). Autoassociative neural networks, Gaussian mixture models, and support vector machines are used to develop the emotion recognition systems based on source, system, and prosodic features, respectively. A weighted combination of evidence is used to fuse the outputs of the systems developed with the different features. The results show that each of the proposed speech features contributes toward emotion recognition, and that combining the features improves recognition performance, confirming their complementary nature.
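To illustrate the fusion step mentioned above, the following is a minimal Python sketch of a weighted, score-level combination of evidence from the three subsystems (AANN on source features, GMM on system features, SVM on prosodic features). The emotion labels, weight values, scores, and function names here are purely illustrative assumptions, not the actual labels or weights used in the paper.

```python
import numpy as np

# Hypothetical emotion set used only for illustration.
EMOTIONS = ["anger", "fear", "happiness", "neutral", "sadness"]

def normalize(scores):
    """Scale a per-emotion score vector to sum to 1 so subsystems are comparable."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum()
    return scores / total if total > 0 else np.full_like(scores, 1.0 / len(scores))

def fuse(source_scores, system_scores, prosody_scores, weights=(0.3, 0.4, 0.3)):
    """Combine per-emotion evidence from the three subsystems with fixed weights."""
    w_src, w_sys, w_pro = weights  # assumed weights; in practice tuned on held-out data
    combined = (w_src * normalize(source_scores)
                + w_sys * normalize(system_scores)
                + w_pro * normalize(prosody_scores))
    return EMOTIONS[int(np.argmax(combined))], combined

# Example: illustrative per-emotion confidences from each subsystem for one utterance.
label, scores = fuse(
    source_scores=[0.10, 0.20, 0.40, 0.20, 0.10],   # AANN on LP residual / source features
    system_scores=[0.05, 0.15, 0.50, 0.20, 0.10],   # GMM on LPCC / system features
    prosody_scores=[0.20, 0.10, 0.30, 0.30, 0.10],  # SVM on prosodic features
)
print(label, scores)
```

The fixed weights above simply show how complementary evidence streams can be merged at the score level; the paper's actual weighting scheme may differ.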

Metadata
Title
Emotion recognition from speech using source, system, and prosodic features
Authors
Shashidhar G. Koolagudi
K. Sreenivasa Rao
Publication date
01-06-2012
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9139-3
