2017 | OriginalPaper | Chapter

Speech Emotion Recognition Using Local and Global Features

Authors: Yuanbo Gao, Baobin Li, Ning Wang, Tingshao Zhu

Published in: Brain Informatics

Publisher: Springer International Publishing

Abstract

Speech offers an easy and useful way to assess speakers’ mental and psychological health, and automatic emotion recognition in speech has been investigated widely in fields such as human-machine interaction, psychology, and psychiatry. In this paper, we extract prosodic and spectral features, including pitch, MFCC, intensity, ZCR and LSP, and use them to build an emotion recognition model with an SVM classifier. In particular, we find that the frame duration and overlap influence the final results, so a depth-first search is applied to find the best parameters. Experimental results on two well-known databases, EMODB and RAVDESS, show that this model works well and that our speech features are effective in characterizing and recognizing emotions.
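The role of frame duration and overlap can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only two of the listed features (ZCR and an RMS stand-in for intensity), summarizes the per-frame (local) values into utterance-level (global) statistics, and assumes a search tree in which each level fixes one parameter. All function names, the candidate values, and the toy scoring function are hypothetical; in a real experiment the score would be something like the cross-validated accuracy of the SVM.

```python
import math
from statistics import mean, pstdev


def frame_signal(samples, sample_rate, frame_ms, overlap):
    """Split samples into frames of frame_ms milliseconds.

    overlap is the fraction of a frame shared with the next one, in [0, 1).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (local feature)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_intensity(frame):
    """Root-mean-square amplitude of one frame (simple intensity proxy)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def global_features(samples, sample_rate, frame_ms, overlap):
    """Summarize per-frame features as mean and standard deviation."""
    frames = frame_signal(samples, sample_rate, frame_ms, overlap)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [rms_intensity(f) for f in frames]
    return [mean(zcr), pstdev(zcr), mean(rms), pstdev(rms)]


def dfs_best_params(choices, score, partial=()):
    """Depth-first walk over a parameter tree.

    Assumption: level k of the tree fixes the k-th parameter, and each leaf
    is a full parameter tuple that gets scored. Returns (best_score, params).
    """
    if len(partial) == len(choices):
        return score(*partial), partial
    best = None
    for value in choices[len(partial)]:
        candidate = dfs_best_params(choices, score, partial + (value,))
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


# Synthetic 100 Hz tone, one second at 8 kHz, standing in for an utterance.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
features = global_features(tone, SAMPLE_RATE, frame_ms=25, overlap=0.5)


def toy_score(frame_ms, overlap):
    """Hypothetical placeholder for a model-quality score."""
    return -abs(frame_ms - 25) - abs(overlap - 0.5)


best_score, best_params = dfs_best_params(
    [(10, 25, 50), (0.0, 0.25, 0.5)], toy_score
)
```

Changing `frame_ms` or `overlap` changes both the number of frames and the statistics computed over them, which is why these two parameters are worth tuning rather than fixing by convention.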

Metadata
Title: Speech Emotion Recognition Using Local and Global Features
Authors: Yuanbo Gao, Baobin Li, Ning Wang, Tingshao Zhu
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-70772-3_1
