
2015 | Original Paper | Book Chapter

Significance of Emotionally Significant Regions of Speech for Emotive to Neutral Conversion

Authors: Hari Krishna Vydana, V. V. Vidyadhara Raju, Suryakanth V. Gangashetty, Anil Kumar Vuppala

Published in: Mining Intelligence and Knowledge Exploration

Publisher: Springer International Publishing


Abstract

Most speech processing applications suffer a degradation in performance when operated in emotional environments, largely because of the mismatch between the development and operating environments. Model adaptation and feature adaptation schemes have been employed to adapt speech systems developed in neutral environments to emotional environments. In this study, only the anger emotion is considered among emotional environments, and signal-level conversion from anger speech to neutral speech is examined. Emotion in human speech is concentrated over a small portion of the entire utterance; the regions of speech that are strongly influenced by the emotive state of the speaker are considered the emotionally significant regions of an utterance. Physiological constraints of the human speech production mechanism are exploited to detect these emotionally significant regions. The variation of prosody parameters (pitch, duration, and energy) with their position in the sentence is analyzed to obtain modification factors. The speech signal in the emotionally significant regions is then modified using the corresponding modification factors to generate a neutral version of the anger speech. Speech samples from the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC) are used in this study, and a subjective listening test is performed to evaluate the effectiveness of the proposed conversion.
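To make the conversion step concrete, the sketch below shows one plausible way to apply region-wise prosody modification (pitch and duration) and reassemble the utterance. It is not the authors' implementation: the region boundaries and modification factors are hypothetical placeholders, whereas the paper derives the regions from speech-production constraints and the factors from position-wise prosody analysis on IITKGP-SESC; standard librosa effects are used here in place of the paper's modification method.

```python
# Minimal sketch (assumed, not the authors' method): modify pitch and duration
# only inside hypothetical "emotionally significant" regions, leave the rest
# of the signal untouched, and re-concatenate. An energy/gain factor could be
# applied to each region analogously.
import numpy as np
import librosa
import soundfile as sf

def neutralize_regions(wav_path, regions, out_path="neutralized.wav"):
    """regions: list of (start_s, end_s, pitch_steps, duration_factor) tuples.

    pitch_steps    -- semitone shift (negative lowers pitch; anger speech is
                      typically higher-pitched than neutral speech).
    duration_factor-- >1 lengthens the segment (anger speech is typically
                      faster than neutral speech).
    All values here are illustrative placeholders, not the paper's factors.
    """
    y, sr = librosa.load(wav_path, sr=None)
    cursor, pieces = 0, []
    for start_s, end_s, pitch_steps, dur_factor in regions:
        s, e = int(start_s * sr), int(end_s * sr)
        pieces.append(y[cursor:s])                       # unmodified span before region
        seg = y[s:e]
        seg = librosa.effects.pitch_shift(seg, sr=sr, n_steps=pitch_steps)
        seg = librosa.effects.time_stretch(seg, rate=1.0 / dur_factor)
        pieces.append(seg)
        cursor = e
    pieces.append(y[cursor:])                            # trailing unmodified span
    sf.write(out_path, np.concatenate(pieces), sr)

# Illustrative usage: lower pitch by 3 semitones and lengthen one region by 20%.
# neutralize_regions("anger_utterance.wav", [(0.8, 1.4, -3.0, 1.2)])
```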


Metadata
Title
Significance of Emotionally Significant Regions of Speech for Emotive to Neutral Conversion
Authors
Hari Krishna Vydana
V. V. Vidyadhara Raju
Suryakanth V. Gangashetty
Anil Kumar Vuppala
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-26832-3_28