Published in: International Journal of Speech Technology 1/2013

01-03-2013

Emotion modeling from speech signal based on wavelet packet transform

Authors: Varsha N. Degaonkar, Shaila D. Apte


Abstract

The recognition of emotion in human speech has gained increasing attention in recent years due to the wide variety of applications that benefit from such technology. Detecting emotion from speech can be viewed as a classification task: it consists of assigning, out of a fixed set, an emotion category, e.g. happiness or anger, to a speech utterance. In this paper, we tackle two emotions, namely happiness and anger. The parameters extracted from a speech signal depend on the speaker, the spoken word and the emotion. To detect the emotion, we keep the spoken utterance and the speaker constant and change only the emotion. Different features are extracted to identify the parameters responsible for emotion, and the wavelet packet transform (WPT) is found to be emotion specific. We have performed experiments using three methods. The first method uses WPT and compares the number of coefficients greater than a threshold in different bands. The second method uses WPT to compute and compare the energy ratios of different bands. The third method is a conventional method using MFCC. The recognition rates obtained using WPT for the angry, happy and neutral modes are 85 %, 65 % and 80 %, respectively, compared with 75 %, 45 % and 60 %, respectively, obtained using MFCC for the three emotions. Based on the WPT features, a model is proposed for emotion conversion, namely neutral to angry and neutral to happy.
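As an illustration of the two WPT-based feature sets described in the abstract, the following Python sketch computes, for each wavelet packet band, the number of coefficients above a threshold (first method) and the band energy ratio (second method). It uses the PyWavelets (pywt) package; the wavelet ('db4'), decomposition level and threshold are assumed values for illustration only and are not taken from the paper.

```python
# Hedged sketch (not the authors' code): wavelet packet band features
# for emotion detection. Assumes a mono speech signal as a 1-D NumPy
# array; wavelet, level and threshold are illustrative assumptions.
import numpy as np
import pywt


def wpt_band_features(signal, wavelet="db4", level=3, threshold=0.01):
    """Return per-band above-threshold coefficient counts and band
    energy ratios from a wavelet packet decomposition of `signal`."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    bands = [node.data for node in wp.get_level(level, order="freq")]

    # First method: count coefficients whose magnitude exceeds the
    # threshold in each band; counts are compared across bands/emotions.
    counts = np.array([np.sum(np.abs(b) > threshold) for b in bands])

    # Second method: energy of each band normalised by the total
    # energy, i.e. the per-band energy ratio.
    energies = np.array([np.sum(b ** 2) for b in bands])
    ratios = energies / energies.sum()

    return counts, ratios


if __name__ == "__main__":
    # Synthetic signal standing in for a recorded speech utterance.
    t = np.linspace(0, 1, 8000, endpoint=False)
    utterance = np.sin(2 * np.pi * 200 * t) + 0.3 * np.random.randn(t.size)
    counts, ratios = wpt_band_features(utterance)
    print("coefficients above threshold per band:", counts)
    print("band energy ratios:", np.round(ratios, 3))
```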


Metadata
Title
Emotion modeling from speech signal based on wavelet packet transform
Authors
Varsha N. Degaonkar
Shaila D. Apte
Publication date
01-03-2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9142-8
