
Published: 28.08.2018

Emotional speech analysis using harmonic plus noise model and Gaussian mixture model

Authors: Jang Bahadur Singh, Parveen Kumar Lehana

Published in: International Journal of Speech Technology | Issue 3/2019


Abstract

Extracting valuable information from emotional speech is one of the major challenges in emotion recognition and human-machine interfaces. Most research in emotion recognition is based on the analysis of fundamental frequency, energy contour, duration of silence, formants, Mel-band energies, linear prediction cepstral coefficients, and Mel frequency cepstral coefficients. It has been observed that emotion classification using sinusoidal features performs better than classification using linear prediction and cepstral features, and harmonic models are considered a variant of the sinusoidal model. Analysis of emotional speech using different harmonic features is therefore a critical step toward improving the emotional speech classification rate and the conversion of neutral speech to emotional speech. In this paper, investigations have been carried out on the Berlin emotional speech database to analyze gender-based emotional speech using harmonic plus noise model (HNM) features and a Gaussian mixture model (GMM). The analysis has been performed with HNM features such as pitch, harmonic amplitudes, maximum voiced frequency, and noise components. The results show that the different emotional speech of male and female speakers can be represented by a K-component GMM distribution, where the optimal number of GMM components has been selected on the basis of the Akaike information criterion (AIC) score.
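The two stages named in the abstract can be illustrated with short sketches. First, the harmonic part of the HNM represents each voiced frame as a sum of sinusoids at integer multiples of the pitch; one standard way to recover the harmonic amplitudes is a least-squares fit of cosine/sine pairs to the frame. The sketch below is a minimal illustration, assuming the pitch f0 and the number of harmonics (e.g., up to the maximum voiced frequency) have already been estimated; the function name and parameters are hypothetical, not the authors' implementation.

```python
import numpy as np

def harmonic_amplitudes(frame, f0, fs, n_harmonics):
    """Least-squares estimate of harmonic amplitudes/phases for one voiced frame.

    Illustrative sketch: assumes f0 (Hz) is known and the frame is centered
    on the analysis instant.
    """
    n = np.arange(len(frame)) - len(frame) // 2      # time axis centered at 0
    cols = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * f0 / fs                # k-th harmonic (rad/sample)
        cols.append(np.cos(w * n))
        cols.append(np.sin(w * n))
    B = np.stack(cols, axis=1)                       # one cos/sin pair per harmonic
    coef, *_ = np.linalg.lstsq(B, frame, rcond=None)
    a, b = coef[0::2], coef[1::2]
    return np.hypot(a, b), np.arctan2(-b, a)         # amplitudes |A_k| and phases
```

Second, the per-frame HNM features for each gender/emotion subset can be modeled with a GMM whose number of components K is chosen by minimizing the AIC score, the selection rule stated in the abstract. Below is a minimal sketch using scikit-learn's GaussianMixture; the library choice and the search range are assumptions, since the paper does not specify an implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm_by_aic(features, max_components=10):
    """Fit GMMs with 1..max_components and keep the one with the lowest AIC."""
    best, best_aic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=0).fit(features)
        aic = gmm.aic(features)  # AIC = 2*(free parameters) - 2*log-likelihood
        if aic < best_aic:
            best, best_aic = gmm, aic
    return best, best_aic

# Hypothetical usage: rows = frames, columns = HNM features (pitch, harmonic
# amplitudes, maximum voiced frequency, noise-part descriptors) for one
# gender/emotion subset.
# gmm, aic = select_gmm_by_aic(feature_matrix)
```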


References
Akaike, H. (2011). Akaike's information criterion. In International encyclopedia of statistical science. Berlin: Springer.
Ali, F. B., & Djaziri-Larbi, S. (2017). A long term harmonic plus noise model for narrow-band speech coding at very low bit-rates. In Telecommunications and Signal Processing, 40th International Conference, pp. 372–376.
Anagnostopoulos, C. N., Iliou, T., & Giannoukos, I. (2015). Features and classifiers for emotion recognition from speech: A survey from 2000 to 2011. Artificial Intelligence Review, 43(2), 155–177.
Bandoin, G., & Stylianou, Y. (1996). On the transformation of the speech spectrum for voice conversion. In Proceedings of the Fourth International Conference on Spoken Language Processing, ICSLP '96.
Bhaykar, M., Yadav, J., & Rao, K. S. (2013). Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM. In Communications, National Conference, pp. 1–5.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005). A database of German emotional speech. In Ninth European Conference on Speech Communication and Technology.
Chavhan, Y., Dhore, M. L., & Yesaware, P. (2010). Speech emotion recognition using support vector machine. International Journal of Computer Applications, 1(20), 6–9.
Degottex, G., & Stylianou, Y. (2013). Analysis and synthesis of speech using an adaptive full-band harmonic model. IEEE Transactions on Audio, Speech, and Language Processing, 21(10), 2085–2095.
Erro, D., Sainz, I., Navas, E., & Hernaez, I. (2014). Harmonics plus noise model based vocoder for statistical parametric speech synthesis. IEEE Journal of Selected Topics in Signal Processing, 8(2), 184–194.
Eslava, D. E., & Bilbao, A. M. (2008). Intra-lingual and cross-lingual voice conversion using harmonic plus stochastic models. PhD thesis, Universitat Politechnica de Catalunya, Barcelona, Spain.
Gangeh, M. J., Fewzee, P., Ghodsi, A., Kamel, M. S., & Karray, F. (2014). Multiview supervised dictionary learning in speech emotion recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(6), 1056–1068.
Han, K., Yu, D., & Tashev, I. (2014). Speech emotion recognition using deep neural network and extreme learning machine. In Fifteenth Annual Conference of the International Speech Communication Association.
Haque, A., & Rao, K. S. (2017). Modification of energy spectra, epoch parameters and prosody for emotion conversion in speech. International Journal of Speech Technology, 20(1), 15–25.
Hemptinne, C. (2006). Integration of the harmonic plus noise model into the hidden Markov model-based speech synthesis system. Master's thesis.
Kafentzis, G. P., Rosec, O., & Stylianou, Y. (2014a). Robust full-band adaptive sinusoidal analysis and synthesis of speech. In International Conference on Acoustics, Speech, and Signal Processing, pp. 6260–6264.
Kafentzis, G. P., Yakoumaki, T., Mouchtaris, A., & Stylianou, Y. (2014b). Analysis of emotional speech using an adaptive sinusoidal model. In Proceedings of the 22nd European Signal Processing Conference, pp. 1492–1496.
Karimi, S., & Sedaaghi, M. H. (2016). How to categorize emotional speech signals with respect to the speaker's degree of emotional intensity. Turkish Journal of Electrical Engineering & Computer Sciences, 24(3), 1306–1324.
Khanna, P., & Kumar, M. S. (2011). Application of vector quantization in emotion recognition from human speech. In International Conference on Information Intelligence, Systems, Technology and Management, pp. 118–125.
Kwon, O. W., Chan, K., Hao, J., & Lee, T. W. (2003). Emotion recognition by speech signals. In Eighth European Conference on Speech Communication and Technology.
Lehana, P. K., & Pandey, P. C. (2004). Harmonic plus noise model based speech synthesis in Hindi and pitch modification. In Proceedings of the 16th International Congress on Acoustics, pp. 3333–3336.
Li, R., Perneczky, R., Yakushev, I., Förster, S., Kurz, A., & Drzezga, A. (2015). Gaussian mixture models and model selection for [18F] fluorodeoxyglucose positron emission tomography classification in Alzheimer's disease. PLoS ONE, 10(4), e0122731.
Mao, X., Chen, L., & Fu, L. (2009). Multi-level speech emotion recognition based on HMM and ANN. In 2009 World Congress on Computer Science and Information Engineering, Los Angeles, CA, pp. 225–229.
Moon, T. K. (1996). The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6), 47–60.
Nwe, T. L., Foo, S. W., & De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41(4), 603–623.
Pantazis, Y., Rosec, O., & Stylianou, Y. (2008). On the estimation of the speech harmonic model. In ISCA Tutorial and Research Workshop (ITRW) on Speech Analysis and Processing for Knowledge Discovery.
Pantazis, Y., Rosec, O., & Stylianou, Y. (2011). Adaptive AM-FM signal decomposition with application to speech analysis. IEEE Transactions on Audio, Speech, and Language Processing, 19(2), 290–300.
Pantazis, Y., & Stylianou, Y. (2008). Improving the modeling of the noise part in the harmonic plus noise model of speech. In IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4609–4612.
Ramakrishnan, S., & El Emary, I. M. (2013). Speech emotion recognition approaches in human computer interaction. Telecommunication Systems, 52(3), 1467–1478.
Ramamohan, S., & Dandapat, S. (2006). Sinusoidal model-based analysis and classification of stressed speech. IEEE Transactions on Audio, Speech, and Language Processing, 14(3), 737–746.
Shahzadi, A., Ahmadyfard, A., Harimi, A., & Yaghmaie, K. (2015). Speech emotion recognition using nonlinear dynamics features. Turkish Journal of Electrical Engineering & Computer Sciences, 23, 2056–2073.
Singh, R., Kumar, A., & Lehana, P. K. (2017). Effect of bandwidth modifications on the quality of speech imitated by Alexandrine and Indian Ringneck parrots. International Journal of Speech Technology, 20(3), 659–672.
Stylianou, Y. (2001). Applying the harmonic plus noise model in concatenative speech synthesis. IEEE Transactions on Speech and Audio Processing, 9(1), 21–29.
Stylianou, Y., & Cappe, O. (1998). A system for voice conversion based on probabilistic classification and a harmonic plus noise model. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.
Tao, J., Kang, Y., & Li, A. (2006). Prosody conversion from neutral speech to emotional speech. IEEE Transactions on Audio, Speech and Language Processing, 14(4), 1145–1154.
Truong, K. P., & van Leeuwen, D. A. (2007). Automatic discrimination between laughter and speech. Speech Communication, 49(2), 144–158.
Ververidis, D., & Kotropoulos, C. (2004). Automatic speech classification to five emotional states based on gender information. In European Signal Processing Conference, pp. 341–344.
Vogt, T., & André, E. (2006). Improving automatic emotion recognition from speech via gender differentiation. In Proceedings of the Language Resources and Evaluation Conference, Genoa.
Yakoumaki, T., Kafentzis, G. P., & Stylianou, Y. (2014). Emotional speech classification using adaptive sinusoidal modelling. In Fifteenth Annual Conference of the International Speech Communication Association.
Metadata
Title
Emotional speech analysis using harmonic plus noise model and Gaussian mixture model
Authors
Jang Bahadur Singh
Parveen Kumar Lehana
Publication date
28.08.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9549-y
