Robust emotional speech classification in the presence of babble noise

Authors: Salman Karimi, Mohammad Hossein Sedaaghi

Published in: International Journal of Speech Technology, Issue 2/2013 (01-06-2013)


Abstract

Emotional speech recognition (ESR) is a relatively new field of research in human–computer interaction. Most studies in this field are performed in clean environments. In real-world conditions, however, various kinds of noise and disturbance, such as car noise, background music, and buzz, can degrade the performance of such recognition systems. One of the most common noises encountered in everyday settings is babble noise. Because of its similarity to the desired speech signal, babble (or cross-talk) noise is highly challenging for speech-related systems. In this paper, in order to find the most appropriate features for ESR in the presence of babble noise at different signal-to-noise ratios (SNRs), 286 features are extracted from speech utterances of two emotional speech datasets, in German and Persian. The best features are then selected using different filter and wrapper methods. Finally, several classifiers, namely Bayes, KNN, GMM, ANN, and SVM, are applied to the selected features in two settings: multi-class and binary classification.
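
As a rough illustration of the pipeline the abstract describes (mixing babble noise at a chosen SNR, extracting features, selecting a subset with a filter-style method, then classifying), the following Python sketch uses scikit-learn. The feature matrix, labels, the ANOVA-based selector, the number of retained features, and the classifier settings are placeholders for illustration only; they are not the paper's 286-feature set, datasets, or tuned models.

    # Hypothetical sketch of the pipeline described in the abstract; data,
    # selector, k, and classifier settings are illustrative placeholders.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC


    def mix_at_snr(speech, babble, snr_db):
        """Scale `babble` so the speech-to-noise power ratio equals `snr_db`, then add it."""
        gain = np.sqrt(np.mean(speech ** 2) / (np.mean(babble ** 2) * 10 ** (snr_db / 10)))
        return speech + gain * babble


    # Placeholder data: 500 utterances x 286 features, 5 emotion classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 286))
    y = rng.integers(0, 5, size=500)

    # Filter-style selection (ANOVA F-score) followed by two of the compared classifiers.
    for name, clf in [("SVM", SVC(kernel="rbf")), ("KNN", KNeighborsClassifier(n_neighbors=5))]:
        pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=30), clf)
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{name}: mean 5-fold CV accuracy = {acc:.2f}")

In the paper itself, the features are computed from the German and Persian utterances after babble noise is added at several SNRs, and wrapper-based selection as well as the remaining classifiers (Bayes, GMM, ANN) are compared in both multi-class and binary settings.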

Metadata
Title: Robust emotional speech classification in the presence of babble noise
Authors: Salman Karimi, Mohammad Hossein Sedaaghi
Publication date: 01-06-2013
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9176-y
