Robust emotional speech classification in the presence of babble noise

Authors: Salman Karimi, Mohammad Hossein Sedaaghi

Published in: International Journal of Speech Technology, Issue 2/2013 (01-06-2013)


Abstract

Emotional speech recognition (ESR) is a relatively new field of research in human–computer interaction. Most studies in this field are performed in clean environments. In real-world conditions, however, various kinds of noise and disturbance, such as car noise, background music, and buzz, can degrade the performance of such recognition systems. One of the most common noises encountered in everyday settings is babble noise. Because of its similarity to the desired speech signal, babble (or cross-talk) noise is highly challenging for speech-related systems. In this paper, in order to find the most appropriate features for ESR in the presence of babble noise at different signal-to-noise ratios (SNRs), 286 features are extracted from speech utterances of two emotional speech datasets, in German and Persian. The best features are then selected using different filter and wrapper methods. Finally, several classifiers, namely Bayes, KNN, GMM, ANN, and SVM, are applied to the selected features in two settings: multi-class and binary classification.
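
As a rough illustration of the pipeline the abstract describes (mixing babble noise at a chosen SNR, extracting features, selecting a subset with a filter-style method, then classifying), the following Python sketch uses scikit-learn. The feature matrix, labels, the ANOVA-based selector, the number of retained features, and the classifier settings are placeholders for illustration only; they are not the paper's 286-feature set, datasets, or tuned models.

    # Hypothetical sketch of the pipeline described in the abstract; data,
    # selector, k, and classifier settings are illustrative placeholders.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC


    def mix_at_snr(speech, babble, snr_db):
        """Scale `babble` so the speech-to-noise power ratio equals `snr_db`, then add it."""
        gain = np.sqrt(np.mean(speech ** 2) / (np.mean(babble ** 2) * 10 ** (snr_db / 10)))
        return speech + gain * babble


    # Placeholder data: 500 utterances x 286 features, 5 emotion classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 286))
    y = rng.integers(0, 5, size=500)

    # Filter-style selection (ANOVA F-score) followed by two of the compared classifiers.
    for name, clf in [("SVM", SVC(kernel="rbf")), ("KNN", KNeighborsClassifier(n_neighbors=5))]:
        pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=30), clf)
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{name}: mean 5-fold CV accuracy = {acc:.2f}")

In the paper itself, the features are computed from the German and Persian utterances after babble noise is added at several SNRs, and wrapper-based selection as well as the remaining classifiers (Bayes, GMM, ANN) are compared in both multi-class and binary settings.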

Metadata
Title: Robust emotional speech classification in the presence of babble noise
Authors: Salman Karimi, Mohammad Hossein Sedaaghi
Publication date: 01-06-2013
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9176-y
