International Journal of Speech Technology

Published: 20-01-2016

Efficient feature combination techniques for emotional speech classification

Authors: Hemanta Kumar Palo, Mihir Narayan Mohanty, Mahesh Chandra

Published in: International Journal of Speech Technology | Issue 1/2016

Abstract

Driven by the challenge of enhancing the naturalness and efficiency of spoken-language man–machine interfaces, emotional speech identification and classification has become a predominant research area. The reliability and accuracy of such emotion identification depend greatly on feature selection and extraction. In this paper, a combined feature selection technique is proposed that uses the reduced feature set produced by a vector quantizer (VQ) in a Radial Basis Function Neural Network (RBFNN) environment for classification. In the initial stage, Linear Prediction Coefficients (LPC) and the time–frequency Hurst parameter (pH) are used to extract the relevant features, which carry complementary information from the emotional speech. Extensive simulations have been carried out using the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results show 76 % accuracy for pH and 68 % for LPC as standalone feature sets, whereas the combination of feature sets (LP VQC and pH VQC) raises the average accuracy to 90.55 %.
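
As a rough illustration of the LPC-plus-VQ stage described in the abstract, the sketch below computes frame-wise LPC coefficients and compresses them into a small codebook that could serve as a reduced feature vector. It is not the authors' implementation. The frame length, hop size, LPC order and codebook size are assumed values, the function names `lpc`, `vq_codebook` and `lp_vqc` are hypothetical, and plain k-means stands in for whatever codebook design procedure the paper actually uses.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order] (autocorrelation method, Levinson-Durbin).

    Uses the prediction-error filter convention A(z) = 1 + sum_j a[j] z^-j.
    """
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]

def vq_codebook(vectors, size, iters=20, seed=0):
    """Codebook of `size` centroids via plain k-means (an assumed stand-in)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(vectors), size, replace=False)
    codebook = vectors[picks].astype(float)
    for _ in range(iters):
        # Squared distance from every vector to every centroid
        d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(size):
            members = vectors[labels == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook

def lp_vqc(signal, frame_len=400, hop=160, order=10, size=4):
    """Frame the signal, take LPC per frame, return the flattened VQ codebook."""
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    coeffs = np.array([lpc(signal[s:s + frame_len] * window, order) for s in starts])
    return vq_codebook(coeffs, size).ravel()
```

Under these assumptions, a combined feature vector could be formed by concatenating the output of `lp_vqc` with an analogous codebook built from per-frame Hurst parameter estimates (for example with `np.concatenate`) before passing it to a classifier.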

Title: Efficient feature combination techniques for emotional speech classification
Authors: Hemanta Kumar Palo, Mihir Narayan Mohanty, Mahesh Chandra
Publication date: 20-01-2016
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-016-9333-9
