Published in: International Journal of Speech Technology 1/2016

20-01-2016

Efficient feature combination techniques for emotional speech classification

Authors: Hemanta Kumar Palo, Mihir Narayan Mohanty, Mahesh Chandra



Abstract

Emotional speech identification and classification have become a predominant research area in the effort to enhance the naturalness and efficiency of spoken-language man–machine interfaces. The reliability and accuracy of such emotion identification depend greatly on feature selection and extraction. In this paper, a combined feature selection technique is proposed that uses the reduced feature set produced by a vector quantizer (VQ) in a Radial Basis Function Neural Network (RBFNN) environment for classification. In the initial stage, Linear Prediction Coefficients (LPC) and the time–frequency Hurst parameter (pH) are used to extract the relevant features, which carry complementary information from the emotional speech. Extensive simulations have been carried out on the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results show 76 % accuracy for pH and 68 % for LPC as standalone feature sets, whereas the combination of feature sets (LP VQC and pH VQC) raises the average accuracy to 90.55 %.
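As a rough illustration of the first feature stage mentioned in the abstract, the sketch below computes linear prediction coefficients for a single frame with the autocorrelation method and the Levinson-Durbin recursion. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `autocorr` and `lpc`, the prediction order, and the synthetic test frame are all hypothetical, and the abstract does not say which LPC variant, frame length, or windowing the paper uses.

```python
def autocorr(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag (no normalisation)."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Return (predictor coefficients a_1..a_order, residual energy).

    The coefficients satisfy x[n] ~ sum_k a_k * x[n - k]; they are found
    with the Levinson-Durbin recursion on the frame's autocorrelation.
    """
    r = autocorr(frame, order)
    if r[0] == 0.0:
        # Silent frame: nothing to predict.
        return [0.0] * order, 0.0

    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical test frame: a decaying exponential, which a first-order
# predictor can model almost exactly.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(frame, order=2)
print(coeffs, residual)
```

For this synthetic frame the first coefficient comes out close to 0.9 and the second close to zero, since each sample is 0.9 times the previous one. In a pipeline like the one the abstract describes, per-frame coefficient vectors of this kind would then be passed to the vector quantizer before classification.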


Metadata
Title
Efficient feature combination techniques for emotional speech classification
Authors
Hemanta Kumar Palo
Mihir Narayan Mohanty
Mahesh Chandra
Publication date
20-01-2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9333-9
