Published in: International Journal of Speech Technology 1/2016

20 January 2016

Efficient feature combination techniques for emotional speech classification

Written by: Hemanta Kumar Palo, Mihir Narayan Mohanty, Mahesh Chandra


Abstract

To enhance the naturalness and efficiency of spoken-language man–machine interfaces, the identification and classification of emotional speech has become a prominent research area. The reliability and accuracy of such emotion identification depend greatly on feature selection and extraction. This paper proposes a combined feature selection technique that feeds the reduced feature sets produced by a vector quantizer (VQ) into a Radial Basis Function Neural Network (RBFNN) for classification. In the initial stage, Linear Prediction Coefficients (LPC) and the time–frequency Hurst parameter (pH) are used to extract the relevant features; the two carry complementary information about the emotional speech. Extensive simulations have been carried out on the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results show 76 % accuracy for pH and 68 % for LPC as standalone feature sets, whereas the combination of the reduced feature sets (LP VQC and pH VQC) raises the average accuracy to 90.55 %.
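The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook is designed with plain k-means (Lloyd) iterations as a stand-in for whatever VQ design the paper uses, the per-frame LPC and pH values are random placeholders rather than features extracted from speech, and the names `vq_codebook` and `combined_vq_feature`, the codebook size and the frame dimensions are all hypothetical. The RBFNN classifier is omitted.

```python
import numpy as np

def vq_codebook(frames, size, iters=20, seed=0):
    """Design a VQ codebook with plain k-means (Lloyd) iterations."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from distinct, randomly chosen frames.
    codebook = frames[rng.choice(len(frames), size=size, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest codeword (squared Euclidean distance).
        dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        # Move each codeword to the mean of the frames assigned to it.
        for k in range(size):
            members = frames[nearest == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

def combined_vq_feature(lpc_frames, ph_frames, size=4):
    """Flatten the two codebooks and concatenate them into one vector."""
    lp_vqc = vq_codebook(lpc_frames, size)
    ph_vqc = vq_codebook(ph_frames, size)
    return np.concatenate([lp_vqc.ravel(), ph_vqc.ravel()])

# Placeholder data (assumption): 200 frames with 10 LPC-like and 3 pH-like values.
rng = np.random.default_rng(1)
lpc_frames = rng.normal(size=(200, 10))
ph_frames = rng.uniform(0.0, 1.0, size=(200, 3))

feature = combined_vq_feature(lpc_frames, ph_frames, size=4)
print(feature.shape)  # (52,) = 4 codewords * 10 values + 4 codewords * 3 values
```

Each utterance is reduced from a variable number of frames to a fixed-length vector, which is what makes the two feature streams easy to concatenate before classification. A real system would also need a consistent ordering of the codewords, since the flattened vector depends on it.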


Metadata
Title
Efficient feature combination techniques for emotional speech classification
Written by
Hemanta Kumar Palo
Mihir Narayan Mohanty
Mahesh Chandra
Publication date
20 January 2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9333-9
