Published in: International Journal of Speech Technology, Issue 4/2022

02.11.2021

Handling high dimensional features by ensemble learning for emotion identification from speech signal

Authors: Konduru Ashok Kumar, J. L. Mazher Iqbal


Abstract

Handling the curse of dimensionality in the acoustic features of speech signals has recently become a crucial objective in machine-learning-based emotion detection. Contemporary emotion prediction methods suffer from false alarms because of the high dimensionality of the features used in the training phase of the machine learning models. Most contemporary models have attempted to handle the high dimensionality of the training corpus; however, they focus chiefly on fusing multiple classifiers, which barely improves decision accuracy when the volume of the training corpus is large. This manuscript presents a novel ensemble model that uses a fusion of diversity measures to select optimal features. Moreover, the proposed method reduces the impact of high dimensionality in the feature values through a novel clustering process. The experimental study demonstrates the performance of the proposed method in predicting emotion from speech signals, compared against contemporary machine-learning-based emotion detection models. Fourfold cross-validation on a standard data corpus was used in the performance analysis.
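The evaluation protocol named in the abstract (feature selection followed by fourfold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' method: variance ranking stands in as a hypothetical placeholder for the paper's fusion of diversity measures, and all names and the synthetic data are assumptions.

```python
# Illustrative sketch: rank features by a simple criterion, keep the top-k,
# and build fourfold cross-validation splits. Variance is a stand-in for
# the paper's diversity-measure fusion; everything here is hypothetical.
import random
import statistics

def rank_features_by_variance(rows, top_k):
    """Return the indices of the top_k highest-variance features."""
    n_features = len(rows[0])
    variances = [statistics.pvariance([row[j] for row in rows])
                 for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:top_k])

def kfold_indices(n_samples, k=4, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    for i in range(k):
        test = idx[i::k]                       # every k-th shuffled index
        train = [j for j in idx if j not in set(test)]
        yield train, test

# Tiny synthetic "feature matrix": 8 utterances x 4 acoustic features.
data = [[0.1 * s, 5.0 - s, 0.01, s * s] for s in range(8)]

selected = rank_features_by_variance(data, top_k=2)  # indices of kept features
folds = list(kfold_indices(len(data), k=4))          # 4 disjoint test folds
```

Each fold's test set is disjoint from the others and together they cover every sample once, which is the property fourfold cross-validation relies on.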


Metadata
Title
Handling high dimensional features by ensemble learning for emotion identification from speech signal
Authors
Konduru Ashok Kumar
J. L. Mazher Iqbal
Publication date
02.11.2021
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09916-x
