Published in: International Journal of Speech Technology 4/2020

08.09.2020

Pattern recognition and features selection for speech emotion recognition model using deep learning

Authors: Kittisak Jermsittiparsert, Abdurrahman Abdurrahman, Parinya Siriattakul, Ludmila A. Sundeeva, Wahidah Hashim, Robbi Rahim, Andino Maseleno



Abstract

Automatic speaker recognition models are built on speaker characterization, pattern analysis, and feature engineering. This paper focuses on the effect of classification and feature selection methods on speech emotion recognition. Selecting the right parameters in combination with the classifier is an important step in reducing the computational complexity of the system, and it becomes essential for models deployed in real-time scenarios. In this paper, a new deep-learning-based speech recognition model is presented that automatically recognizes spoken words. The quality of the input source, i.e., the speech signal, has a direct impact on the accuracy the classifier can attain. The Berlin database consists of around 500 utterances spoken by both male and female speakers. On this dataset, the presented model achieves maximum accuracies of 94.21%, 83.54%, 83.65%, and 78.13% with MFCC, prosodic, LSP, and LPC features, respectively, and offers better recognition performance than the other methods considered.
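The best-performing feature set in the abstract is MFCC. As background only (this is not the authors' implementation, and all function and parameter names below are illustrative), a minimal sketch of the standard MFCC pipeline in NumPy/SciPy — pre-emphasis, framing, power spectrum, mel filterbank, log, and DCT:

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, frame_len=0.025, frame_step=0.01,
         n_filters=26, n_ceps=13):
    """Compute MFCC features from a mono waveform (simplified sketch)."""
    # 1. Pre-emphasis: boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Split into overlapping frames and apply a Hamming window.
    flen = int(round(frame_len * sr))
    fstep = int(round(frame_step * sr))
    n_frames = 1 + max(0, (len(emphasized) - flen) // fstep)
    frames = np.stack([emphasized[i * fstep:i * fstep + flen]
                       for i in range(n_frames)])
    frames = frames * np.hamming(flen)

    # 3. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 4. Triangular mel-scale filterbank.
    def hz_to_mel(hz):
        return 2595 * np.log10(1 + hz / 700)

    def mel_to_hz(mel):
        return 700 * (10 ** (mel / 2595) - 1)

    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    filt = power @ fbank.T
    filt = np.where(filt == 0, np.finfo(float).eps, filt)

    # 5. Log filterbank energies, then DCT-II to decorrelate
    #    -> keep the first n_ceps cepstral coefficients.
    return dct(np.log(filt), type=2, axis=1, norm='ortho')[:, :n_ceps]
```

For a one-second 16 kHz signal, this yields a (frames × 13) matrix that could feed a deep classifier such as the one the paper describes; in practice a library like librosa would typically replace this hand-rolled version.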


Metadata
Title
Pattern recognition and features selection for speech emotion recognition model using deep learning
Authors
Kittisak Jermsittiparsert
Abdurrahman Abdurrahman
Parinya Siriattakul
Ludmila A. Sundeeva
Wahidah Hashim
Robbi Rahim
Andino Maseleno
Publication date
08.09.2020
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2020
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-020-09690-2
