Published in: International Journal of Speech Technology 2/2022

08.02.2022

English speech emotion recognition method based on speech recognition

Author: Man Liu

Abstract

Speech emotion carries important information beyond the textual content of a speech signal, yet traditional speech recognition largely ignores it, which makes it difficult to extract the richer emotional content of English speech. To change this situation and obtain more emotional information from English utterances, English speech emotion recognition must be studied. At present, however, research on speech emotion recognition in China focuses mainly on Chinese, and work on English speech emotion recognition is comparatively scarce. This paper therefore studies English speech emotion recognition. Digital processing of the speech signal is based on speech recognition, and digitization of the signal is the prerequisite for any computer processing and analysis. Preprocessing of the speech signal, also called front-end processing, consists of the following steps: sampling and quantization, pre-emphasis, and windowing. Voice endpoint detection is based on volume and on high-order differences of the waveform. For feature extraction, openSMILE is selected as the tool to extract features directly, and LIBSVM is selected to build the speech emotion recognition model; finally, an experimental environment is constructed to verify the designed method. The experimental results show that this method recognizes the emotion of English speech well and enables a high degree of human–computer interaction.
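As a rough sketch of the pipeline the abstract describes (not the paper's actual code), the Python fragment below applies pre-emphasis and Hamming windowing, performs a simplified volume-based endpoint check (the paper additionally uses high-order differences of the waveform), and fits an SVM classifier; scikit-learn's SVC is built on LIBSVM, and the openSMILE feature matrix X and emotion labels y are assumed to be precomputed. Frame sizes, the energy threshold ratio, and the SVM parameters are illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM

    def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
        """Pre-emphasis followed by Hamming-windowed framing.
        Assumes len(signal) >= frame_len."""
        emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
        window = np.hamming(frame_len)
        n_frames = 1 + (len(emphasized) - frame_len) // hop
        return np.stack([emphasized[i * hop : i * hop + frame_len] * window
                         for i in range(n_frames)])

    def detect_endpoints(frames, ratio=0.1):
        """Simplified volume-based endpoint detection: keep frames whose
        short-time energy exceeds a fraction of the utterance maximum."""
        energy = np.sum(frames ** 2, axis=1)
        return frames[energy > ratio * energy.max()]

    def train_emotion_model(X, y):
        """Fit an SVM emotion classifier on utterance-level feature
        vectors (e.g. as produced by openSMILE). Kernel and C are
        illustrative, not the paper's settings."""
        model = SVC(kernel="rbf", C=1.0)
        model.fit(X, y)
        return model

In use, one would call preprocess and detect_endpoints per utterance before feature extraction, then train once on the full feature matrix, e.g. model = train_emotion_model(X_train, y_train) followed by model.predict(X_test).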


Metadata
Title
English speech emotion recognition method based on speech recognition
Author
Man Liu
Publication date
08.02.2022
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09955-4
