Published in: International Journal of Speech Technology 3/2019

22-11-2018

Enhanced speech emotion detection using deep neural networks

Authors: S. Lalitha, Shikha Tripathi, Deepa Gupta

Abstract

This paper investigates the performance of perceptual speech features for emotion detection. The perceptual features considered are Mel frequency cepstral coefficients (MFCCs), perceptual linear predictive cepstrum (PLPC), Mel frequency perceptual linear prediction cepstrum (MFPLPC), bark frequency cepstral coefficients (BFCC), revised perceptual linear prediction coefficients (RPLP) and inverted Mel frequency cepstral coefficients (IMFCC). An algorithm using these auditory cues is evaluated with deep neural networks (DNN). The novelty of the work lies in analysing the perceptual features to identify the predominant ones that carry significant emotional information about the speaker. The validity of the algorithm is assessed on the publicly available Berlin database, both for seven emotions in a one-dimensional (categorical) space and in a two-dimensional continuous space spanning the valence and arousal dimensions. Comparative analysis reveals that a considerable improvement in emotion recognition performance is obtained using a DNN with the identified combination of perceptual features.
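The abstract's central technique is cepstral feature extraction of the MFCC family: warp the power spectrum onto a perceptual frequency scale, take logs, and decorrelate with a DCT. The sketch below is an illustrative NumPy-only MFCC pipeline, not the authors' implementation; all parameter values (frame size, hop, filter counts) are assumptions chosen as common defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) matrix of MFCCs for a mono signal."""
    # 1. Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)

    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 3. Triangular filterbank spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)

    # 4. Log Mel energies, then an orthonormal DCT-II to decorrelate
    #    them into cepstral coefficients.
    feats = np.log(power @ fbank.T + 1e-10)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels)) * np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)
    return feats @ dct.T

# Example: 1 s of a 440 Hz tone at 16 kHz yields 61 frames of 13 coefficients.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
coeffs = mfcc(sig)
print(coeffs.shape)  # (61, 13)
```

Variants such as IMFCC or BFCC follow the same template with the filterbank placed on an inverted Mel or Bark scale; the frame-level coefficient matrix is what would then be fed (e.g. after statistics pooling) to a DNN classifier.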


Metadata
Title
Enhanced speech emotion detection using deep neural networks
Authors
S. Lalitha
Shikha Tripathi
Deepa Gupta
Publication date
22-11-2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-09572-8
