Published in: International Journal of Speech Technology | Issue 1/2018

22.01.2018

Speech emotion recognition research: an analysis of research focus

Authors: Mumtaz Begum Mustafa, Mansoor A. M. Yusoof, Zuraidah M. Don, Mehdi Malekzadeh


Abstract

This article analyses research in speech emotion recognition (SER) from 2006 to 2017 in order to identify the current focus of research and areas in which research is lacking. The objective is to examine what is being done in this field of research. Searching on selected keywords, we extracted and analysed 260 articles from well-known online databases. The analysis indicates that SER is an active field of research, with dozens of articles published each year in journals and conference proceedings. The majority of articles concentrate on three critical aspects of SER, namely (1) databases, (2) suitable speech features, and (3) classification techniques to maximize the recognition accuracy of SER systems. Having carried out an association analysis of these critical aspects and how they influence the performance of SER systems in terms of recognition accuracy, we found that certain combinations of databases, speech features and classifiers influence the recognition accuracy of the SER system. Based on our review, we also suggest aspects of SER that could be taken into consideration in future work.
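To make the association analysis concrete, the sketch below shows one way such an analysis could be set up: each surveyed study is reduced to a (database, speech feature, classifier, accuracy) record, and the support and confidence of rules of the form "combination of aspects → high recognition accuracy" are computed over all attribute subsets. This is a minimal illustration, not the article's actual procedure; the study rows, the 75% accuracy threshold, and the filtering thresholds are invented placeholders.

```python
# Hypothetical association analysis over surveyed SER studies.
# Each record: (database, speech feature, classifier, reported accuracy in %).
# All rows below are invented placeholders, not results from the article.
from collections import Counter
from itertools import combinations

studies = [
    ("EMO-DB", "MFCC", "SVM", 82.0),
    ("EMO-DB", "prosodic", "HMM", 71.5),
    ("IEMOCAP", "MFCC", "DNN", 64.2),
    ("EMO-DB", "MFCC", "SVM", 85.3),
    ("IEMOCAP", "spectral", "SVM", 60.1),
]

HIGH = 75.0  # assumed threshold for a "high accuracy" outcome

itemset_counts = Counter()  # how often each attribute subset occurs
high_counts = Counter()     # how often it co-occurs with high accuracy
for db, feat, clf, acc in studies:
    items = (("database", db), ("feature", feat), ("classifier", clf))
    for r in (1, 2, 3):
        for subset in combinations(items, r):
            itemset_counts[subset] += 1
            if acc >= HIGH:
                high_counts[subset] += 1

# Report rules "subset -> high accuracy" with enough support and confidence.
for itemset, n in sorted(itemset_counts.items()):
    confidence = high_counts[itemset] / n
    support = n / len(studies)
    if confidence >= 0.8 and n >= 2:
        print(itemset, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

On this toy data, no single database, feature, or classifier passes the confidence threshold on its own, but the EMO-DB + MFCC + SVM combination does, mirroring the abstract's point that it is particular combinations of the three critical aspects that drive recognition accuracy.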


Zurück zum Zitat Le, D., Aldeneh, Z., & Provost, E. M. (2017). Discretized continuous speech emotion recognition with multi-task deep recurrent neural network. Interspeech, 2017. Le, D., Aldeneh, Z., & Provost, E. M. (2017). Discretized continuous speech emotion recognition with multi-task deep recurrent neural network. Interspeech, 2017.
Zurück zum Zitat Le, D., & Provost, E. M. (2013). Emotion recognition from spontaneous speech using hidden markov models with deep belief networks. In 2013 IEEE workshop on automatic speech recognition and understanding (ASRU) (pp. 216–221). Piscataway: IEEE.CrossRef Le, D., & Provost, E. M. (2013). Emotion recognition from spontaneous speech using hidden markov models with deep belief networks. In 2013 IEEE workshop on automatic speech recognition and understanding (ASRU) (pp. 216–221). Piscataway: IEEE.CrossRef
Zurück zum Zitat Lee, J., & Tashev, I. (2015). High-level feature representation using recurrent neural network for speech emotion recognition. In INTERSPEECH (pp. 1537–1540). Lee, J., & Tashev, I. (2015). High-level feature representation using recurrent neural network for speech emotion recognition. In INTERSPEECH (pp. 1537–1540).
Zurück zum Zitat Lefter, I., Rothkrantz, L. J., Wiggers, P., & Van Leeuwen, D. A. (2010). Emotion recognition from speech by combining databases and fusion of classifiers. In Text, speech and dialogue (pp. 353–360). Berlin Heidelberg: Springer.CrossRef Lefter, I., Rothkrantz, L. J., Wiggers, P., & Van Leeuwen, D. A. (2010). Emotion recognition from speech by combining databases and fusion of classifiers. In Text, speech and dialogue (pp. 353–360). Berlin Heidelberg: Springer.CrossRef
Zurück zum Zitat Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., … Sahli, H. (2013). Hybrid deep neural network–hidden markov model (DNN-HMM) based speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 312–317). Piscataway: IEEE.CrossRef Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., … Sahli, H. (2013). Hybrid deep neural network–hidden markov model (DNN-HMM) based speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 312–317). Piscataway: IEEE.CrossRef
Zurück zum Zitat Li, Y., Chao, L., Liu, Y., Bao, W., & Tao, J. (2015) From simulated speech to natural speech, what are the robust features for emotion recognition? In International conference on affective computing and intelligent interaction (ACII) (pp. 368–373). Piscataway: IEEE Li, Y., Chao, L., Liu, Y., Bao, W., & Tao, J. (2015) From simulated speech to natural speech, what are the robust features for emotion recognition? In International conference on affective computing and intelligent interaction (ACII) (pp. 368–373). Piscataway: IEEE
Zurück zum Zitat Lim, W., Jang, D., & Lee, T. (2016). Speech emotion recognition using convolutional and recurrent neural networks. In Signal and information processing association annual summit and conference (APSIPA), 2016 Asia-Pacific (pp. 1–4). Piscataway: IEEE. Lim, W., Jang, D., & Lee, T. (2016). Speech emotion recognition using convolutional and recurrent neural networks. In Signal and information processing association annual summit and conference (APSIPA), 2016 Asia-Pacific (pp. 1–4). Piscataway: IEEE.
Zurück zum Zitat Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559–590.CrossRef Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559–590.CrossRef
Zurück zum Zitat Liu, J., Chen, C., Bu, J., You, M., & Tao, J. (2007). Speech emotion recognition based on a fusion of all-class and pairwise-class feature selection. Computational Science–ICCS 2007, 168–175. Liu, J., Chen, C., Bu, J., You, M., & Tao, J. (2007). Speech emotion recognition based on a fusion of all-class and pairwise-class feature selection. Computational Science–ICCS 2007, 168–175.
Zurück zum Zitat Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490–501.CrossRef Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490–501.CrossRef
Zurück zum Zitat Lugger, M., Janoir, M. E., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In Signal processing conference, 2009 17th European (1225–1229). Piscataway: IEEE. Lugger, M., Janoir, M. E., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In Signal processing conference, 2009 17th European (1225–1229). Piscataway: IEEE.
Zurück zum Zitat Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–17). Piscataway: IEEE. Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–17). Piscataway: IEEE.
Zurück zum Zitat Lugger, M., & Yang, B. (2007). An incremental analysis of different feature groups in speaker independent emotion recognition. In 16th Int. congress of phonetic sciences. Lugger, M., & Yang, B. (2007). An incremental analysis of different feature groups in speaker independent emotion recognition. In 16th Int. congress of phonetic sciences.
Zurück zum Zitat Mannepalli, K., Sastry, P. N., & Suman, M. (2016). A novel adaptive fractional deep belief networks for speaker emotion recognition. Alexandria Engineering Journal. Mannepalli, K., Sastry, P. N., & Suman, M. (2016). A novel adaptive fractional deep belief networks for speaker emotion recognition. Alexandria Engineering Journal.
Zurück zum Zitat Mao, Q., Xue, W., Rao, Q., Zhang, F., & Zhan, Y. (2016). Domain adaptation for speech emotion recognition by sharing priors between related source and target classes. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2608–2612). Piscataway: IEEE.CrossRef Mao, Q., Xue, W., Rao, Q., Zhang, F., & Zhan, Y. (2016). Domain adaptation for speech emotion recognition by sharing priors between related source and target classes. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2608–2612). Piscataway: IEEE.CrossRef
Zurück zum Zitat Mao, X., Chen, L., & Fu, L. (2009). Multi-level speech emotion recognition based on HMM and ANN. In 2009 WRI World congress on computer science and information engineering (Vol. 7, pp. 225–229). Piscataway: IEEE.CrossRef Mao, X., Chen, L., & Fu, L. (2009). Multi-level speech emotion recognition based on HMM and ANN. In 2009 WRI World congress on computer science and information engineering (Vol. 7, pp. 225–229). Piscataway: IEEE.CrossRef
Zurück zum Zitat Mao, X., Zhang, B., & Luo, Y. (2007). Speech emotion recognition based on a hybrid of HMM/ANN. In Proceedings of the 7th conference on 7th WSEAS international conference on applied informatics and communications (Vol. 7, pp. 367–370). Mao, X., Zhang, B., & Luo, Y. (2007). Speech emotion recognition based on a hybrid of HMM/ANN. In Proceedings of the 7th conference on 7th WSEAS international conference on applied informatics and communications (Vol. 7, pp. 367–370).
Mencattini, A., Martinelli, E., Ringeval, F., Schuller, B., & Di Natale, C. (2017). Continuous estimation of emotions in speech by dynamic cooperative speaker models. IEEE Transactions on Affective Computing.
Milton, A., Roy, S. S., & Selvi, S. T. (2013). SVM scheme for speech emotion recognition using MFCC feature. International Journal of Computer Applications, 69(9).
Milton, A., & Selvi, S. T. (2014). Class-specific multiple classifiers scheme to recognize emotions from speech signals. Computer Speech & Language, 28(3), 727–742.
Mishra, H. K., & Sekhar, C. C. (2009). Variational Gaussian mixture models for speech emotion recognition. In Seventh international conference on advances in pattern recognition, 2009. ICAPR’09 (pp. 183–186). Piscataway: IEEE.
Morales-Perez, M., Echeverry-Correa, J., Orozco-Gutierrez, A., & Castellanos-Dominguez, G. (2008). Feature extraction of speech signals in emotion identification. In Engineering in medicine and biology society, 2008. EMBS 2008. 30th annual international conference of the IEEE (pp. 2590–2593). Piscataway: IEEE.
Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2), 98–112.
Navas, E., Hernáez, I., & Luengo, I. (2006). An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS. IEEE Transactions on Audio, Speech and Language Processing, 14, 1117–1127.
Neiberg, D., & Elenius, K. (2008). Automatic recognition of anger in spontaneous speech. In 9th annual conference of the international speech communication association.
Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.
Ntalampiras, S., & Fakotakis, N. (2012). Modeling the temporal evolution of acoustic parameters for speech emotion recognition. IEEE Transactions on Affective Computing, 3(1), 116–125.
Pan, Y., Shen, P., & Shen, L. (2012). Speech emotion recognition using support vector machine. International Journal of Smart Home, 6(2), 101–108.
Pao, T. L., Chien, C. S., Chen, Y. T., Yeh, J. H., Cheng, Y. M., & Liao, W. Y. (2007). Combination of multiple classifiers for improving emotion recognition in Mandarin speech. In 3rd international conference on intelligent information hiding and multimedia signal processing, 2007. IIHMSP 2007 (Vol. 1, pp. 35–38). Piscataway: IEEE.
Pao, T. L., Wang, C. H., & Li, Y. J. (2012). A study on the search of the most discriminative speech features in the speaker dependent speech emotion recognition. In 2012 fifth international symposium on parallel architectures, algorithms and programming (PAAP) (pp. 157–162). Piscataway: IEEE.
Pathak, S., & Kulkarni, A. (2011). Recognizing emotions from speech. In 2011 3rd international conference on electronics computer technology (ICECT) (Vol. 4, pp. 107–109). Piscataway: IEEE.
Philippou-Hübner, D., Vlasenko, B., Böck, R., & Wendemuth, A. (2012). The performance of the speaking rate parameter in emotion recognition from speech. In 2012 IEEE international conference on multimedia and expo (ICME) (pp. 248–253). Piscataway: IEEE.
Picard, R. W. (1997). Affective computing. Cambridge: MIT Press.
Pierre-Yves, O. (2003). The production and recognition of emotions in speech: features and algorithms. International Journal of Human-Computer Studies, 59(1), 157–183.
Planet, S., & Iriondo, I. (2012). Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition. In 2012 7th Iberian conference on information systems and technologies (CISTI) (pp. 1–6). Piscataway: IEEE.
Plutchik, R. (1991). The emotions. Lanham, MD: University Press of America.
Pohjalainen, J., Ringeval, F., Zhang, Z., & Schuller, B. (2016). Spectral and cepstral audio noise reduction techniques in speech emotion recognition. In Proceedings of the 2016 ACM on multimedia conference (pp. 670–674). ACM.
Polzehl, T., Schmitt, A., Metze, F., & Wagner, M. (2011). Anger recognition in speech using acoustic and linguistic cues. Speech Communication, 53(9), 1198–1209.
Přibil, J., & Přibilová, A. (2013). Evaluation of influence of spectral and prosodic features on GMM classification of Czech and Slovak emotional speech. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 8.
Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International Journal of Speech Technology, 16(2), 143–160.
Rao, K. S., Kumar, T. P., Anusha, K., Leela, B., Bhavana, I., & Gowtham, S. V. S. K. (2012). Emotion recognition from speech. International Journal of Computer Science and Information Technologies, 3(2), 3603–3607.
Rehmam, B., Halim, Z., Abbas, G., & Muhammad, T. (2015). Artificial neural network-based speech recognition using DWT analysis applied on isolated words from oriental languages. Malaysian Journal of Computer Science, 28(3), 242–262.
Ringeval, F., & Chetouani, M. (2008). Exploiting a vowel based approach for acted emotion recognition. In Verbal and nonverbal features of human-human and human-machine interaction (pp. 243–254).
Rodríguez, P. H., Hernández, J. B. A., Ballester, M. A. F., González, C. M. T., & Orozco-Arroyave, J. R. (2013). Global selection of features for nonlinear dynamics characterization of emotional speech. Cognitive Computation, 5(4), 517–525.
Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing & Management, 45(3), 315–328.
Sagha, H., Deng, J., Gavryukova, M., Han, J., & Schuller, B. (2016). Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5800–5804). Piscataway: IEEE.
Sánchez-Gutiérrez, M. E., Albornoz, E. M., Martinez-Licona, F., Rufiner, H. L., & Goddard, J. (2014). Deep learning for emotional speech recognition. In Mexican conference on pattern recognition (pp. 311–320). Cham: Springer International Publishing.
Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, audio, image and biomedical signal processing using neural networks (pp. 49–70). Berlin, Heidelberg: Springer.
Scherer, S., Schwenker, F., & Palm, G. (2009). Classifier fusion for emotion recognition from speech. In Advanced intelligent environments (pp. 95–117). Springer US.
Schmitt, M., Ringeval, F., & Schuller, B. W. (2016). At the border of acoustics and linguistics: bag-of-audio-words for the recognition of emotions in speech. In INTERSPEECH (pp. 495–499).
Schuller, B., Seppi, D., Batliner, A., Maier, A., & Steidl, S. (2007). Towards more reality in the recognition of emotional speech. In 2007 IEEE international conference on acoustics, speech and signal processing-ICASSP’07 (Vol. 4, pp. IV–941). Piscataway: IEEE.
Schuller, B., Vlasenko, B., Eyben, F., Wollmer, M., Stuhlsatz, A., Wendemuth, A., & Rigoll, G. (2010). Cross-corpus acoustic emotion recognition: Variances and strategies. IEEE Transactions on Affective Computing, 1(2), 119–131.
Schuller, B., Vlasenko, B., Minguez, R., Rigoll, G., & Wendemuth, A. (2007). Comparing one and two-stage acoustic modeling in the recognition of emotion in speech. In IEEE workshop on automatic speech recognition & understanding, 2007. ASRU (pp. 596–600). Piscataway: IEEE.
Schuller, B. W. (2008). Speaker, noise, and acoustic space adaptation for emotion recognition in the automotive environment. In 2008 ITG conference on voice communication (SprachKommunikation) (pp. 1–4). VDE.
Schwenker, F., Scherer, S., Magdi, Y. M., & Palm, G. (2009). The GMM-SVM supervector approach for the recognition of the emotional status from speech. In International conference on artificial neural networks (pp. 894–903). Berlin, Heidelberg: Springer.
Sedaaghi, M. H., Kotropoulos, C., & Ververidis, D. (2007). Using adaptive genetic algorithms to improve speech emotion recognition. In IEEE 9th workshop on multimedia signal processing, 2007. MMSP 2007 (pp. 461–464). Piscataway: IEEE.
Seehapoch, T., & Wongthanavasu, S. (2013). Speech emotion recognition using support vector machines. In 2013 5th international conference on knowledge and smart technology (KST) (pp. 86–91). Piscataway: IEEE.
Ser, W., Cen, L., & Yu, Z. L. (2008). A hybrid PNN-GMM classification scheme for speech emotion recognition. In 19th international conference on pattern recognition, 2008. ICPR 2008 (pp. 1–4). Piscataway: IEEE.
Sethu, V., Ambikairajah, E., & Epps, J. (2007). Speaker normalisation for speech-based emotion detection. In 2007 15th international conference on digital signal processing (pp. 611–614). Piscataway: IEEE.
Sethu, V., Ambikairajah, E., & Epps, J. (2008a). Phonetic and speaker variations in automatic emotion classification. In 9th annual conference of the international speech communication association.
Sethu, V., Ambikairajah, E., & Epps, J. (2008b). Empirical mode decomposition based weighted frequency feature for speech-based emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2008. ICASSP 2008 (pp. 5017–5020). Piscataway: IEEE.
Sethu, V., Ambikairajah, E., & Epps, J. (2009). Speaker dependency of spectral features and speech production cues for automatic emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2009. ICASSP 2009 (pp. 4693–4696). Piscataway: IEEE.
Sethu, V., Ambikairajah, E., & Epps, J. (2013). On the use of speech parameter contours for emotion recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 19.
Shah, F. (2009). Automatic emotion recognition from speech using artificial neural networks with gender-dependent databases. In International conference on advances in computing, control, & telecommunication technologies, 2009. ACT’09 (pp. 162–164). Piscataway: IEEE.
Shah, M., Miao, L., Chakrabarti, C., & Spanias, A. (2013). A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. In 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2553–2557). Piscataway: IEEE.
Shami, M., & Verhelst, W. (2007). An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech. Speech Communication, 49(3), 201–212.
Shaukat, A., & Chen, K. (2011). Emotional state recognition from speech via soft-competition on different acoustic representations. In The 2011 international joint conference on neural networks (IJCNN) (pp. 1910–1917). Piscataway: IEEE.
Shaw, A., Vardhan, R. K., & Saxena, S. (2016). Emotion recognition and classification in speech using artificial neural networks. International Journal of Computer Applications, 145(8).
Sheikhan, M., Bejani, M., & Gharavian, D. (2013). Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Computing and Applications, 23(1), 215–227.
Sheikhan, M., Gharavian, D., & Ashoftedel, F. (2012). Using DTW neural-based MFCC warping to improve emotional speech recognition. Neural Computing and Applications, 21(7), 1765–1773.
Shen, P., Changjun, Z., & Chen, X. (2011). Automatic speech emotion recognition using support vector machine. In 2011 international conference on electronic and mechanical engineering and information technology (EMEIT) (Vol. 2, pp. 621–625). Piscataway: IEEE.
Sidorov, M., Ultes, S., & Schmitt, A. (2014). Emotions are a personal thing: Towards speaker-adaptive emotion recognition. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4803–4807). Piscataway: IEEE.
Soltani, K., & Ainon, R. N. (2007). Speech emotion detection based on neural networks. In 9th international symposium on signal processing and its applications, 2007. ISSPA 2007 (pp. 1–3). Piscataway: IEEE.
Song, P., Ou, S., Zheng, W., Jin, Y., & Zhao, L. (2016). Speech emotion recognition using transfer non-negative matrix factorization. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5180–5184). Piscataway: IEEE.
Song, P., Zheng, W., Ou, S., Zhang, X., Jin, Y., Liu, J., & Yu, Y. (2016). Cross-corpus speech emotion recognition based on transfer non-negative matrix factorization. Speech Communication, 83, 34–41.
Steidl, S., Batliner, A., Nöth, E., & Hornegger, J. (2008). Quantification of segmentation and F0 errors and their effect on emotion recognition. In Text, speech and dialogue (pp. 525–534). Berlin, Heidelberg: Springer.
Sun, Y., & Wen, G. (2015). Emotion recognition using semi-supervised feature selection with speaker normalization. International Journal of Speech Technology, 18(3), 317–331.
Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Biomedical Signal Processing and Control, 18, 80–90.
Sun, Y., Zhou, Y., Zhao, Q., & Yan, Y. (2009). Acoustic feature optimization for emotion affected speech recognition. In International conference on information engineering and computer science, 2009. ICIECS 2009 (pp. 1–4). Piscataway: IEEE.
Swain, M., Sahoo, S., Routray, A., Kabisatpathy, P., & Kundu, J. N. (2015). Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition. International Journal of Speech Technology, 18(3), 387–393.
Sztahó, D., Imre, V., & Vicsi, K. (2011). Automatic classification of emotions in spontaneous speech. In Analysis of verbal and nonverbal communication and enactment: The processing issues (pp. 229–239).
Tabatabaei, T. S., Krishnan, S., & Guergachi, A. (2007). Emotion recognition using novel speech signal features. In IEEE international symposium on circuits and systems, 2007. ISCAS 2007 (pp. 345–348). Piscataway: IEEE.
Tahon, M., & Devillers, L. (2015). Towards a small set of robust acoustic features for emotion recognition: Challenges. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(1), 16–28.
Tamulevicius, G., & Liogiene, T. (2015). Low-order multi-level features for speech emotions recognition. Baltic Journal of Modern Computing, 3(4), 234–247.
Tarasov, A., & Delany, S. J. (2011). Benchmarking classification models for emotion recognition in natural speech: A multi-corporal study. In 2011 IEEE international conference on automatic face & gesture recognition and workshops (FG 2011) (pp. 841–846). Piscataway: IEEE.
Ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1), 213–225.
Thapliyal, N., & Amoli, G. (2012). Speech based emotion recognition with Gaussian mixture model. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(5), 65.
Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M. A., Schuller, B., & Zafeiriou, S. (2016). Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5200–5204). Piscataway: IEEE.
Truong, K., & Van Leeuwen, D. (2007). An ‘open-set’ detection evaluation methodology for automatic emotion recognition in speech. In Workshop on paralinguistic speech – between models and data (pp. 5–10).
Tseng, M., Hu, Y., Han, W. W., & Bergen, B. (2005). “Searching for happiness” or “Full of joy”? Source domain activation matters. In Annual meeting of the Berkeley Linguistics Society (Vol. 31, No. 1, pp. 359–370).
Utane, A. S., & Nalbalwar, S. L. (2013). Emotion recognition through speech using Gaussian mixture model and support vector machine. Emotion, 2, 8.
Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162–1181.
Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., & Wendemuth, A. (2011a). Vowels formants analysis allows straightforward detection of high arousal emotions. In 2011 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). Piscataway: IEEE.
Vlasenko, B., Prylipko, D., Philippou-Hübner, D., & Wendemuth, A. (2011b). Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions. In 12th annual conference of the international speech communication association.
Vlasenko, B., Schuller, B., Wendemuth, A., & Rigoll, G. (2007). Frame vs. turn-level: Emotion recognition from speech considering static and dynamic processing. In Proceedings 2nd international conference on affective computing and intelligent interaction (pp. 139–147).
Vogt, T., & André, E. (2006). Improving automatic emotion recognition from speech via gender differentiation. In Proceedings of the language resources and evaluation conference (LREC 2006), Genoa.
Vogt, T., & André, E. (2009). Exploring the benefits of discretization of acoustic features for speech emotion recognition. In 10th annual conference of the international speech communication association.
Vogt, T., & André, E. (2011). An evaluation of emotion units and feature types for real-time speech emotion recognition. KI-Künstliche Intelligenz, 25(3), 213–223.
Vondra, M., & Vích, R. (2009). Evaluation of speech emotion classification based on GMM and data fusion. In Cross-modal analysis of speech, gestures, gaze and facial expressions (pp. 98–105).
Wagner, J., Vogt, T., & André, E. (2007). A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech. In International conference on affective computing and intelligent interaction (pp. 114–125). Berlin, Heidelberg: Springer.
Wang, F., Verhelst, W., & Sahli, H. (2011). Relevance vector machine based speech emotion recognition. In Affective computing and intelligent interaction (pp. 111–120).
Weninger, F., Ringeval, F., Marchi, E., & Schuller, B. W. (2016). Discriminatively trained recurrent neural networks for continuous dimensional emotion recognition from audio. In IJCAI (pp. 2196–2202).
Wenjing, H., Haifeng, L., & Chunyu, G. (2009). A hybrid speech emotion perception method of VQ-based feature processing and ANN recognition. In WRI global congress on intelligent systems, 2009. GCIS’09 (Vol. 2, pp. 145–149). Piscataway: IEEE.
Womack, B. D., & Hansen, J. H. (1999). N-channel hidden Markov models for combined stressed speech classification and recognition. IEEE Transactions on Speech and Audio Processing, 7(6), 668–677.
Wu, C. H., & Liang, W. B. (2011). Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels. IEEE Transactions on Affective Computing, 2, 10–21.
Wu, S., Falk, T. H., & Chan, W. Y. (2009). Automatic recognition of speech emotion using long-term spectro-temporal features. In 2009 16th international conference on digital signal processing (pp. 1–6). Piscataway: IEEE.
Wu, S., Falk, T. H., & Chan, W. Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.
Wu, T., Yang, Y., Wu, Z., & Li, D. (2006). MASC: A speech corpus in Mandarin for emotion analysis and affective speaker recognition. In Speaker and language recognition workshop, 2006. IEEE Odyssey 2006 (pp. 1–5). Piscataway: IEEE.
Xiao, Z., Dellandréa, E., Chen, L., & Dou, W. (2009). Recognition of emotions in speech by a hierarchical approach. In 3rd international conference on affective computing and intelligent interaction and workshops, 2009. ACII 2009 (pp. 1–8). Piscataway: IEEE.
Xiao, Z., Dellandrea, E., Dou, W., & Chen, L. (2006). Two-stage classification of emotional speech. In International conference on digital telecommunications, 2006. ICDT’06 (pp. 32–32). Piscataway: IEEE.
Xiao, Z., Dellandrea, E., Dou, W., & Chen, L. (2007, December). Automatic hierarchical classification of emotional speech. In 9th IEEE international symposium on multimedia workshops, 2007. ISMW’07 (pp. 291–296). Piscataway: IEEE.
Xiao, Z., Dellandrea, E., Dou, W., & Chen, L. (2007). Hierarchical classification of emotional speech. IEEE Transactions on Multimedia, 37.
Yang, B., & Lugger, M. (2010). Emotion recognition from speech signals using new harmony features. Signal Processing, 90(5), 1415–1423.
Yang, N., Muraleedharan, R., Kohl, J., Demirkol, I., Heinzelman, W., & Sturge-Apple, M. (2012). Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion. In Spoken language technology workshop (SLT), 2012 IEEE (pp. 455–460). Piscataway: IEEE.
Ye, C., Liu, J., Chen, C., Song, M., & Bu, J. (2008). Speech emotion classification on a Riemannian manifold. In Advances in multimedia information processing–PCM 2008 (pp. 61–69).
Yeh, J. H., Pao, T. L., Lin, C. Y., Tsai, Y. W., & Chen, Y. T. (2011). Segment-based emotion recognition from continuous Mandarin Chinese speech. Computers in Human Behavior, 27(5), 1545–1552.
You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006). A hierarchical framework for speech emotion recognition. In 2006 IEEE international symposium on industrial electronics (Vol. 1, pp. 515–519). Piscataway: IEEE.
You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006). Emotional speech analysis on nonlinear manifold. In 18th international conference on pattern recognition, 2006. ICPR 2006 (Vol. 3, pp. 91–94). Piscataway: IEEE.
Yun, S., & Yoo, C. D. (2012). Loss-scaled large-margin Gaussian mixture models for speech emotion classification. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 585–598.
Yüncü, E., Hacihabiboglu, H., & Bozsahin, C. (2014). Automatic speech emotion recognition using auditory models with binary decision tree and SVM. In 2014 22nd international conference on pattern recognition (ICPR) (pp. 773–778). Piscataway: IEEE.
Zbancioc, M., & Feraru, S. M. (2012). Emotion recognition of the SROL Romanian database using fuzzy KNN algorithm. In 10th international symposium on electronics and telecommunications (ISETC), 2012 (pp. 347–350). Piscataway: IEEE.
Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 39–58.
Zha, C., Yang, P., Zhang, X., & Zhao, L. (2016). Spontaneous speech emotion recognition via multiple kernel learning. In 2016 eighth international conference on measuring technology and mechatronics automation (ICMTMA) (pp. 621–623). Piscataway: IEEE.
Zurück zum Zitat Zhang, S., Lei, B., Chen, A., Chen, C., & Chen, Y. (2010). Spoken emotion recognition using local fisher discriminant analysis. In 10th international conference on signal processing (ICSP), 2010 IEEE (pp. 538–540). Piscataway: IEEE. Zhang, S., Lei, B., Chen, A., Chen, C., & Chen, Y. (2010). Spoken emotion recognition using local fisher discriminant analysis. In 10th international conference on signal processing (ICSP), 2010 IEEE (pp. 538–540). Piscataway: IEEE.
Zurück zum Zitat Zhang, S., & Zhao, Z. (2008). Feature selection filtering methods for emotion recognition in Chinese speech signal. In 9th international conference on signal processing, 2008. ICSP 2008. (pp. 1699–1702). Piscataway: IEEE. Zhang, S., & Zhao, Z. (2008). Feature selection filtering methods for emotion recognition in Chinese speech signal. In 9th international conference on signal processing, 2008. ICSP 2008. (pp. 1699–1702). Piscataway: IEEE.
Zurück zum Zitat Zheng, W. Q., Yu, J. S., & Zou, Y. X. (2015). An experimental study of speech emotion recognition based on deep convolutional neural networks. In 2015 international conference on affective computing and intelligent interaction (ACII) (pp. 827–831). Piscataway: IEEE. Zheng, W. Q., Yu, J. S., & Zou, Y. X. (2015). An experimental study of speech emotion recognition based on deep convolutional neural networks. In 2015 international conference on affective computing and intelligent interaction (ACII) (pp. 827–831). Piscataway: IEEE.
Zurück zum Zitat Zhou, J., Wang, G., Yang, Y., & Chen, P. (2006). Speech emotion recognition based on rough set and SVM. In 5th IEEE international conference on cognitive informatics, 2006. ICCI 2006. (Vol. 1, pp. 53–61). Piscataway: IEEE. Zhou, J., Wang, G., Yang, Y., & Chen, P. (2006). Speech emotion recognition based on rough set and SVM. In 5th IEEE international conference on cognitive informatics, 2006. ICCI 2006. (Vol. 1, pp. 53–61). Piscataway: IEEE.
Zurück zum Zitat Zhou, Y., Sun, Y., Yang, L., & Yan, Y. (2009). Applying articulatory features to speech emotion recognition. In international conference on research challenges in computer science, 2009. ICRCCS’09. (pp. 73–76). Piscataway: IEEE. Zhou, Y., Sun, Y., Yang, L., & Yan, Y. (2009). Applying articulatory features to speech emotion recognition. In international conference on research challenges in computer science, 2009. ICRCCS09. (pp. 73–76). Piscataway: IEEE.
Zurück zum Zitat Zhu, L., Chen, L., Zhao, D., Zhou, J., & Zhang, W. (2017). Emotion recognition from chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17(7), 1694.CrossRef Zhu, L., Chen, L., Zhao, D., Zhou, J., & Zhang, W. (2017). Emotion recognition from chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17(7), 1694.CrossRef
Zurück zum Zitat Zong, Y., Zheng, W., Zhang, T., & Huang, X. (2016). Cross-corpus speech emotion recognition based on domain-adaptive least-squares regression. IEEE Signal Processing Letters, 23(5), 585–589.CrossRef Zong, Y., Zheng, W., Zhang, T., & Huang, X. (2016). Cross-corpus speech emotion recognition based on domain-adaptive least-squares regression. IEEE Signal Processing Letters, 23(5), 585–589.CrossRef
Metadata
Title
Speech emotion recognition research: an analysis of research focus
Authors
Mumtaz Begum Mustafa
Mansoor A. M. Yusoof
Zuraidah M. Don
Mehdi Malekzadeh
Publication date
22.01.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9493-x
