Published in: Neural Computing and Applications 7/2020

04.10.2018 | Original Article

Novel cascaded Gaussian mixture model-deep neural network classifier for speaker identification in emotional talking environments

Authors: Ismail Shahin, Ali Bou Nassif, Shibani Hamsa



Abstract

This paper presents an approach to enhancing text-independent speaker identification performance in emotional talking environments, based on a novel classifier: a cascaded Gaussian mixture model-deep neural network (GMM-DNN). The work proposes, implements, and evaluates this cascaded architecture for speaker identification in emotional talking conditions. The results show that the cascaded GMM-DNN classifier improves speaker identification performance across emotions on two distinct speech databases: the Emirati speech database (an Arabic United Arab Emirates dataset) and the English "Speech Under Simulated and Actual Stress" (SUSAS) dataset. On each dataset, the proposed classifier outperforms classical classifiers such as the multilayer perceptron (MLP) and the support vector machine (SVM). The speaker identification performance attained with the cascaded GMM-DNN is comparable to that obtained from subjective assessment by human listeners.
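The abstract does not spell out how the two stages are wired together, but a common way to cascade a GMM front-end with a DNN back-end is to fit one GMM per speaker and feed the vector of per-speaker log-likelihood scores into a small neural network that makes the final identification decision. The sketch below illustrates that generic idea on synthetic feature frames; it is an assumption-laden toy, not the paper's implementation, and the synthetic data stands in for real acoustic features such as MFCCs.

```python
# Hypothetical sketch of a cascaded GMM-DNN speaker identifier:
# stage 1 fits one GMM per speaker; stage 2 trains a small neural
# network on the per-speaker log-likelihood scores. This is a toy
# illustration, not the architecture from the paper.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic "feature frames" for 3 speakers (real systems would use
# MFCCs or similar acoustic features extracted from speech).
n_speakers, n_frames, n_dims = 3, 200, 12
X = np.vstack([rng.normal(loc=s, size=(n_frames, n_dims))
               for s in range(n_speakers)])
y = np.repeat(np.arange(n_speakers), n_frames)

# Stage 1: one GMM per speaker, trained only on that speaker's frames.
gmms = [GaussianMixture(n_components=2, random_state=0).fit(X[y == s])
        for s in range(n_speakers)]

# Stage 2: the per-frame log-likelihoods under every speaker's GMM
# become the input features of the DNN back-end.
scores = np.column_stack([g.score_samples(X) for g in gmms])
dnn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(scores, y)

# Identify unseen frames drawn near speaker 1's distribution.
new_frames = rng.normal(loc=1, size=(5, n_dims))
test_scores = np.column_stack([g.score_samples(new_frames) for g in gmms])
print(dnn.predict(test_scores))
```

The design intuition for such a cascade is that the GMMs compress each variable-length utterance into a fixed, low-dimensional score vector, which is a much easier input space for a discriminative neural network than raw frames.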


Metadata
Title
Novel cascaded Gaussian mixture model-deep neural network classifier for speaker identification in emotional talking environments
Authors
Ismail Shahin
Ali Bou Nassif
Shibani Hamsa
Publication date
04.10.2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3760-2
