Neural Computing and Applications

1 February 2015 | Original Article

Affect-insensitive speaker recognition systems via emotional speech clustering using prosodic features

Authors: Dongdong Li, Yubo Yuan, Zhaohui Wu, Yingchun Yang

Published in: Neural Computing and Applications | Issue 2/2015

Abstract

Voice-based biometric security systems that handle only neutral speech have achieved promising performance. However, recognition is very likely to fail when the test data exhibit multiple emotions. This paper addresses the mismatch in emotional state between training and testing speech. We discuss different modeling strategies that incorporate the emotions (affects) of speakers into the training stage of a Mandarin-based speaker recognition system and propose an alternative approach that could optimize the use of the limited affective speech available. The training speech is partitioned and clustered by the trends of its prosodic variations, and multiple models are built from the clustered speech for each speaker. The prosodic differences are characterized by a combination of features that describe the changes in the fundamental frequency and energy contours. The experiments were carried out on the Mandarin Affective Speech Corpus. The results show a relative improvement of 73.37 % in recognition rate over traditional speaker verification and a relative improvement of 63.53 % over structural training-based systems.
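
The partition-and-cluster step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact prosodic features or the clustering algorithm, so everything below is an assumption made for illustration: each utterance is reduced to a hypothetical two-dimensional trend feature (the least-squares slope of its fundamental frequency contour and of its energy contour), and a plain k-means groups the utterances. The function names `slope`, `prosodic_trend`, `kmeans`, and `partition_by_trend` are invented here and are not taken from the paper.

```python
import random


def slope(contour):
    """Least-squares slope of a contour against its frame index."""
    n = len(contour)
    mean_x = (n - 1) / 2
    mean_y = sum(contour) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(contour))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den if den else 0.0


def prosodic_trend(f0, energy):
    """Assumed trend feature: (F0 slope, energy slope). Not the paper's feature set."""
    return (slope(f0), slope(energy))


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in clustering method (an assumption)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c]))
                  for pt in points]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def partition_by_trend(utterances, k):
    """Group utterance ids of one speaker by the cluster of their prosodic trend."""
    feats = [prosodic_trend(u["f0"], u["energy"]) for u in utterances]
    labels = kmeans(feats, k)
    groups = {}
    for u, lab in zip(utterances, labels):
        groups.setdefault(lab, []).append(u["id"])
    return groups
```

Each resulting group would then serve as the training set for one of the speaker's models. The sketch stops before model training because the abstract does not say which model type is used. Calling `partition_by_trend(utterances, k=2)` on a list of dictionaries with `id`, `f0`, and `energy` keys returns a mapping from cluster label to utterance ids.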

Title: Affect-insensitive speaker recognition systems via emotional speech clustering using prosodic features
Authors: Dongdong Li, Yubo Yuan, Zhaohui Wu, Yingchun Yang
Publication date: 1 February 2015
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 2/2015
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-014-1708-8
