2012 | OriginalPaper | Book chapter

13. Prosodic Features for Speaker Recognition

Written by: Leena Mary, Ph.D.

Published in: Forensic Speaker Recognition

Publisher: Springer New York


Abstract

This chapter discusses the effectiveness of syllable-based prosodic features for speaker recognition. The term prosody refers to a collection of characteristics such as intonation, stress and timing, expressed primarily through variations in pitch, energy and duration at various levels of speech. Prosody reflects the learned or acquired speaking habits of a person and hence contributes to speaker recognition. Because prosodic features are less affected by channel mismatch and noise, they are particularly well suited to speaker forensics, a field that demands accurate identification of suspects with as few mitigating conditions as possible. The author describes a method for extracting prosodic features directly from the speech signal. In this method, speech is segmented into syllable-like regions using vowel onset points (VOPs). The locations of the VOPs serve as reference points for the extraction and representation of prosodic features. The effectiveness of these prosodic features for speaker recognition is demonstrated on the extended task of the NIST 2003 speaker recognition evaluation. Combining evidence from spectral features with that from the proposed prosodic features improves overall speaker recognition accuracy.
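The abstract does not specify the chapter's feature set or its VOP detector, so the following is only a minimal sketch of the general idea: given VOP locations, each syllable-like region is taken as the span between two consecutive VOPs, and pitch, energy and duration are summarized per region. The function name `syllable_prosody`, the `SyllableProsody` fields, the 10 ms frame shift, and the convention that unvoiced frames carry an F0 of 0 are all assumptions made for illustration, not details from the chapter.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence


@dataclass
class SyllableProsody:
    """Hypothetical per-region summary of duration, pitch and energy."""
    duration_s: float              # time between two consecutive VOPs
    mean_f0_hz: Optional[float]    # mean F0 over voiced frames, None if none
    delta_f0_hz: Optional[float]   # last voiced F0 minus first voiced F0
    mean_energy: float             # mean frame energy over the region


def syllable_prosody(
    f0_hz: Sequence[float],
    energy: Sequence[float],
    vop_frames: Sequence[int],
    frame_shift_s: float = 0.01,   # assumed 10 ms frame shift
) -> List[SyllableProsody]:
    """Summarize prosody in each region between consecutive VOPs.

    Assumes f0_hz and energy are frame-level contours of equal length,
    that unvoiced frames have f0 == 0, and that vop_frames holds strictly
    increasing frame indices produced by some external VOP detector.
    """
    if len(f0_hz) != len(energy):
        raise ValueError("f0_hz and energy must have the same length")

    features: List[SyllableProsody] = []
    for start, end in zip(vop_frames, vop_frames[1:]):
        if not 0 <= start < end <= len(f0_hz):
            raise ValueError("vop_frames must be increasing and in range")
        # Keep only voiced frames for the pitch statistics.
        voiced = [f for f in f0_hz[start:end] if f > 0]
        features.append(
            SyllableProsody(
                duration_s=(end - start) * frame_shift_s,
                mean_f0_hz=mean(voiced) if voiced else None,
                delta_f0_hz=(voiced[-1] - voiced[0]) if voiced else None,
                mean_energy=mean(energy[start:end]),
            )
        )
    return features


# Toy contours (made-up values) with VOPs at frames 2, 6 and 9.
toy_f0 = [0, 0, 100, 110, 120, 0, 200, 190, 180, 0]
toy_energy = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6, 0.8, 0.5, 0.1]
toy_features = syllable_prosody(toy_f0, toy_energy, [2, 6, 9])
```

Anchoring the regions at VOPs means no phone or word transcription is needed, which matches the abstract's point about extracting features directly from the signal. The region after the last VOP is ignored here purely for simplicity.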



Title: Prosodic Features for Speaker Recognition
Written by: Leena Mary, Ph.D.
Publisher: Springer New York
Book: Forensic Speaker Recognition
Print ISBN: 978-1-4614-0262-6
Electronic ISBN: 978-1-4614-0263-3
Copyright year: 2012
DOI: https://doi.org/10.1007/978-1-4614-0263-3_13
