2012 | OriginalPaper | Book chapter

13. Prosodic Features for Speaker Recognition

Written by: Leena Mary, Ph.D.

Published in: Forensic Speaker Recognition

Publisher: Springer New York

Abstract

This chapter discusses the effectiveness of syllable-based prosodic features for speaker recognition. The term prosody covers a collection of characteristics such as intonation, stress, and timing, expressed primarily through variations in pitch, energy, and duration at various levels of speech. Prosody reflects the learned or acquired speaking habits of a person and hence contributes to speaker recognition. Because prosodic features are less affected by channel mismatch and noise, they are particularly well suited for speaker forensics, a field that demands accurate identification of suspects with as few mitigating conditions as possible. The author describes a method for extracting prosodic features directly from the speech signal. In this method, speech is segmented into syllable-like regions using vowel onset points (VOPs). The locations of the VOPs serve as reference points for the extraction and representation of prosodic features. The effectiveness of these features for speaker recognition is demonstrated on the extended task of the NIST 2003 speaker recognition evaluation. Combining evidence from spectral features with that of the proposed prosodic features helps to improve overall speaker recognition accuracy.
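
To make the idea of VOP-anchored feature extraction concrete, here is a minimal sketch in Python. It is not the chapter's implementation: it assumes the VOP locations, a frame-level F0 contour, and a frame-level energy contour have already been computed elsewhere, and the function name `syllable_prosodic_features`, the 10 ms frame shift, and the particular features (duration, mean F0, F0 change, mean energy) are illustrative assumptions chosen only to cover the pitch, energy, and duration dimensions named above.

```python
from statistics import mean


def syllable_prosodic_features(vops, f0, energy, frame_shift=0.01):
    """Compute simple prosodic features for each syllable-like region.

    vops        -- sorted frame indices of vowel onset points (assumed given)
    f0          -- per-frame pitch in Hz, with 0 marking unvoiced frames
    energy      -- per-frame energy values, same length as f0
    frame_shift -- seconds between frames (hypothetical default of 10 ms)
    """
    features = []
    # Each pair of consecutive VOPs delimits one syllable-like region.
    for start, end in zip(vops, vops[1:]):
        voiced = [value for value in f0[start:end] if value > 0]
        if not voiced:
            # Skip regions with no voiced frames: pitch is undefined there.
            continue
        features.append({
            "duration_s": (end - start) * frame_shift,
            "mean_f0_hz": mean(voiced),
            # Change in F0 from the first to the last voiced frame.
            "delta_f0_hz": voiced[-1] - voiced[0],
            "mean_energy": mean(energy[start:end]),
        })
    return features
```

Each returned dictionary describes one region between two consecutive VOPs, so the sequence of dictionaries could serve as input to a speaker model. How the chapter actually parameterises and models these contours is not stated in the abstract.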

Metadata
Title
Prosodic Features for Speaker Recognition
Written by
Leena Mary, Ph.D.
Copyright year
2012
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-0263-3_13
