Published in: International Journal of Speech Technology, Issue 3/2018

Publication date: 27 November 2017

Performance comparison of multitaper techniques for speaker verification with expressive speech

Authors: K. C. Narendra, R. Kumaraswamy, Sanjeev Gurugopinath


Abstract

In this paper, we present a comparative study of spectral front-end features for speaker verification with expressive speech. These features are obtained by processing the multitaper magnitude and phase spectra of the speech signal. In particular, we use the multitaper modified group delay function (MT-MOGDF) and multitaper magnitude (MT-MAG) spectra to obtain low-variance estimates of the speech spectrum. We observe that the cues that help represent expressive speech are more evident in the MT-MOGDF spectrum than in the MT-MAG spectrum, in terms of mean formant value and formant bandwidth. We ran an extensive experimental study on a speaker verification system with a Gaussian mixture model-based universal background model (GMM-UBM) classifier, using expressive speech from the IITKGP-SESC and EMODB databases. The study shows that MT-MOGDF outperforms MT-MAG in terms of equal error rate (EER) and minimum decision cost function (minDCF). We attribute this improvement to the better representation and lower-variance estimate of the speech spectrum that MT-MOGDF provides. Our results suggest MT-MOGDF as a potential alternative to the MT-MAG representation for speaker verification problems in general.
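The abstract names the two front ends but does not give their formulas. The following minimal sketch shows one plausible way to compute both for a single speech frame with NumPy and SciPy. It assumes the following, none of which comes from the paper: Slepian (DPSS) tapers via `scipy.signal.windows.dpss`, uniform taper weights, a group-delay numerator averaged across tapers, and a cepstrally smoothed multitaper magnitude in the denominator. It also assumes the usual modified-group-delay exponents `alpha=0.4` and `gamma=0.9`, `nw=3.0`, six tapers, and a 30-coefficient lifter. All function names are hypothetical, and this is not the authors' MT-MOGDF recipe.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_spectra(frame, n_fft=512, nw=3.0, n_tapers=6):
    """Per-taper DFTs of v_k(n) x(n) and of n v_k(n) x(n) (n = sample index)."""
    m = len(frame)
    tapers = dpss(m, nw, n_tapers)           # Slepian tapers, shape (n_tapers, m)
    n = np.arange(m)
    x = tapers * frame                        # one tapered copy of the frame per row
    X = np.fft.rfft(x, n=n_fft, axis=1)
    Y = np.fft.rfft(n * x, n=n_fft, axis=1)
    return X, Y


def mt_magnitude(X):
    """MT-MAG style estimate: square root of the mean per-taper periodogram.
    Uniform taper weights are an assumption."""
    return np.sqrt(np.mean(np.abs(X) ** 2, axis=0))


def cepstral_smooth(mag, n_fft, n_lifter=30, eps=1e-12):
    """Smooth a magnitude spectrum by low-quefrency liftering of its real cepstrum."""
    cep = np.fft.irfft(np.log(mag + eps), n=n_fft)
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[-(n_lifter - 1):] = 1.0            # mirror half of the real cepstrum
    return np.exp(np.fft.rfft(cep * lifter, n=n_fft).real)


def mt_modified_group_delay(frame, n_fft=512, nw=3.0, n_tapers=6,
                            alpha=0.4, gamma=0.9, n_lifter=30):
    """Hypothetical multitaper modified group delay (not the paper's exact recipe)."""
    X, Y = multitaper_spectra(frame, n_fft, nw, n_tapers)
    # Group-delay numerator Re(X)Re(Y) + Im(X)Im(Y), averaged over tapers (assumption).
    num = np.mean(X.real * Y.real + X.imag * Y.imag, axis=0)
    # Denominator: cepstrally smoothed multitaper magnitude, raised to 2*gamma.
    smooth = cepstral_smooth(mt_magnitude(X), n_fft, n_lifter)
    tau = num / smooth ** (2 * gamma)
    # Compress the dynamic range while keeping the sign.
    return np.sign(tau) * np.abs(tau) ** alpha


if __name__ == "__main__":
    fs = 16000
    t = np.arange(400) / fs                   # one 25 ms frame at 16 kHz
    rng = np.random.default_rng(0)
    frame = (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
             + 0.01 * rng.standard_normal(t.size))
    mag = mt_magnitude(multitaper_spectra(frame)[0])
    gd = mt_modified_group_delay(frame)
    print(mag.shape, gd.shape)                # (257,) (257,)
```

Averaging the numerator across tapers and dividing once by a smoothed multitaper magnitude keeps the denominator strictly positive. Averaging per-taper group delays instead would be an equally plausible reading of "multitaper MOGDF".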


Metadata
Publisher
Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9479-0