2017 | Original Paper | Book Chapter

A Comparison of Covariance Matrix and i-vector Based Speaker Recognition

Authors: Nikša Jakovljević, Ivan Jokić, Slobodan Jošić, Vlado Delić

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

This paper evaluates covariance matrix based and i-vector based speaker identification methods on the Serbian S70W100s120 database. An open-set speaker identification evaluation scheme was adopted, with 20 target speakers and 60 impostors. Additional utterances from 41 speakers were used for training. The amount of data for modeling a target speaker was limited to about 4 s of speech. In this study, the i-vector based approach performed significantly better (equal error rate, EER, of about 5%) than the covariance matrix based approach (EER of about 16%). This low EER for the i-vector based approach was obtained after substantially reducing the number of parameters in the universal background model, the i-vector transformation matrix, and the Gaussian probabilistic linear discriminant analysis relative to what is typically reported in the literature. Additionally, the experiments showed that cepstral mean and variance normalization can degrade the EER in the single-channel case.
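To make the two quantities named in the abstract concrete, the following is a minimal Python sketch of per-utterance cepstral mean and variance normalization and of an equal error rate estimate from trial scores. It is not the authors' implementation: the function names `cmvn` and `equal_error_rate`, the `eps` guard, and the simple threshold sweep over observed scores are assumptions made for illustration, and the open-set decision is simplified here to a single accept/reject threshold on one score per trial.

```python
import numpy as np


def cmvn(features, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (num_frames, num_coefficients).
    eps is an illustrative guard against division by zero.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)  # mean of each coefficient over frames
    std = features.std(axis=0)    # standard deviation over frames
    return (features - mean) / (std + eps)


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= threshold.
    Returns (eer, threshold) at the point where the false acceptance
    rate and the false rejection rate are closest to each other.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([target, impostor]))
    # False acceptance: impostor trials scored at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection: target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Toy scores, not data from the paper.
    eer, thr = equal_error_rate([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.75, 0.4])
    print(f"EER = {eer:.2f} at threshold {thr:.2f}")
```

With only a few trials the sweep gives a coarse estimate; evaluations with many trials often interpolate between neighboring operating points instead.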

Title: A Comparison of Covariance Matrix and i-vector Based Speaker Recognition
Authors: Nikša Jakovljević, Ivan Jokić, Slobodan Jošić, Vlado Delić
Publisher: Springer International Publishing
Book: Speech and Computer
Print ISBN: 978-3-319-66428-6
Electronic ISBN: 978-3-319-66429-3
Copyright year: 2017
DOI: https://doi.org/10.1007/978-3-319-66429-3_3
