
2017 | OriginalPaper | Book chapter

A Comparison of Covariance Matrix and i-vector Based Speaker Recognition

Authors: Nikša Jakovljević, Ivan Jokić, Slobodan Jošić, Vlado Delić

Published in: Speech and Computer

Publisher: Springer International Publishing


Abstract

The paper presents the results of an evaluation of covariance matrix and i-vector based speaker identification methods on the Serbian S70W100s120 database. An open-set speaker identification evaluation scheme was adopted. The number of target speakers and the number of impostors were 20 and 60, respectively. Additional utterances from 41 speakers were used for training. The amount of data for modeling a target speaker was limited to about 4 s of speech. In this study, the i-vector based approach showed significantly better performance (equal error rate, EER, ~5%) than the covariance matrix based approach (EER ~16%). This low EER for the i-vector based approach was obtained after substantially reducing the number of parameters in the universal background model, the i-vector transformation matrix, and the Gaussian probabilistic linear discriminant analysis model, relative to the values typically reported in the literature. Additionally, these experiments showed that cepstral mean and variance normalization can worsen the EER in the single-channel case.
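The EER quoted in the abstract is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (targets rejected). The following is a minimal Python sketch of how such a value could be estimated from two lists of trial scores. It assumes that higher scores mean a better match; the function name `equal_error_rate` and the sample scores are illustrative and are not taken from the paper or its toolchain.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER and its threshold from two score lists.

    Assumption: a higher score means the trial looks more like the
    claimed speaker, and a trial is accepted when score >= threshold.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Illustrative scores only (not data from the paper).
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

With a finite number of trials the two error rates rarely cross exactly, so the sketch reports the average of the two rates at the threshold where they are closest; interpolating between neighbouring thresholds is a common alternative.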


Metadata
Title
A Comparison of Covariance Matrix and i-vector Based Speaker Recognition
Authors
Nikša Jakovljević
Ivan Jokić
Slobodan Jošić
Vlado Delić
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-66429-3_3
