Published in: International Journal of Speech Technology 1/2016

7 December 2015

A study on the roles of total variability space and session variability modeling in speaker recognition

Authors: A. K. Sarkar, J. F. Bonastre, D. Matrouf

Abstract

Speaker verification (SV) based on the i-vector concept has become the state of the art. In this technique, speakers are projected onto the total variability space and represented by vectors called i-vectors. During testing, the i-vectors of the test speech segment and of the claimant are conditioned to compensate for session variability before scoring. The i-vector system can therefore be viewed as two processing blocks: the total variability space and a post-processing module. Several questions arise: (i) which part of the i-vector system plays the major role in speaker verification, the total variability space or the post-processing task? (ii) Is the post-processing module intrinsic to the total variability space? The motivation of this paper is to partially answer these questions by proposing several simpler speaker characterization systems for speaker verification, in which speakers are represented by their speaker characterization vectors (SCVs). The SCVs are obtained by uniform segmentation of the speakers' Gaussian mixture model (GMM) and maximum likelihood linear regression (MLLR) super-vectors. We consider two adaptation approaches for the GMM super-vector: maximum a posteriori (MAP) and MLLR. As with i-vectors, the SCVs are post-processed for session variability compensation during testing. The proposed systems show promising performance compared with the classical i-vector system, which indicates that the post-processing task plays a major role in i-vector-based SV systems and is not intrinsic to the total variability space. All experimental results are reported on the NIST 2008 SRE core condition.
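
As a rough illustration of the two processing blocks described in the abstract, the following Python sketch splits a super-vector into equal-length segments and then scores two vectors after a simple conditioning step. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how each segment is turned into an SCV, nor which session compensation is applied, so the function names (`uniform_segments`, `length_normalize`, `cosine_score`) are hypothetical, and length normalization with cosine scoring is used here only as a common stand-in for the post-processing stage.

```python
import numpy as np


def uniform_segments(supervector, n_segments):
    """Split a 1-D super-vector into n_segments equal-length pieces.

    Assumption: "uniform segmentation" means equal-sized contiguous blocks;
    the abstract does not specify any further per-segment processing.
    """
    supervector = np.asarray(supervector, dtype=float)
    if supervector.size % n_segments != 0:
        raise ValueError("super-vector length must be divisible by n_segments")
    return supervector.reshape(n_segments, -1)


def length_normalize(x):
    """Scale a vector (or each row of a matrix) to unit Euclidean norm."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, 1e-12)  # guard against division by zero


def cosine_score(enrol, test):
    """Cosine similarity between an enrolment vector and a test vector."""
    return float(np.dot(length_normalize(enrol), length_normalize(test)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    supervector = rng.standard_normal(12)      # toy stand-in for a GMM or MLLR super-vector
    segments = uniform_segments(supervector, 4)
    print(segments.shape)                      # four segments of three values each
    print(cosine_score(segments[0], segments[0]))
```

In a real system the conditioning step would typically be trained on development data, which this toy example leaves out entirely.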

Metadata
Title
A study on the roles of total variability space and session variability modeling in speaker recognition
Authors
A. K. Sarkar
J. F. Bonastre
D. Matrouf
Publication date
7 December 2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9324-2