
2019 | OriginalPaper | Chapter

Speaker Recognition in Orthogonal Complement of Time Session Variability Subspace

Authors: Satoru Tsuge, Shingo Kuroiwa

Published in: Intelligent Interactive Multimedia Systems and Services

Publisher: Springer International Publishing


Abstract

Time session variability between the enrollment data and the data to be recognized degrades speaker recognition performance, which makes it one of the most important issues in speaker recognition technology. In this paper, we propose a speaker recognition method that is robust to time session variability. The proposed method first estimates a time session variability subspace and then carries out speaker recognition in the orthogonal complement of that subspace. In addition, we incorporate linear discriminant analysis (LDA) into the proposed method. To evaluate the proposed method, we conducted a speaker identification experiment. Experimental results show that the proposed method improves speaker identification performance over the baseline.
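The abstract does not say how the subspace is estimated or how speakers are scored, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each utterance is already represented by a fixed-length vector (for example an i-vector), estimates the time session variability subspace by PCA (computed via SVD) of each session's mean deviation from its speaker's mean, projects vectors onto the orthogonal complement with P = I - U U^T, then applies Fisher LDA and identifies speakers by cosine similarity. The function names (`estimate_session_subspace`, `complement_projector`, `fit_lda`, `identify`), the subspace rank, and the toy data are all hypothetical.

```python
import numpy as np


def estimate_session_subspace(X, speakers, sessions, rank):
    """Hypothetical estimator: PCA (via SVD) of per-session mean deviations.

    For each speaker, every session's mean vector minus that speaker's overall
    mean is treated as one sample of time session variability. The leading
    `rank` right singular vectors span the subspace. Returns U, shape (dim, rank),
    with orthonormal columns.
    """
    deviations = []
    for spk in np.unique(speakers):
        in_spk = speakers == spk
        spk_mean = X[in_spk].mean(axis=0)
        for ses in np.unique(sessions[in_spk]):
            in_ses = in_spk & (sessions == ses)
            deviations.append(X[in_ses].mean(axis=0) - spk_mean)
    _, _, vt = np.linalg.svd(np.stack(deviations), full_matrices=False)
    return vt[:rank].T


def complement_projector(U):
    """P = I - U U^T maps vectors onto the orthogonal complement of span(U)."""
    return np.eye(U.shape[0]) - U @ U.T


def fit_lda(X, labels, n_components, tol=1e-10):
    """Fisher LDA: whiten the within-class scatter, then diagonalize between-class."""
    mu = X.mean(axis=0)
    dim = X.shape[1]
    sw = np.zeros((dim, dim))
    sb = np.zeros((dim, dim))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        sw += (Xc - mc).T @ (Xc - mc)
        sb += len(Xc) * np.outer(mc - mu, mc - mu)
    w_vals, w_vecs = np.linalg.eigh(sw)
    # Drop null directions (e.g. the projected-out session subspace) before whitening.
    keep = w_vals > tol * w_vals.max()
    whiten = w_vecs[:, keep] / np.sqrt(w_vals[keep])
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ sb @ whiten)
    order = np.argsort(b_vals)[::-1][:n_components]
    return whiten @ b_vecs[:, order]  # shape (dim, n_components)


def identify(test_X, enroll_means, T, center):
    """Cosine scoring after centering and transform T; best speaker index per row."""
    Y = (test_X - center) @ T
    M = (enroll_means - center) @ T
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    M /= np.linalg.norm(M, axis=1, keepdims=True)
    return np.argmax(Y @ M.T, axis=1)


# Hypothetical toy data: 5 speakers, 3 sessions; session s shifts every
# vector by 4*s along the first axis (the simulated session direction).
rng = np.random.default_rng(0)
dim, n_spk, n_ses, n_utt = 6, 5, 3, 10
spk_means = 3.0 * rng.normal(size=(n_spk, dim))
spk_means[:, 0] = 0.0
X, speakers, sessions = [], [], []
for spk in range(n_spk):
    for ses in range(n_ses):
        shift = np.zeros(dim)
        shift[0] = 4.0 * ses
        X.append(spk_means[spk] + shift + 0.1 * rng.normal(size=(n_utt, dim)))
        speakers += [spk] * n_utt
        sessions += [ses] * n_utt
X = np.vstack(X)
speakers = np.array(speakers)
sessions = np.array(sessions)

U = estimate_session_subspace(X, speakers, sessions, rank=1)
P = complement_projector(U)
A = fit_lda(X @ P, speakers, n_components=n_spk - 1)
T = P @ A  # project onto the complement first, then apply LDA
center = X.mean(axis=0)

# Enroll on session 0, test on session 2 (the largest time session gap).
enroll = np.stack([X[(speakers == s) & (sessions == 0)].mean(axis=0) for s in range(n_spk)])
is_test = sessions == 2
pred = identify(X[is_test], enroll, T, center)
accuracy = np.mean(pred == speakers[is_test])
print("identification accuracy:", accuracy)
```

In this toy, every session shifts all speakers along the same axis, so a rank-1 subspace recovers that axis and the projection removes the shift between enrollment (session 0) and test (session 2). For brevity the same speakers are reused for subspace estimation, LDA training, and testing; a realistic setup would estimate U and the LDA transform on separate development speakers. Because LDA is fitted after the projection, the removed direction has zero within-class scatter, which is why `fit_lda` discards null eigen-directions before whitening.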


Metadata
DOI
https://doi.org/10.1007/978-3-319-92231-7_11
