
2012 | OriginalPaper | Book chapter

14. Speaker Identification Using Intermediate Matching Kernel-Based Support Vector Machines

Written by: A. D. Dileep, M. Tech., C. Chandra Sekhar, Ph.D.

Published in: Forensic Speaker Recognition

Publisher: Springer New York


Abstract

Gaussian mixture model (GMM) based approaches are commonly used for speaker recognition tasks. The parameters of a GMM can be estimated with the expectation-maximization method, which is a non-discriminative learning method, or with the large margin method, which is a discriminative learning method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers that use dynamic kernels such as the generalized linear discriminant sequence kernel, the probabilistic sequence kernel, the GMM supervector kernel and the Bhattacharyya distance based kernel. Recently, the intermediate matching kernel (IMK) has been proposed as a dynamic kernel for recognition of objects in an image represented as a set of local feature vectors. The IMK-based SVMs perform better than the state-of-the-art GMM-based approaches on speaker identification tasks, because they are well suited to the basic challenge of providing reliable scores of the intra-speaker variation of suspects and of the inter-speaker variation of the potential population. Such scores are crucial to law enforcement and counter-terrorism agencies in evaluating the strength of the evidence at hand. Thus, the IMK-based SVMs can be used to build the speaker recognition models in forensic speaker recognition (FSR) systems. However, techniques still need to be developed to determine the strength of evidence from the outputs of SVM-based models. The SVM-based models are trained using discriminative methods and their generalization ability is good. We propose to use the IMK-based SVM classifier for speaker identification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the IMK-based SVM classifier is the selection of the virtual feature vectors, which are used to match the local feature vectors from the representations of two different utterances.
We explore the use of the components of a universal background GMM as the set of virtual feature vectors. We compare the performance of the GMM-based approaches and the dynamic kernel SVM-based approaches to speaker identification. The 2002 and 2003 NIST speaker recognition corpora are used to evaluate the different approaches. The results of our studies show that the dynamic kernel SVM-based approaches perform significantly better than the GMM-based approaches. For the speaker identification task, the IMK-based SVM gives a performance comparable to that of SVMs using any of the other dynamic kernels. The storage requirements and the computational complexity of the IMK-based SVMs are lower than those of SVMs using any of the other dynamic kernels.
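
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each virtual feature vector is a plain point (for example, the mean of one universal background GMM component), that the local feature vector closest to it in Euclidean distance is the one selected, and that a Gaussian function serves as the base kernel. The function names and the `delta` width parameter are hypothetical.

```python
import math

def gaussian_base_kernel(x, y, delta=1.0):
    """Gaussian base kernel between two local feature vectors (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-delta * sq_dist)

def closest_to_virtual(features, virtual_vector):
    """Select the local feature vector nearest to one virtual feature vector.

    Assumption: nearness is squared Euclidean distance to a point such as
    a GMM component mean; other selection rules are possible.
    """
    return min(
        features,
        key=lambda f: sum((a - b) ** 2 for a, b in zip(f, virtual_vector)),
    )

def imk(X, Y, virtual_vectors, delta=1.0):
    """Intermediate matching kernel between two sets of local feature vectors.

    For every virtual feature vector, one vector is picked from X and one
    from Y, and the base kernel values of the matched pairs are summed.
    """
    total = 0.0
    for v in virtual_vectors:
        x_star = closest_to_virtual(X, v)
        y_star = closest_to_virtual(Y, v)
        total += gaussian_base_kernel(x_star, y_star, delta)
    return total

# Toy data: two "utterances" with different numbers of 2-D local feature vectors
X = [[0.0, 0.1], [1.0, 0.9], [2.1, 2.0]]
Y = [[0.2, 0.0], [0.9, 1.1]]
V = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # hypothetical virtual feature vectors

print(imk(X, Y, V))
```

In this sketch the two sets may differ in size, and the number of base kernel evaluations equals the number of virtual feature vectors rather than the product of the two set sizes. The resulting kernel values could be collected into a Gram matrix and passed to an SVM trainer that accepts precomputed kernels.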


Metadata
Title
Speaker Identification Using Intermediate Matching Kernel-Based Support Vector Machines
Written by
A. D. Dileep, M. Tech.
C. Chandra Sekhar, Ph.D.
Copyright year
2012
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-0263-3_14
