2012 | OriginalPaper | Book chapter

14. Speaker Identification Using Intermediate Matching Kernel-Based Support Vector Machines

Authors: A. D. Dileep, M. Tech., C. Chandra Sekhar, Ph.D.

Published in: Forensic Speaker Recognition

Publisher: Springer New York

Abstract

Gaussian mixture model (GMM) based approaches are commonly used for speaker recognition tasks. Methods for estimating the parameters of GMMs include the expectation-maximization method, which is a non-discriminative learning method, and the large margin method, which is a discriminative learning method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as the generalized linear discriminant sequence kernel, the probabilistic sequence kernel, the GMM supervector kernel and the Bhattacharyya distance based kernel. Recently, the intermediate matching kernel (IMK) has been proposed as a dynamic kernel for recognition of objects in an image represented as a set of local feature vectors. The IMK-based SVMs give a better performance than the state-of-the-art GMM-based approaches for speaker identification tasks, because they are well suited to the basic challenge of providing reliable scores of the intra-speaker variation of suspects and of the inter-speaker variation of the potential population, which is crucial to law enforcement and counter-terrorism agencies in evaluating the strength of the evidence at hand. Thus, the IMK-based SVMs can be used to build the speaker recognition models in forensic speaker recognition (FSR) systems. However, it is necessary to develop techniques to determine the strength of evidence from the outputs of SVM-based models. The SVM-based models are trained using discriminative methods and their generalization ability is good. We propose to use the IMK-based SVM classifier for speaker identification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the IMK-based SVM classifier is the selection of the virtual feature vectors, which are used to match the local feature vectors from the representations of two different utterances. We explore the use of the components of a universal background GMM as the set of virtual feature vectors. We compare the performance of the GMM-based approaches and the dynamic kernel SVM-based approaches to speaker identification. The 2002 and 2003 NIST speaker recognition corpora are used to evaluate the different approaches to speaker identification. Results of our studies show that the dynamic kernel SVM-based approaches give a significantly better performance than the GMM-based approaches. For the speaker identification task, the IMK-based SVM gives a performance that is comparable to that of SVMs using any of the other dynamic kernels. The storage requirements and the computational complexity of the IMK-based SVMs are lower than those of SVMs using any of the other dynamic kernels.
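
The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the kernel's formula, so the details here are assumptions: each virtual feature vector selects its nearest local feature vector (by Euclidean distance) from each utterance, and a Gaussian base kernel is summed over the selected pairs. The function name `imk`, the parameter `gamma`, and the random vectors standing in for the means of the universal background GMM components are hypothetical and are not taken from the chapter.

```python
import numpy as np

def imk(X, Y, V, gamma=1.0):
    """Sketch of an intermediate matching kernel between two utterances.

    X: (n, d) local feature vectors of the first utterance
    Y: (m, d) local feature vectors of the second utterance
    V: (Q, d) virtual feature vectors (assumed here: UBM component means)
    gamma: width of the Gaussian base kernel (an assumption)
    """
    # Squared Euclidean distance from every virtual vector to every local vector.
    dx = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # shape (Q, n)
    dy = ((V[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # shape (Q, m)

    # For each virtual vector, pick the closest local vector in each utterance.
    x_star = X[dx.argmin(axis=1)]  # shape (Q, d)
    y_star = Y[dy.argmin(axis=1)]  # shape (Q, d)

    # Sum the base kernel over the Q matched pairs.
    sq = ((x_star - y_star) ** 2).sum(axis=1)
    return float(np.exp(-gamma * sq).sum())

# Toy data: two utterances of different lengths and 8 virtual vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=(80, 4))
V = rng.normal(size=(8, 4))
print(imk(X, Y, V))
```

In this sketch the two utterances may have different numbers of local feature vectors, and once the matching is done only Q base-kernel evaluations are needed, one per virtual feature vector.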

Title: Speaker Identification Using Intermediate Matching Kernel-Based Support Vector Machines
Authors: A. D. Dileep, M. Tech., C. Chandra Sekhar, Ph.D.
Publisher: Springer New York
Book: Forensic Speaker Recognition
Print ISBN: 978-1-4614-0262-6
Electronic ISBN: 978-1-4614-0263-3
Copyright year: 2012
DOI: https://doi.org/10.1007/978-1-4614-0263-3_14
