Published in: Neural Processing Letters, Issue 2/2014

1 April 2014

Extension of a Kernel-Based Classifier for Discriminative Spoken Keyword Spotting

Authors: Shima Tabibian, Ahmad Akbari, Babak Nasersharif



Abstract

A keyword spotter can be viewed as a binary classifier that separates utterances containing a target keyword from utterances that do not contain it. These two classes are not inherently linearly separable, so linear classifiers are not fully suited to the task. In this paper, we extend a kernel-based classification approach to separate these two non-linearly separable classes such that the area under the Receiver/Relative Operating Characteristic (ROC) curve (the most common measure for evaluating keyword spotters) is maximized. We evaluated the proposed keyword spotter under different experimental conditions on the TIMIT database. The results indicate that, at fewer than two false alarms per keyword per hour, the true detection rate of the proposed kernel-based approach is about 15 % higher than that of the linear classifiers used in previous research. Additionally, the area under the ROC curve (AUC) of the proposed method is 1 % higher than that of the linear classifiers; this improvement is significant at confidence levels of 80 % and 95 % according to t-test and F-test evaluations, respectively. We also evaluated the proposed method under different noisy conditions, and the results indicate that it shows good robustness to noise.
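The premise above, that an AUC-maximizing spotter must score every keyword utterance above every non-keyword utterance and that a linear scorer cannot always achieve this, can be illustrated with a small Python sketch. It is not the authors' classifier: the toy one-dimensional data, the Gaussian (RBF) kernel, the fixed +1/-1 expansion coefficients, and the helper names `empirical_auc`, `rbf`, and `kernel_score` are all illustrative assumptions. AUC is computed here as the Wilcoxon-Mann-Whitney statistic, i.e. the fraction of (keyword, non-keyword) pairs that are ordered correctly, with ties counted as one half.

```python
import math


def empirical_auc(pos_scores, neg_scores):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (keyword, non-keyword) score pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def rbf(a, b, gamma=1.0):
    # Gaussian (RBF) kernel on scalar features (assumed kernel choice)
    return math.exp(-gamma * (a - b) ** 2)


def kernel_score(x, pos_train, neg_train, gamma=1.0):
    # Hypothetical kernel expansion f(x) = sum_i alpha_i * k(x_i, x) with
    # fixed, untrained alpha_i = +1 (keyword) and -1 (non-keyword).
    return (sum(rbf(p, x, gamma) for p in pos_train)
            - sum(rbf(n, x, gamma) for n in neg_train))


# Toy 1-D features: keyword utterances lie in the middle, non-keyword
# utterances on both sides, so no linear scorer w*x can rank them correctly.
pos = [-0.5, 0.0, 0.5]
neg = [-2.0, -1.5, 1.5, 2.0]

for w in (1.0, -1.0):
    lin_auc = empirical_auc([w * x for x in pos], [w * x for x in neg])
    print(f"linear w={w:+.0f}: AUC = {lin_auc:.2f}")  # prints AUC = 0.50

k_auc = empirical_auc([kernel_score(x, pos, neg) for x in pos],
                      [kernel_score(x, pos, neg) for x in neg])
print(f"RBF kernel expansion: AUC = {k_auc:.2f}")  # prints AUC = 1.00
```

Because the keyword examples sit between two groups of non-keyword examples, every linear scorer gives an AUC of 0.5, while the kernel expansion ranks all keyword examples first. The kernel scores are evaluated on the same points that define the expansion, so the sketch demonstrates representational capacity only, not generalization.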


Metadata
Title
Extension of a Kernel-Based Classifier for Discriminative Spoken Keyword Spotting
Authors
Shima Tabibian
Ahmad Akbari
Babak Nasersharif
Publication date
1 April 2014
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2014
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-013-9299-4
