Published in: International Journal of Speech Technology 2/2013

01-06-2013

An efficient lattice-based phonetic search method for accelerating keyword spotting in large speech databases

Authors: Ella Tetariy, Michal Gishri, Baruch Har-Lev, Vered Aharonson, Ami Moyal

Abstract

This paper describes an algorithm for reducing the computational complexity of phonetic search keyword spotting (KWS). The reduction matters most when keywords must be found in very large speech databases with a rapid response time. The suggested algorithm is an anchor-based phoneme search that reduces the search space by generating hypotheses only around phonemes recognized with high reliability. Three databases were used for the evaluation: IBM Voicemail I and Voicemail II, which consist of long spontaneous utterances, and the Wall Street Journal portion of the MACROPHONE database, which consists of read speech utterances. The results indicated a reduction of nearly 90% in the computational complexity of the search and an improved false alarm rate, with only a small decrease in the detection rate across the evaluated databases. The search space reduction, as well as the resulting performance gain or loss, can be controlled according to user preferences via the algorithm's parameters and thresholds.
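
The anchor idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: for simplicity it assumes a single best phoneme sequence with per-phoneme confidence scores instead of a lattice, and it scores candidate windows with plain edit distance. The names anchor_search, anchor_conf and max_dist are hypothetical and chosen only for this illustration. The sketch shows how restricting hypotheses to alignments around reliable phonemes cuts the number of windows that must be scored.

```python
from typing import List, Tuple

# Assumed input format: (phoneme label, recognizer confidence in [0, 1]).
Phone = Tuple[str, float]


def edit_distance(a: List[str], b: List[str]) -> int:
    """Plain Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution or match
        prev = cur
    return prev[-1]


def anchor_search(stream: List[Phone], keyword: List[str],
                  anchor_conf: float = 0.8, max_dist: int = 1):
    """Return (hits, windows_scored); hits are (start, distance) pairs.

    Hypothetical sketch: a window is scored only if a high-confidence
    phoneme (an anchor) lines up with the same phoneme in the keyword.
    """
    labels = [p for p, _ in stream]
    n, k = len(labels), len(keyword)
    starts = set()
    for t, (phone, conf) in enumerate(stream):
        if conf < anchor_conf:
            continue  # unreliable phoneme: never used as an anchor
        for j, kp in enumerate(keyword):
            # Align keyword position j with anchor position t.
            if kp == phone and 0 <= t - j <= n - k:
                starts.add(t - j)
    hits = []
    for s in sorted(starts):
        d = edit_distance(labels[s:s + k], keyword)
        if d <= max_dist:
            hits.append((s, d))
    return hits, len(starts)
```

An exhaustive search would score every one of the n - k + 1 windows of the stream; here only windows implied by an anchor are scored. Raising anchor_conf shrinks the search space further but risks missing keywords whose phonemes were all recognized with low confidence, which mirrors the trade-off between complexity and detection rate that the abstract describes.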

Metadata
Title
An efficient lattice-based phonetic search method for accelerating keyword spotting in large speech databases
Authors
Ella Tetariy
Michal Gishri
Baruch Har-Lev
Vered Aharonson
Ami Moyal
Publication date
01-06-2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9171-3