2013 | OriginalPaper | Book chapter

1. Keyword Spotting Out of Continuous Speech

Written by: Ami Moyal, Vered Aharonson, Ella Tetariy, Michal Gishri

Published in: Phonetic Search Methods for Large Speech Databases

Publisher: Springer New York

Abstract

Successful Automatic Speech Recognition (ASR) technology has been a research aspiration for the past five decades. Ideally, computers would be able to transform any type of human speech into an accurate textual transcription. Today's ASR technology generates fairly good results on structured speech with relatively high signal-to-noise ratios (SNR), but performance degrades on spontaneous speech in noisy real-life environments (Murveit et al. 1992; Young 1996; Furui 2003; Deng and Huang 2004). Performance that is acceptable for commercial applications can be achieved using large training corpora of speech and text. However, there are still problems that need to be resolved.

References
Alon G (2005) Key-word spotting: the base technology for speech analytics. Rishon LeZion, NSC (Natural Speech Communications)
Amir A, Efrat A et al (2001) Advances in phonetic word spotting. In: Tenth international conference on information and knowledge management, Atlanta
Baker J, Deng L et al (2009) Developments and directions in speech recognition and understanding, Part 1 [DSP Education]. Signal Process Mag IEEE 26(3):75–80
Barras C, Allauzen A et al (2002) Transcribing audio-video archives. In: 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP), IEEE
Bar-Yosef Y, Aloni-Lavi R et al (2012) Cross-language phonetic search for keyword spotting. In: Proceedings of 2012 speech processing conference, Tel-Aviv
Burget L, Černocký J et al (2006) Indexing and search methods for spoken document. In: Text, speech and dialogue 4188/2006 of Lecture notes in computer science. pp 351–358
Butzberger J, Murveit H et al (1992) Spontaneous speech effects in large vocabulary speech recognition applications. In: Workshop on speech and natural language, Association for Computational Linguistics
Cardillo PS, Clements M et al (2002) Phonetic searching vs. LVCSR: how to find what you really want in audio archives. Int J Speech Technol 5(1):9–22
Dubois C, Charlet D (2008) Using textual information from LVCSR transcripts for phonetic-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP'08), Las Vegas
Evermann G, Chan H et al (2005) Training LVCSR systems on thousands of hours of data. In: Submitted to ICASSP'05
Furui S (2003) Recent advances in spontaneous speech recognition and understanding. ISCA & IEEE workshop on spontaneous speech processing and recognition
Furui S, Deng L et al (2012) Fundamental technologies in modern speech recognition. IEEE Signal Process Mag (IEEE Signal Processing Society) 26:16–17
Gosztolya G, Tóth L (2011) Spoken term detection based on the most probable phoneme sequence. In: 2011 IEEE 9th international symposium on applied machine intelligence and informatics (SAMI), IEEE, Smolenice
Heigold G, Nguyen P et al (2012) Investigations on exemplar-based features for speech recognition towards thousands of hours of unsupervised, noisy data. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE
Hirsch HG, Pearce D (2000) The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ASR2000-automatic speech recognition: challenges for the new millennium ISCA tutorial and research workshop (ITRW)
Huo Q, Jiang H et al (1997) A Bayesian predictive classification approach to robust speech recognition. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP'97), vol. 2, IEEE Computer Society
Kai T, Suzuki M et al (2012) Combination of SPLICE and feature normalization for noise robust speech recognition. In: International workshop on nonlinear circuits, communications and signal processing (NCSP'12), Honolulu
Kamm TM, Meyer GGL (2002) Selective sampling of training data for speech recognition. In: Proceedings of the second international conference on human language technology research, Morgan Kaufmann Publishers Inc, San Francisco
Mammone RJ, Zhang X et al (1996) Robust speaker recognition: a feature-based approach. IEEE Signal Process Mag 13:58
Mamou J, Ramabhadran B (2008) Phonetic query expansion for spoken document retrieval. In: Interspeech'08, Brisbane
Matrouf D, Gauvain J-L (1997) Model compensation for noises in training and test data. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP'97), IEEE Computer Society
Mishne G, Carmel D et al (2005) Automatic analysis of call-center conversations. In: The 14th ACM international conference on information and knowledge management
Motlicek P, Valente F et al (2012) Improving acoustic based keyword spotting using LVCSR lattices. In: International conference on acoustic speech and signal processing, Japan
Parada C, Sethy A et al (2010) Balancing false alarms and hits in spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP'10), IEEE, Dallas
Park Y, Patwardhan S et al (2008) An empirical analysis of word error rate and keyword error rate. In: The international conference on spoken language processing (ICSLP), Brisbane
Sankar A, Lee CH (1996) A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans Speech Audio Process 4(3):190–202
Saon G, Chien J-T (2012) Large vocabulary continuous speech recognition systems. IEEE Signal Process Mag (IEEE Signal Processing Society) 29:18–33
Schneider D (2011) Holistic vocabulary independent spoken term detection. Ph.D. dissertation. Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn
Šmídl L, Psutka J (2006) Comparison of keyword spotting methods for searching in speech. In: Interspeech 2006, ISCA, Bonn
Szöke I, Schwarz P et al (2005) Comparison of keyword spotting approaches for informal continuous speech. In: Eurospeech'05, Lisbon
Szöke I, Fapšo M et al (2008) Spoken term detection system based on combination of LVCSR and phonetic search. In: The 4th international conference on machine learning for multimodal interaction, Springer, Berlin
Thambiratnam K (2005) Acoustic keyword spotting in speech with applications to data mining. PhD, Speech and Audio Research Laboratory of the SAIVT Program, Center for Built Environment and Engineering Research. Queensland University of Technology, Brisbane, p 248
Thambiratnam K, Sridharan S (2005) Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP'05), Philadelphia
Thambiratnam K, Sridharan S (2007) Rapid yet accurate speech indexing using dynamic match lattice spotting. IEEE Trans Audio Speech Lang Process 15(1):346–357
Tsao Y, Li J et al (2009) Ensemble speaker and speaking environment modeling approach with advanced online estimation process. In: IEEE international conference on acoustics, speech and signal processing (ICASSP'09), IEEE Computer Society, Taipei
Viikki O, Laurila K (1998) Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Commun 25(1):133–147
Wallace R, Vogt R et al (2007) A phonetic search approach to the 2006 NIST spoken term detection evaluation. In: 8th annual conference of the international speech communication association (INTERSPEECH 2007), ISCA, Antwerp
Wang D, Tejedor J et al (2008) A comparison of phone and grapheme-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP'08), Las Vegas
Wilpon JG, Rabiner LR et al (1990) Automatic recognition of keywords in unconstrained speech using hidden Markov models. IEEE Trans Acoust Speech Signal Process 38(11):1870–1878
Witbrock MJ, Hauptmann AG (1997) Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents. In: The second ACM international conference on digital libraries, ACM
Young SJ (1993) The HTK hidden Markov model toolkit: design and philosophy. Technical Report TR 153, Department of Engineering, Cambridge University, Cambridge
Metadata
Title
Keyword Spotting Out of Continuous Speech
Written by
Ami Moyal
Vered Aharonson
Ella Tetariy
Michal Gishri
Copyright year
2013
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-6489-1_1