
2013 | OriginalPaper | Book chapter

1. Keyword Spotting Out of Continuous Speech

Authors: Ami Moyal, Vered Aharonson, Ella Tetariy, Michal Gishri

Published in: Phonetic Search Methods for Large Speech Databases

Publisher: Springer New York


Abstract

Successful Automatic Speech Recognition (ASR) technology has been a research aspiration for the past five decades. Ideally, computers would be able to transform any type of human speech into an accurate textual transcription. Today's ASR technology produces fairly good results on structured speech with relatively high signal-to-noise ratios (SNR), but performance degrades on spontaneous speech recorded in real-life noisy environments (Murveit et al. 1992; Young 1996; Furui 2003; Deng and Huang 2004). Performance acceptable for commercial applications can be achieved with large training corpora of speech and text. However, several problems remain unresolved.
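SNR is conventionally expressed in decibels as 10·log10 of the ratio of mean signal power to mean noise power, so a higher value means cleaner audio and a lower value means noisier audio. The minimal Python sketch below illustrates only that definition; the function name `snr_db` and the toy sample values are hypothetical and are not taken from the chapter.

```python
import math


def snr_db(signal, noise):
    """Return the signal-to-noise ratio in decibels (hypothetical helper).

    SNR_dB = 10 * log10(P_signal / P_noise), where P is the mean power,
    i.e. the mean of the squared samples. Higher values mean cleaner audio.
    """
    if len(signal) == 0 or len(noise) == 0:
        raise ValueError("signal and noise must be non-empty")
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        return math.inf  # no noise at all: SNR is unbounded
    return 10.0 * math.log10(p_signal / p_noise)


# Hypothetical toy samples: the same "speech" signal under two noise levels.
speech = [0.5, -0.5, 0.5, -0.5]
quiet_noise = [0.005, -0.005, 0.005, -0.005]
loud_noise = [0.25, -0.25, 0.25, -0.25]

print(round(snr_db(speech, quiet_noise), 1))  # 40.0 (high SNR, clean)
print(round(snr_db(speech, loud_noise), 1))   # 6.0 (low SNR, noisy)
```

Real recordings rarely provide the clean signal and the noise separately, so in practice SNR is usually estimated (for example from speech and non-speech segments); the sketch assumes both are known.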


References

Alon G (2005) Key-word spotting: the base technology for speech analytics. NSC Natural Speech Communications, Rishon LeZion

Amir A, Efrat A et al (2001) Advances in phonetic word spotting. In: Tenth international conference on information and knowledge management, Atlanta

Baker J, Deng L et al (2009) Developments and directions in speech recognition and understanding, Part 1 [DSP Education]. IEEE Signal Process Mag 26(3):75–80

Barras C, Allauzen A et al (2002) Transcribing audio-video archives. In: 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP), IEEE

Bar-Yosef Y, Aloni-Lavi R et al (2012) Cross-language phonetic search for keyword spotting. In: Proceedings of 2012 speech processing conference, Tel-Aviv

Burget L, Černocký J et al (2006) Indexing and search methods for spoken documents. In: Text, speech and dialogue. Lecture notes in computer science, vol 4188, pp 351–358

Butzberger J, Murveit H et al (1992) Spontaneous speech effects in large vocabulary speech recognition applications. In: Workshop on speech and natural language, Association for Computational Linguistics

Cardillo PS, Clements M et al (2002) Phonetic searching vs. LVCSR: how to find what you really want in audio archives. Int J Speech Technol 5(1):9–22

Deng L, Huang X (2004) Challenges in adopting speech recognition. Commun ACM 47(1):69–75

Dubois C, Charlet D (2008) Using textual information from LVCSR transcripts for phonetic-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’08), Las Vegas

Evermann G, Chan H et al (2005) Training LVCSR systems on thousands of hours of data. In: Submitted to ICASSP’05

Furui S (2003) Recent advances in spontaneous speech recognition and understanding. In: ISCA & IEEE workshop on spontaneous speech processing and recognition

Furui S, Deng L et al (2012) Fundamental technologies in modern speech recognition. IEEE Signal Process Mag (IEEE Signal Processing Society) 26:16–17

Gosztolya G, Tóth L (2011) Spoken term detection based on the most probable phoneme sequence. In: 2011 IEEE 9th international symposium on applied machine intelligence and informatics (SAMI), IEEE, Smolenice

Heigold G, Nguyen P et al (2012) Investigations on exemplar-based features for speech recognition towards thousands of hours of unsupervised, noisy data. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE

Hirsch HG, Pearce D (2000) The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ASR2000-automatic speech recognition: challenges for the new millennium ISCA tutorial and research workshop (ITRW)

Huo Q, Jiang H et al (1997) A Bayesian predictive classification approach to robust speech recognition. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP'97), vol 2, IEEE Computer Society

Kai T, Suzuki M et al (2012) Combination of SPLICE and feature normalization for noise robust speech recognition. In: International workshop on nonlinear circuits, communications and signal processing (NCSP’12), Honolulu

Kamm TM, Meyer GGL (2002) Selective sampling of training data for speech recognition. In: Proceedings of the second international conference on human language technology research, Morgan Kaufmann Publishers Inc, San Francisco

Mammone RJ, Zhang X et al (1996) Robust speaker recognition: a feature-based approach. IEEE Signal Process Mag 13:58

Mamou J, Ramabhadran B (2008) Phonetic query expansion for spoken document retrieval. In: Interspeech’08, Brisbane

Matrouf D, Gauvain J-L (1997) Model compensation for noises in training and test data. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP'97), IEEE Computer Society

Mishne G, Carmel D et al (2005) Automatic analysis of call-center conversations. In: The 14th ACM international conference on information and knowledge management

Motlicek P, Valente F et al (2012) Improving acoustic based keyword spotting using LVCSR lattices. In: International conference on acoustic speech and signal processing, Japan

Parada C, Sethy A et al (2010) Balancing false alarms and hits in spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’10), IEEE, Dallas

Park Y, Patwardhan S et al (2008) An empirical analysis of word error rate and keyword error rate. In: The international conference on spoken language processing (ICSLP), Brisbane

Sankar A, Lee CH (1996) A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans Speech Audio Process 4(3):190–202

Saon G, Chien J-T (2012) Large vocabulary continuous speech recognition systems. IEEE Signal Process Mag (IEEE Signal Processing Society) 29:18–33

Schneider D (2011) Holistic vocabulary independent spoken term detection. PhD dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn

Šmídl L, Psutka J (2006) Comparison of keyword spotting methods for searching in speech. In: Interspeech 2006, ISCA, Bonn

Szöke I, Schwarz P et al (2005) Comparison of keyword spotting approaches for informal continuous speech. In: Eurospeech’05, Lisbon

Szöke I, Fapšo M et al (2008) Spoken term detection system based on combination of LVCSR and phonetic search. In: The 4th international conference on machine learning for multimodal interaction, Springer, Berlin

Thambiratnam K (2005) Acoustic keyword spotting in speech with applications to data mining. PhD thesis, Speech and Audio Research Laboratory of the SAIVT Program, Center for Built Environment and Engineering Research, Queensland University of Technology, Brisbane, p 248

Thambiratnam K, Sridharan S (2005) Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP’05), Philadelphia

Thambiratnam K, Sridharan S (2007) Rapid yet accurate speech indexing using dynamic match lattice spotting. IEEE Trans Audio Speech Lang Process 15(1):346–357

Tsao Y, Li J et al (2009) Ensemble speaker and speaking environment modeling approach with advanced online estimation process. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’09), IEEE Computer Society, Taipei

Viikki O, Laurila K (1998) Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Commun 25(1):133–147

Wallace R, Vogt R et al (2007) A phonetic search approach to the 2006 NIST spoken term detection evaluation. In: 8th annual conference of the international speech communication association (INTERSPEECH 2007), ISCA, Antwerp

Wang D, Tejedor J et al (2008) A comparison of phone and grapheme-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP'08), Las Vegas

Wilpon JG, Rabiner LR et al (1990) Automatic recognition of keywords in unconstrained speech using hidden Markov models. IEEE Trans Acoust Speech Signal Process 38(11):1870–1878

Witbrock MJ, Hauptmann AG (1997) Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents. In: The second ACM international conference on digital libraries, ACM

Young SJ (1993) The HTK hidden Markov model toolkit: design and philosophy. Technical Report TR 153, Department of Engineering, Cambridge University, Cambridge

Title: Keyword Spotting Out of Continuous Speech
Authors: Ami Moyal, Vered Aharonson, Ella Tetariy, Michal Gishri
Publisher: Springer New York
Book: Phonetic Search Methods for Large Speech Databases
Print ISBN: 978-1-4614-6488-4
Electronic ISBN: 978-1-4614-6489-1
Copyright year: 2013
DOI: https://doi.org/10.1007/978-1-4614-6489-1_1
