International Journal of Speech Technology, 1 December 2013

A voice command system for AUTONOMY using a novel speech alignment algorithm

Authors: Helmut Hickersberger, Wolfgang L. Zagler

Published in: International Journal of Speech Technology | Issue 4/2013

Abstract

The Viterbi dynamic programming algorithm is currently the de facto standard by which speech recognizers handle duration variations of the sub-word units of speech: it aligns the sub-word units to the sub-word unit models. The algorithm is an integral part of hidden Markov model speech recognizers. In this work, a robust and simple voice command system is developed, implemented, and tested. Instead of the Viterbi algorithm, it uses a novel speech alignment algorithm, the so-called “run-length limited dynamic programming algorithm” (RLL-DP). The voice command system facilitates the operation of the AUTONOMY system, an environmental control system combined with an alternative and augmentative communication system, using isolated words as voice commands. Activating the “run-length limits” causes a statistically significant reduction of the word error rate, even when simple “centroid sequence word models” are used instead of the acoustic models based on “hidden control neural networks” used in previous versions.
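The abstract does not spell out how RLL-DP works, so the following is only an illustrative sketch of the general idea of bounding run lengths in a dynamic programming alignment, not the authors' algorithm. It assumes a word model is a left-to-right sequence of centroid vectors, that each centroid must absorb between `min_run` and `max_run` consecutive frames, and that the local cost is the squared Euclidean distance. The function name `align_with_run_limits` and all parameter names are hypothetical.

```python
import math


def align_with_run_limits(frames, centroids, min_run=1, max_run=None):
    """Align frames to a left-to-right centroid sequence.

    Assumption (not from the paper): every centroid must cover between
    min_run and max_run consecutive frames. Returns (total_cost, run_lengths),
    or (math.inf, None) if no alignment satisfies the limits.
    """
    T, K = len(frames), len(centroids)
    if max_run is None:
        max_run = T  # no upper limit: plain left-to-right alignment

    def dist(x, c):
        # Squared Euclidean distance between a frame and a centroid.
        return sum((a - b) ** 2 for a, b in zip(x, c))

    # prefix[k][t] = cost of assigning frames[0:t] entirely to centroid k.
    prefix = []
    for c in centroids:
        row = [0.0]
        for x in frames:
            row.append(row[-1] + dist(x, c))
        prefix.append(row)

    # best[k][t] = minimal cost of covering frames[0:t] with centroids[0:k].
    best = [[math.inf] * (T + 1) for _ in range(K + 1)]
    back = [[0] * (T + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for t in range(1, T + 1):
            # Only run lengths inside the allowed limits are considered.
            for run in range(min_run, min(max_run, t) + 1):
                prev = best[k - 1][t - run]
                if prev == math.inf:
                    continue
                cost = prev + prefix[k - 1][t] - prefix[k - 1][t - run]
                if cost < best[k][t]:
                    best[k][t] = cost
                    back[k][t] = run

    if best[K][T] == math.inf:
        return math.inf, None

    # Trace back the run length chosen for each centroid.
    runs, t = [], T
    for k in range(K, 0, -1):
        runs.append(back[k][t])
        t -= back[k][t]
    runs.reverse()
    return best[K][T], runs
```

Under these assumptions, an isolated-word recognizer could score an utterance against the centroid sequence of every command word and pick the word with the lowest cost. For the frames `[[0.0], [0.0], [0.0], [0.0], [1.0]]` and the centroids `[[0.0], [1.0]]`, the unconstrained call yields run lengths `[4, 1]` at cost `0.0`, while `max_run=3` forces `[3, 2]` at cost `1.0`.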

Title: A voice command system for AUTONOMY using a novel speech alignment algorithm
Authors: Helmut Hickersberger
Wolfgang L. Zagler
Publication date: 1 December 2013
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-013-9196-2
