Published in: International Journal of Speech Technology 4/2013

1 December 2013

A voice command system for AUTONOMY using a novel speech alignment algorithm

Authors: Helmut Hickersberger, Wolfgang L. Zagler


Abstract

The Viterbi dynamic programming algorithm is currently the de facto standard by which speech recognizers handle variations in the duration of sub-word units of speech: it aligns the sub-word units with their sub-word unit models. The algorithm is an integral part of hidden Markov model speech recognizers. In this work, a robust and simple voice command system is developed, implemented and tested. Instead of the Viterbi algorithm, it uses a novel speech alignment algorithm, the so-called “run-length limited dynamic programming algorithm” (RLL-DP). The voice command system facilitates the operation of the AUTONOMY system, an environmental control system combined with an alternative and augmentative communication system, using isolated words as voice commands. Activating the “run-length limits” yields a statistically significant reduction of the word error rate, even when simple “centroid sequence word models” are used instead of the acoustic models based on “hidden control neural networks” used in previous versions.
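The abstract does not spell out how RLL-DP works, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that a word model is a plain sequence of centroid vectors, that each centroid must cover a consecutive run of frames whose length lies between `min_run` and `max_run`, and that the local cost is the squared Euclidean distance. The names `align`, `sq_dist`, `min_run` and `max_run` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align(frames, centroids, min_run=1, max_run=None):
    """Align frames to a centroid sequence under run-length limits.

    Assumption (not from the paper): every centroid is used exactly once,
    in order, for a run of min_run..max_run consecutive frames.
    Returns (total_cost, run_lengths); (inf, []) if no alignment exists.
    """
    T, N = len(frames), len(centroids)
    if max_run is None:
        max_run = T

    # prefix[n][t] = summed distance of frames 0..t-1 to centroid n
    prefix = []
    for c in centroids:
        acc = [0.0]
        for f in frames:
            acc.append(acc[-1] + sq_dist(f, c))
        prefix.append(acc)

    # best[n][t] = minimal cost of covering the first t frames
    # with the first n centroids; back[n][t] = run length chosen.
    best = [[math.inf] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]
    best[0][0] = 0.0

    for n in range(1, N + 1):
        for t in range(1, T + 1):
            for run in range(min_run, min(max_run, t) + 1):
                prev = best[n - 1][t - run]
                if prev == math.inf:
                    continue
                cost = prev + prefix[n - 1][t] - prefix[n - 1][t - run]
                if cost < best[n][t]:
                    best[n][t] = cost
                    back[n][t] = run

    if best[N][T] == math.inf:
        return math.inf, []

    # Trace back the chosen run length of each centroid.
    runs, t = [], T
    for n in range(N, 0, -1):
        runs.append(back[n][t])
        t -= back[n][t]
    runs.reverse()
    return best[N][T], runs


frames = [[0.0], [0.0], [0.0], [0.0], [1.0]]
centroids = [[0.0], [1.0]]
unlimited = align(frames, centroids)
limited = align(frames, centroids, max_run=3)
```

In an isolated-word recognizer built this way, each command word would have its own centroid sequence, and the word with the lowest alignment cost would be chosen. The limits keep a single centroid from absorbing an implausibly long or short stretch of the utterance.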


Metadata
Title
A voice command system for AUTONOMY using a novel speech alignment algorithm
Authors
Helmut Hickersberger
Wolfgang L. Zagler
Publication date
1 December 2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9196-2