Published in: International Journal of Speech Technology 4/2017

26.10.2017

A voice command detection system for aerospace applications

Author: Shima Tabibian

Abstract

Nowadays, with the ever-increasing volume of audio content, audio processing has become a vital need. In the aerospace field, voice commands could be used instead of data commands to speed up command transmission, to help crew members complete their tasks by allowing hands-free control of supplemental equipment, and to serve as a redundant channel that increases the reliability of command transmission. In this paper, a voice command detection (VCD) framework is proposed for aerospace applications, which decodes voice commands into comprehensible, executable commands at an acceptable speed and with a low false alarm rate. The framework is mainly based on a keyword spotting method, which extracts pre-defined target keywords from the input voice commands. These keywords are the input arguments of the proposed rule-based language model (LM). The rule-based LM decodes the voice commands based on the input keywords and their locations. Two keyword spotters are trained and used in the VCD system. The phone-based keyword spotter is trained on the TIMIT database; speaker adaptation methods are then exploited to modify the parameters of the trained models using non-native speaker utterances. The word-based keyword spotter is trained on a database prepared specifically for aerospace applications. The experimental results show that the word-based VCD system decodes the voice commands with an average true detection rate of 88% and an average false alarm rate of 12%. Additionally, using speaker adaptation methods in the phone-based VCD system improves the true detection and false alarm rates by about 21% each.
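As a loose illustration of the decoding stage described above, the following sketch shows how a rule-based LM might turn spotted keywords and their locations into an executable command. It is a minimal Python sketch under assumed conventions: the keyword inventory (ACTIONS, OBJECTS, IDS), the action-object-identifier slot grammar, and the decode_command function are hypothetical and are not taken from the paper.

# Hypothetical rule-based LM stage: the keyword spotter emits
# (keyword, location) pairs and the LM assembles them into an
# executable command. The grammar below is an illustrative
# assumption, not the paper's actual rule set.

from dataclasses import dataclass

@dataclass
class Detection:
    keyword: str        # spotted keyword, e.g. "open", "valve", "two"
    start_time: float   # location of the keyword in the utterance (seconds)

ACTIONS = {"open", "close", "start", "stop"}   # assumed action keywords
OBJECTS = {"valve", "pump", "camera"}          # assumed object keywords
IDS = {"one": 1, "two": 2, "three": 3}         # assumed identifier keywords

def decode_command(detections):
    """Order the spotted keywords by location and fill action/object/id slots."""
    ordered = sorted(detections, key=lambda d: d.start_time)
    action = obj = ident = None
    for d in ordered:
        if d.keyword in ACTIONS and action is None:
            action = d.keyword
        elif d.keyword in OBJECTS and obj is None:
            obj = d.keyword
        elif d.keyword in IDS and ident is None:
            ident = IDS[d.keyword]
    if action and obj:
        return {"action": action, "object": obj, "id": ident}
    return None  # reject: the keywords do not form a valid command

# Example: keywords spotted in the utterance "open valve two"
print(decode_command([Detection("valve", 0.9),
                      Detection("open", 0.3),
                      Detection("two", 1.4)]))
# -> {'action': 'open', 'object': 'valve', 'id': 2}

In this toy grammar, an utterance such as "open valve two" yields a structured, executable command, while a keyword sequence that does not fill both the action and object slots is rejected, which is one way a rule-based LM can keep the false alarm rate low.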


Metadata
Title
A voice command detection system for aerospace applications
Author
Shima Tabibian
Publication date
26.10.2017
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9467-4
