Published in: International Journal of Speech Technology 4/2017

26.10.2017

A voice command detection system for aerospace applications

Author: Shima Tabibian

Abstract

Nowadays, with the ever-increasing volume of audio content, audio processing has become a vital need. In the aerospace field, voice commands could be used instead of data commands to speed up command transmission, to help crew members complete their tasks by allowing hands-free control of supplemental equipment, and to serve as a redundant channel that increases the reliability of command transmission. In this paper, a voice command detection (VCD) framework is proposed for aerospace applications, which decodes voice commands into comprehensible, executable commands at an acceptable speed and with a low false alarm rate. The framework is mainly based on a keyword spotting method, which extracts pre-defined target keywords from the input voice commands. These keywords are the input arguments of the proposed rule-based language model (LM). The rule-based LM decodes the voice commands based on the input keywords and their locations. Two keyword spotters are trained and used in the VCD system. The phone-based keyword spotter is trained on the TIMIT database; speaker adaptation methods are then exploited to modify the parameters of the trained models using non-native speaker utterances. The word-based keyword spotter is trained on a database prepared specifically for aerospace applications. The experimental results show that the word-based VCD system decodes the voice commands with an average true detection rate of 88% and an average false alarm rate of 12%. Additionally, using speaker adaptation methods in the phone-based VCD system improves the true detection and false alarm rates by about 21% each.
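As a loose illustration of the decoding stage described above, the following sketch shows how a rule-based LM might turn spotted keywords and their locations into an executable command. It is a minimal Python sketch under assumed conventions: the keyword inventory (ACTIONS, OBJECTS, IDS), the action-object-identifier slot grammar, and the decode_command function are hypothetical and are not taken from the paper.

# Hypothetical rule-based LM stage: the keyword spotter emits
# (keyword, location) pairs and the LM assembles them into an
# executable command. The grammar below is an illustrative
# assumption, not the paper's actual rule set.

from dataclasses import dataclass

@dataclass
class Detection:
    keyword: str        # spotted keyword, e.g. "open", "valve", "two"
    start_time: float   # location of the keyword in the utterance (seconds)

ACTIONS = {"open", "close", "start", "stop"}   # assumed action keywords
OBJECTS = {"valve", "pump", "camera"}          # assumed object keywords
IDS = {"one": 1, "two": 2, "three": 3}         # assumed identifier keywords

def decode_command(detections):
    """Order the spotted keywords by location and fill action/object/id slots."""
    ordered = sorted(detections, key=lambda d: d.start_time)
    action = obj = ident = None
    for d in ordered:
        if d.keyword in ACTIONS and action is None:
            action = d.keyword
        elif d.keyword in OBJECTS and obj is None:
            obj = d.keyword
        elif d.keyword in IDS and ident is None:
            ident = IDS[d.keyword]
    if action and obj:
        return {"action": action, "object": obj, "id": ident}
    return None  # reject: the keywords do not form a valid command

# Example: keywords spotted in the utterance "open valve two"
print(decode_command([Detection("valve", 0.9),
                      Detection("open", 0.3),
                      Detection("two", 1.4)]))
# -> {'action': 'open', 'object': 'valve', 'id': 2}

In this toy grammar, an utterance such as "open valve two" yields a structured, executable command, while a keyword sequence that does not fill both the action and object slots is rejected, which is one way a rule-based LM can keep the false alarm rate low.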


Metadata
Title
A voice command detection system for aerospace applications
Author
Shima Tabibian
Publication date
26.10.2017
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9467-4
