2021 | OriginalPaper | Chapter

Comparison of Speech Recognition and Natural Language Understanding Frameworks for Detection of Dangers with Smart Wearables

Authors: Dariusz Mrozek, Szymon Kwaśnicki, Vaidy Sunderam, Bożena Małysiak-Mrozek, Krzysztof Tokarz, Stanisław Kozielski

Published in: Computational Science – ICCS 2021

Publisher: Springer International Publishing

Abstract

Wearable IoT devices that can register and transmit human voice can be invaluable in personal situations, such as summoning assistance in a healthcare emergency. Such applications would benefit greatly from automated voice analysis to detect and classify voice signals. In this paper, we compare selected Speech Recognition (SR) and Natural Language Understanding (NLU) frameworks for Cloud-based detection of voice-based assistance calls. We experimentally test several services for speech-to-text transcription and intention recognition available on selected large Cloud platforms. Finally, we evaluate the influence of the manner of speaking and ambient noise on the quality of recognition of emergency calls. Our results show that many services can correctly transcribe voice to text and provide a correct interpretation of caller intent. Still, speech artifacts (tone, accent, diction), which can differ even for the same individual across situations, significantly influence the performance of speech recognition.
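The abstract does not say how transcription quality is scored. One common choice for comparing speech-to-text output against a reference transcript is the word error rate (WER), and the sketch below assumes that metric purely for illustration. The function name `word_error_rate` and the sample phrases are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative metric only; the paper's actual scoring method is not
    stated in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of an extra word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical emergency phrase and a transcript that dropped one word.
    print(word_error_rate("please call an ambulance", "please call ambulance"))
```

A lower value means the service's transcript is closer to what was actually said. Because insertions are counted, the score can exceed 1.0 when the hypothesis is much longer than the reference.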

Footnotes
1
VoxForge open speech dataset with transcribed speech: http://www.voxforge.org/home/downloads/speech/english.
 
Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-77970-2_36
