
2020 | OriginalPaper | Chapter

Black-Box Attacks via the Speech Interface Using Linguistically Crafted Input

Authors: Mary K. Bispham, Alastair Janse van Rensburg, Ioannis Agrafiotis, Michael Goldsmith

Published in: Information Systems Security and Privacy

Publisher: Springer International Publishing


Abstract

This paper presents the results of experiments demonstrating novel black-box attacks via the speech interface. We demonstrate two types of attack that use linguistically crafted adversarial input to target vulnerabilities in the handling of speech input by a speech interface. The first attack demonstrates the use of nonsensical word sounds to gain covert access to voice-controlled systems. This attack exploits vulnerabilities at the speech recognition stage of handling of speech input. The second attack demonstrates the use of crafted utterances that are misinterpreted by a target system as a valid voice command. This attack exploits vulnerabilities at the natural language understanding stage of handling of speech input.


Footnotes
1
See The Telegraph, 1st May 2014, “Infants taught to read ‘nonsense words’ in English lessons”.

6
See Google AI blog, 11th August 2015, “The neural networks behind Google Voice transcription”, https://ai.googleblog.com/2015/08/the-neural-networks-behind-google-voice.html.

8
Our approach was inspired by an educational game in which a set of nonsense words is generated by spinning lettered wooden cubes; see https://rainydaymum.co.uk/spin-a-word-real-vs-nonsense-words/.

14
See for example The Guardian, 17th May 2018, “Laurel or Yanny explained: why do some people hear a different word?”, https://www.theguardian.com/technology/2018/may/16/yanny-or-laurel-sound-illusion-sets-off-ear-splitting-arguments.

18
This was a template for a banking assistant bot made available by IBM at https://github.com/IBM/watson-banking-chatbot.

22
See for example phys.org, 20th June 2018, “Banking by smart speaker arrives, but security issues exist”, https://phys.org/news/2018-06-banking-smart-speaker-issues.html.

23
See BBC News, 24th May 2018, “Amazon Alexa heard and sent private chat”, https://www.bbc.co.uk/news/technology-44248122.
 
Metadata
Title
Black-Box Attacks via the Speech Interface Using Linguistically Crafted Input
Authors
Mary K. Bispham
Alastair Janse van Rensburg
Ioannis Agrafiotis
Michael Goldsmith
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-49443-8_5
