2019 | OriginalPaper | Chapter

Using Audio Transformations to Improve Comprehension in Voice Question Answering

Authors: Aleksandr Chuklin, Aliaksei Severyn, Johanne R. Trippas, Enrique Alfonseca, Hanna Silen, Damiano Spina

Published in: Experimental IR Meets Multilinguality, Multimodality, and Interaction

Publisher: Springer International Publishing

Abstract

Many popular form factors of digital assistants—such as Amazon Echo or Google Home—enable users to converse with speech-based systems. The lack of screens presents unique challenges. To satisfy users’ information needs, the presentation of answers has to be optimized for voice-only interactions. We evaluate the usefulness of audio transformations (i.e., prosodic modifications) for voice-only question answering. We introduce a crowdsourcing setup evaluating the quality of our proposed modifications along multiple dimensions corresponding to the informativeness, naturalness, and ability of users to identify key parts of the answer. We offer a set of prosodic modifications that highlight potentially important parts of the answer using various acoustic cues. Our experiments show that different modifications lead to better comprehension at the expense of slightly degraded naturalness of the audio.
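As a rough illustration of what highlighting part of a spoken answer with an acoustic cue could look like, the sketch below wraps a key span of an answer in standard SSML markup (`<emphasis>`, `<break>`, `<prosody>`). The function name `highlight_ssml`, the four mode names, and the concrete values (300 ms pauses, slow rate, +2 semitones) are assumptions made for this example; the page does not state which settings the authors used, and how a given TTS engine renders these elements may vary.

```python
from xml.sax.saxutils import escape


def highlight_ssml(answer: str, key: str, mode: str = "emphasis") -> str:
    """Wrap the first occurrence of `key` in `answer` with an SSML cue.

    Hypothetical helper: the modes and their values are illustrative
    assumptions, not the settings used in the chapter.
    """
    start = answer.find(key)
    if start < 0:
        raise ValueError("key span not found in answer")

    # Escape XML special characters so the result stays well-formed.
    before = escape(answer[:start])
    span = escape(key)
    after = escape(answer[start + len(key):])

    if mode == "emphasis":
        marked = f'<emphasis level="strong">{span}</emphasis>'
    elif mode == "pause":
        # Short silences before and after the key span.
        marked = f'<break time="300ms"/>{span}<break time="300ms"/>'
    elif mode == "rate":
        # Speak the key span more slowly than the rest.
        marked = f'<prosody rate="slow">{span}</prosody>'
    elif mode == "pitch":
        # Raise the pitch of the key span by two semitones.
        marked = f'<prosody pitch="+2st">{span}</prosody>'
    else:
        raise ValueError(f"unknown mode: {mode}")

    return f"<speak>{before}{marked}{after}</speak>"


if __name__ == "__main__":
    print(highlight_ssml("The capital of Australia is Canberra.", "Canberra"))
```

The resulting string would be passed to an SSML-capable speech synthesizer. Comparing the audio produced by each mode against an unmodified baseline is the kind of comparison the crowdsourcing setup described above is meant to support.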

Footnotes
2
Experiments performed under Ethics Application BSEH 10–14 at RMIT University.
 
4
The emphasis feature is currently available only in Google TTS, and its implementation details are specified in neither the SSML standard nor the documentation.
 
Metadata
Title
Using Audio Transformations to Improve Comprehension in Voice Question Answering
Authors
Aleksandr Chuklin
Aliaksei Severyn
Johanne R. Trippas
Enrique Alfonseca
Hanna Silen
Damiano Spina
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-28577-7_12
