2019 | OriginalPaper | Chapter

You Talkin’ to Me? A Practical Attention-Aware Embodied Agent

Authors: Rahul R. Divekar, Jeffrey O. Kephart, Xiangyang Mou, Lisha Chen, Hui Su

Published in: Human-Computer Interaction – INTERACT 2019

Publisher: Springer International Publishing

Abstract

Most present-day voice-based assistants require that users utter a wake-up word to signify that they are addressing the assistant. While this may be acceptable for one-shot requests such as “Turn on the lights”, it becomes tiresome when one is engaged in an extended interaction with such an assistant. To support the goal of developing low-complexity, low-cost alternatives to a wake-up word, we present the results of two studies in which users engage with an assistant that infers whether it is being addressed from the user’s head orientation. In the first experiment, we collected informal user feedback regarding a relatively simple application of head orientation as a substitute for a wake-up word. We discuss that feedback and how it influenced the design of a second prototype assistant that corrects many of the issues identified in the first experiment. The most promising insight was that users were willing to adapt to the interface, leading us to hypothesize that it would be beneficial to provide visual feedback about the assistant’s belief regarding the user’s attentional state. In a second experiment conducted using the improved assistant, we collected more formal user feedback on likability and usability and used it to establish that, with high confidence, head orientation combined with visual feedback is preferable to the traditional wake-up word approach. We describe the visual feedback mechanisms and quantify their usefulness in the second experiment.
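
To make the mechanism in the abstract concrete, here is a minimal sketch of addressee detection from head orientation with an exposed belief state for visual feedback. This is not the authors' implementation: the angle thresholds, smoothing window, class names, and the source of the head-pose estimates are all illustrative assumptions.

```python
# Hypothetical sketch: gate an agent's "listening" state on head orientation
# instead of a wake-up word, and expose the agent's belief so a UI can render
# visual feedback. Thresholds and smoothing parameters are assumed, not taken
# from the paper.

from collections import deque
from dataclasses import dataclass


@dataclass
class HeadPose:
    yaw: float    # degrees; 0 = facing the agent's display
    pitch: float  # degrees; 0 = level gaze


class AttentionEstimator:
    """Estimates whether the user is addressing the agent from head pose."""

    def __init__(self, yaw_limit=20.0, pitch_limit=15.0, window=10, ratio=0.7):
        self.yaw_limit = yaw_limit        # assumed angular tolerance, degrees
        self.pitch_limit = pitch_limit
        self.history = deque(maxlen=window)  # smooth over recent frames
        self.ratio = ratio                # fraction of frames facing the agent

    def update(self, pose: HeadPose) -> bool:
        facing = (abs(pose.yaw) <= self.yaw_limit
                  and abs(pose.pitch) <= self.pitch_limit)
        self.history.append(facing)
        return self.attending

    @property
    def attending(self) -> bool:
        # Majority-style vote over the window avoids flicker from brief glances.
        if not self.history:
            return False
        return sum(self.history) / len(self.history) >= self.ratio


# Example wiring: feed per-frame poses from any head tracker and drive a
# visual indicator from the agent's current belief.
estimator = AttentionEstimator()
for pose in [HeadPose(5, 2), HeadPose(40, 0), HeadPose(3, -4)]:
    if estimator.update(pose):
        print("listening (indicator on)")   # agent believes it is addressed
    else:
        print("idle (indicator off)")
```

Surfacing the belief (the indicator above) rather than silently gating the microphone is the key design point the abstract highlights: visible feedback lets users adapt when the estimate is wrong.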

Footnotes
1. Video data were missing for one subject.
 
Metadata
Title
You Talkin’ to Me? A Practical Attention-Aware Embodied Agent
Authors
Rahul R. Divekar
Jeffrey O. Kephart
Xiangyang Mou
Lisha Chen
Hui Su
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-29387-1_44