2019 | OriginalPaper | Chapter

You Talkin’ to Me? A Practical Attention-Aware Embodied Agent

Authors: Rahul R. Divekar, Jeffrey O. Kephart, Xiangyang Mou, Lisha Chen, Hui Su

Published in: Human-Computer Interaction – INTERACT 2019

Publisher: Springer International Publishing

Abstract

Most present-day voice-based assistants require that users utter a wake-up word to signify that they are addressing the assistant. While this may be acceptable for one-shot requests such as “Turn on the lights”, it becomes tiresome when one is engaged in an extended interaction with such an assistant. To support the goal of developing low-complexity, low-cost alternatives to a wake-up word, we present the results of two studies in which users engage with an assistant that infers whether it is being addressed from the user’s head orientation. In the first experiment, we collected informal user feedback regarding a relatively simple application of head orientation as a substitute for a wake-up word. We discuss that feedback and how it influenced the design of a second prototype assistant that corrects many of the issues identified in the first experiment. The most promising insight was that users were willing to adapt to the interface, leading us to hypothesize that it would be beneficial to provide visual feedback about the assistant’s belief regarding the user’s attentional state. In a second experiment conducted using the improved assistant, we collected more formal user feedback on likability and usability and used it to establish that, with high confidence, head orientation combined with visual feedback is preferable to the traditional wake-up word approach. We describe the visual feedback mechanisms and quantify their usefulness in the second experiment.
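
To make the mechanism in the abstract concrete, here is a minimal sketch of addressee detection from head orientation with an exposed belief state for visual feedback. This is not the authors' implementation: the angle thresholds, smoothing window, class names, and the source of the head-pose estimates are all illustrative assumptions.

```python
# Hypothetical sketch: gate an agent's "listening" state on head orientation
# instead of a wake-up word, and expose the agent's belief so a UI can render
# visual feedback. Thresholds and smoothing parameters are assumed, not taken
# from the paper.

from collections import deque
from dataclasses import dataclass


@dataclass
class HeadPose:
    yaw: float    # degrees; 0 = facing the agent's display
    pitch: float  # degrees; 0 = level gaze


class AttentionEstimator:
    """Estimates whether the user is addressing the agent from head pose."""

    def __init__(self, yaw_limit=20.0, pitch_limit=15.0, window=10, ratio=0.7):
        self.yaw_limit = yaw_limit        # assumed angular tolerance, degrees
        self.pitch_limit = pitch_limit
        self.history = deque(maxlen=window)  # smooth over recent frames
        self.ratio = ratio                # fraction of frames facing the agent

    def update(self, pose: HeadPose) -> bool:
        facing = (abs(pose.yaw) <= self.yaw_limit
                  and abs(pose.pitch) <= self.pitch_limit)
        self.history.append(facing)
        return self.attending

    @property
    def attending(self) -> bool:
        # Majority-style vote over the window avoids flicker from brief glances.
        if not self.history:
            return False
        return sum(self.history) / len(self.history) >= self.ratio


# Example wiring: feed per-frame poses from any head tracker and drive a
# visual indicator from the agent's current belief.
estimator = AttentionEstimator()
for pose in [HeadPose(5, 2), HeadPose(40, 0), HeadPose(3, -4)]:
    if estimator.update(pose):
        print("listening (indicator on)")   # agent believes it is addressed
    else:
        print("idle (indicator off)")
```

Surfacing the belief (the indicator above) rather than silently gating the microphone is the key design point the abstract highlights: visible feedback lets users adapt when the estimate is wrong.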

Footnotes
1. Video data were missing for one subject.
 
Metadata
Title
You Talkin’ to Me? A Practical Attention-Aware Embodied Agent
Authors
Rahul R. Divekar
Jeffrey O. Kephart
Xiangyang Mou
Lisha Chen
Hui Su
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-29387-1_44