ABSTRACT
Social robots need non-verbal behavior to make an interaction pleasant and efficient. Most models for generating non-verbal behavior are rule-based: they can produce only a limited set of motions and are tuned to a particular scenario. Data-driven systems, in contrast, are flexible and easily adjustable. We therefore aim to learn a data-driven model that generates non-verbal behavior, in the form of a 3D motion sequence, for humanoid robots.
Our approach is based on a popular and powerful deep generative model: the Variational Autoencoder (VAE). The input to our model will be multi-modal, and we will iteratively increase its complexity: first it will use only the speech signal, then also the text transcription, and finally the non-verbal behavior of the conversation partner. We will evaluate our system on virtual avatars as well as on two humanoid robots with different embodiments: NAO and Furhat. Our model can be easily adapted to a novel domain by providing application-specific training data.
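To make the pipeline concrete, the mapping described above can be sketched as a VAE-style forward pass: encode a speech-feature frame to the parameters of a latent Gaussian, sample a latent vector with the reparameterization trick, and decode it to one frame of 3D joint positions. This is a minimal illustrative sketch, not the paper's implementation; all dimensions and weight initializations (SPEECH_DIM, LATENT_DIM, MOTION_DIM, the untrained linear maps) are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper):
# 26 speech features (e.g. MFCCs), 8 latent dims, 15 joints x 3 coordinates.
SPEECH_DIM, LATENT_DIM, MOTION_DIM = 26, 8, 45

# Randomly initialised linear encoder/decoder weights stand in for
# trained networks; a real model would learn these from data.
W_mu = rng.normal(scale=0.1, size=(LATENT_DIM, SPEECH_DIM))
W_logvar = rng.normal(scale=0.1, size=(LATENT_DIM, SPEECH_DIM))
W_dec = rng.normal(scale=0.1, size=(MOTION_DIM, LATENT_DIM))

def encode(speech):
    """Map a speech-feature frame to the latent Gaussian's mean and log-variance."""
    return W_mu @ speech, W_logvar @ speech

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps, keeping the sampling step differentiable."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent sample to one frame of 3D joint positions."""
    return W_dec @ z

speech_frame = rng.normal(size=SPEECH_DIM)
mu, logvar = encode(speech_frame)
motion_frame = decode(reparameterize(mu, logvar))
print(motion_frame.shape)
```

Further modalities (text transcription, the partner's motion) would enter as additional encoder inputs concatenated with the speech features.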