ABSTRACT
A robot agent existing in the physical world must be able to understand the social states of the human users it interacts with in order to respond appropriately. We compared two implemented methods for estimating the engagement state of customers for a robot bartender based on low-level sensor data: a rule-based version derived from the analysis of human behaviour in real bars, and a trained version using supervised learning on a labelled multimodal corpus. We first compared the two implementations using cross-validation on real sensor data and found that nearly all classifier types significantly outperformed the rule-based classifier. We also carried out feature selection to determine which sensor features were the most informative for the classification task, and found that the positions of the head and hands were relevant, but that the torso orientation was not. Finally, we performed a user study comparing the ability of the two classifiers to detect the intended engagement of actual customers of the robot bartender; this study found that the trained classifier was faster at detecting initial intended user engagement, but that the rule-based classifier was more stable.
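The comparison described above can be sketched in code. The snippet below is an illustrative toy, not the paper's actual pipeline: the sensor features (head distance, hand visibility, torso angle), the hand-written rule, and the synthetic labels are all assumptions chosen to mirror the setup of scoring a fixed rule against cross-validated trained classifiers on the same feature vectors.

```python
# Hypothetical sketch of the evaluation design: a hand-written
# engagement rule scored against trained classifiers under
# cross-validation. Feature names and the rule are illustrative
# assumptions, not the study's real sensor schema.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic low-level sensor features per tracked customer:
# head distance from the bar (m), whether a hand is visible
# above the bar (0/1), and torso orientation w.r.t. the bar (deg,
# deliberately uninformative, echoing the feature-selection finding).
head_dist = rng.uniform(0.3, 2.5, n)
hand_visible = rng.integers(0, 2, n)
torso_angle = rng.uniform(0.0, 90.0, n)

# Ground truth: customers seeking attention stand close with a hand
# visible, plus 10% label noise so no classifier is perfect.
engaged = ((head_dist < 1.0) & (hand_visible == 1)).astype(int)
flip = rng.random(n) < 0.1
engaged[flip] = 1 - engaged[flip]

X = np.column_stack([head_dist, hand_visible, torso_angle])

def rule_based(features):
    """Hand-written rule: close to the bar AND hand visible."""
    return ((features[:, 0] < 1.0) & (features[:, 1] == 1)).astype(int)

rule_acc = (rule_based(X) == engaged).mean()
for clf in (GaussianNB(), DecisionTreeClassifier(max_depth=3, random_state=0)):
    acc = cross_val_score(clf, X, engaged, cv=10).mean()
    print(f"{type(clf).__name__}: {acc:.3f} (rule baseline: {rule_acc:.3f})")
```

In the toy data the rule nearly matches the trained models because the labels were generated from the rule itself; on real sensor data, where the true decision boundary is unknown and noisy, the trained classifiers have room to outperform a fixed rule, which is the effect the cross-validation study measured.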
Index Terms
- "How can I help you?": Comparing engagement classification strategies for a robot bartender