ABSTRACT
A robot agent existing in the physical world must be able to understand the social states of the human users it interacts with in order to respond appropriately. We compared two implemented methods for estimating the engagement state of customers for a robot bartender based on low-level sensor data: a rule-based version derived from the analysis of human behaviour in real bars, and a trained version using supervised learning on a labelled multimodal corpus. We first compared the two implementations using cross-validation on real sensor data and found that nearly all classifier types significantly outperformed the rule-based classifier. We also carried out feature selection to determine which sensor features were the most informative for the classification task, and found that the positions of the head and hands were relevant, but that the torso orientation was not. Finally, we performed a user study comparing the ability of the two classifiers to detect the intended engagement of actual customers of the robot bartender; this study found that the trained classifier was faster at detecting initial intended user engagement, but that the rule-based classifier was more stable.
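The comparison described above can be sketched in code. The snippet below is an illustrative toy, not the paper's actual pipeline: the sensor features (head distance, hand visibility, torso angle), the hand-written rule, and the synthetic labels are all assumptions chosen to mirror the setup of scoring a fixed rule against cross-validated trained classifiers on the same feature vectors.

```python
# Hypothetical sketch of the evaluation design: a hand-written
# engagement rule scored against trained classifiers under
# cross-validation. Feature names and the rule are illustrative
# assumptions, not the study's real sensor schema.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic low-level sensor features per tracked customer:
# head distance from the bar (m), whether a hand is visible
# above the bar (0/1), and torso orientation w.r.t. the bar (deg,
# deliberately uninformative, echoing the feature-selection finding).
head_dist = rng.uniform(0.3, 2.5, n)
hand_visible = rng.integers(0, 2, n)
torso_angle = rng.uniform(0.0, 90.0, n)

# Ground truth: customers seeking attention stand close with a hand
# visible, plus 10% label noise so no classifier is perfect.
engaged = ((head_dist < 1.0) & (hand_visible == 1)).astype(int)
flip = rng.random(n) < 0.1
engaged[flip] = 1 - engaged[flip]

X = np.column_stack([head_dist, hand_visible, torso_angle])

def rule_based(features):
    """Hand-written rule: close to the bar AND hand visible."""
    return ((features[:, 0] < 1.0) & (features[:, 1] == 1)).astype(int)

rule_acc = (rule_based(X) == engaged).mean()
for clf in (GaussianNB(), DecisionTreeClassifier(max_depth=3, random_state=0)):
    acc = cross_val_score(clf, X, engaged, cv=10).mean()
    print(f"{type(clf).__name__}: {acc:.3f} (rule baseline: {rule_acc:.3f})")
```

In the toy data the rule nearly matches the trained models because the labels were generated from the rule itself; on real sensor data, where the true decision boundary is unknown and noisy, the trained classifiers have room to outperform a fixed rule, which is the effect the cross-validation study measured.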
Index Terms
- "How can I help you?": Comparing engagement classification strategies for a robot bartender