KI - Künstliche Intelligenz

16-09-2021 | Technical Contribution

Embodied Human Computer Interaction

Authors: James Pustejovsky, Nikhil Krishnaswamy

Published in: KI - Künstliche Intelligenz | Issue 3-4/2021

Abstract

In this paper, we argue that embodiment can play an important role in the design and modeling of systems developed for Human Computer Interaction. To this end, we describe a simulation platform for building Embodied Human Computer Interactions (EHCI). This system, VoxWorld, enables multimodal dialogue systems that communicate through language, gesture, action, facial expressions, and gaze tracking, in the context of task-oriented interactions. A multimodal simulation is an embodied 3D virtual realization of both the situational environment and the co-situated agents, as well as the most salient content denoted by communicative acts in a discourse. It is built on the modeling language VoxML (Pustejovsky and Krishnaswamy 2016, VoxML: a visualization modeling language, Proceedings of LREC), which encodes objects with rich semantic typing and action affordances, and actions themselves as multimodal programs, enabling contextually salient inferences and decisions in the environment. VoxWorld enables an embodied HCI by situating both human and artificial agents within the same virtual simulation environment, where they share perceptual and epistemic common ground. We discuss the formal and computational underpinnings of embodiment and common ground, how they interact and specify parameters of the interaction between humans and artificial agents, and demonstrate behaviors and types of interactions on different classes of artificial agents.

KI - Künstliche Intelligenz

The scientific journal "KI - Künstliche Intelligenz" is the official journal of the division for artificial intelligence within the "Gesellschaft für Informatik e.V." (GI), the German Informatics Society, with contributions from throughout the field of artificial intelligence.

This recalls the question of how to best model situated action [16, 97].

See Sect. 5 for details on integrating various sensor types and their relationships with the particulars of the artificial agent’s embodiment.

as = argument structure; qs = qualia structure.

Beginning in [52], voxemes have been denoted [[voxeme]].

It should be noted that Gibsonian affordances might be construed as the goal of an activity in some contexts.

TTR encodes actions (such as put and grasp above) as finite-state sequences of subevents (cf. [72]), but the computational effect of applying the updating functions over the current RobotState, given an action, is similar to our interpretation of events as state-transformers, e.g., mappings from RobotState to RobotState.
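
The state-transformer reading of events can be sketched in a few lines of Python. This is an illustration only, not the VoxWorld or TTR implementation: the fields of `RobotState` (`holding`, `on`), the helper `sequence`, and the preconditions of `grasp` and `put` are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class RobotState:
    """Hypothetical state: the held object and the (object, support) pairs that hold."""
    holding: Optional[str] = None
    on: Tuple[Tuple[str, str], ...] = ()


# An event is interpreted as a mapping from RobotState to RobotState.
Event = Callable[[RobotState], RobotState]


def grasp(obj: str) -> Event:
    def apply(s: RobotState) -> RobotState:
        if s.holding is not None:
            return s  # assumed precondition fails: leave the state unchanged
        remaining = tuple(p for p in s.on if p[0] != obj)
        return replace(s, holding=obj, on=remaining)
    return apply


def put(obj: str, dest: str) -> Event:
    def apply(s: RobotState) -> RobotState:
        if s.holding != obj:
            return s  # assumed precondition fails: leave the state unchanged
        return replace(s, holding=None, on=s.on + ((obj, dest),))
    return apply


def sequence(*events: Event) -> Event:
    """Sequential composition: apply the subevents from left to right."""
    return lambda s: reduce(lambda acc, e: e(acc), events, s)


start = RobotState(on=(("block", "table"),))
end = sequence(grasp("block"), put("block", "plate"))(start)
print(end)
```

Composing `grasp` and `put` with `sequence` mirrors the idea of an action as a sequence of subevents, each of which updates the current state.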

VoxSim source can be found here.

Shared aural perception is possible, while haptic technology is rapidly advancing. We expect that much of the semantics presented here would be suitable for modeling extra-visual shared perception. This is the topic of ongoing research, beginning with haptics in VR.

This is similar in many respects to the representations introduced in [20, 27] and [37] for modeling action and control with robots.

The theory of semiotic schemas introduced in [83] attempts to encode the perceptual context of a linguistic utterance as well, to resolve reference.

Forward kinematics computes the position of the end-effector from the joint parameters. Inverse kinematics computes the joint parameters from the position of the effector.
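
For a concrete instance of the two computations, consider a planar arm with two revolute joints. The sketch below is not taken from VoxWorld; the link lengths `l1` and `l2`, the function names, and the choice of the solution with a non-negative elbow angle are assumptions made for the example.

```python
import math
from typing import Tuple


def forward_kinematics(theta1: float, theta2: float,
                       l1: float = 1.0, l2: float = 1.0) -> Tuple[float, float]:
    """End-effector position (x, y) from the two joint angles (radians)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics(x: float, y: float,
                       l1: float = 1.0, l2: float = 1.0) -> Tuple[float, float]:
    """Joint angles reaching (x, y); returns the solution with theta2 >= 0."""
    # Law of cosines gives the cosine of the elbow angle.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0 + 1e-9:
        raise ValueError("target is outside the reachable workspace")
    c2 = max(-1.0, min(1.0, c2))  # guard against rounding just past +/-1
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2


# Round trip: joint angles -> position -> joint angles.
x, y = forward_kinematics(0.3, 0.8)
t1, t2 = inverse_kinematics(x, y)
print(round(t1, 6), round(t2, 6))
```

The inverse problem generally has more than one solution (here, the mirrored elbow configuration), which is why the sketch fixes one branch explicitly.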

\([\![S ]\!]= ([\![\mathbf{NP} ]\!][\![\mathbf{GP} ]\!]).\)

\([\![\mathbf{GP}_1 ]\!]= \lambda j. ([\![\mathbf{D}_{Obj} ]\!];\lambda j'.(([\![\mathbf{G}_{af} ]\!]j')j)).\)

\([\![\mathbf{GP}_2 ]\!]= \lambda k. ([\![\mathbf{D}_{Loc} ]\!]; \lambda j. ([\![\mathbf{D}_{Obj} ]\!];\lambda j'.(([\![\mathbf{G}_{af} ]\!]j')j)k)).\)

\([\![\mathbf{GP}_3 ]\!]= \lambda k. ([\![\mathbf{D}_{Dir} ]\!]; \lambda j. ([\![\mathbf{D}_{Obj} ]\!];\lambda j'.(([\![\mathbf{G}_{af} ]\!]j')j)k)).\)

\([\alpha ]_{\sigma } (x_i \vee e_i)\), \([\beta ]_{\sigma } (x_i \vee e_i).\)

\([\alpha ]_{\sigma } ([\beta ]_{\sigma } (x_i \vee e_i))\), \([\beta ]_{\sigma } ([\alpha ]_{\sigma } (x_i \vee e_i)).\)

\([\beta ]_{\sigma } ([\alpha ]_{\sigma } ([\beta ]_{\sigma } (x_i \vee e_i))) \), \([\alpha ]_{\sigma } ([\beta ]_{\sigma } ([\alpha ]_{\sigma } (x_i \vee e_i))).\)

\([(\alpha \cup \beta )^*]_{\sigma } \varphi. \)

A video demo can be viewed here http://www.voxicon.net/wp-content/uploads/2020/07/DARPA-CwC-Brandeis-CSU-July-2020.mp4.

VoxML encodes relations using a number of common spatial reasoning calculi, including the Region Connection Calculus [82], where this would be encoded as EC(y, sfc).
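
As an illustration of the EC (externally connected) relation, the sketch below tests it for axis-aligned boxes: two regions are EC when their closures meet but their interiors do not overlap. The `Box` type, the function names, and the example objects are hypothetical and are not part of VoxML; exact comparison is used for brevity, whereas a real system would likely need a tolerance.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned region given by its minimum and maximum corners."""
    lo: Vec3
    hi: Vec3


def closures_meet(a: Box, b: Box) -> bool:
    # Closed intervals intersect on every axis.
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def interiors_overlap(a: Box, b: Box) -> bool:
    # Open intervals intersect on every axis.
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in range(3))


def externally_connected(a: Box, b: Box) -> bool:
    """EC(a, b): the regions touch, but share no interior point."""
    return closures_meet(a, b) and not interiors_overlap(a, b)


# A cup resting on a table surface (y is the vertical axis in this example).
sfc = Box((0.0, 0.0, 0.0), (2.0, 0.1, 2.0))
cup = Box((0.5, 0.1, 0.5), (0.7, 0.3, 0.7))
print(externally_connected(cup, sfc))
```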

1.

Anderson ML (2003) Embodied cognition: a field guide. Artif Intell 149(1):91–130

2.

Asher N (1998) Common ground, corrections and coordination. J Semant

3.

Asher N (2008) A type driven theory of predication with complex types. Fund Inf 84(2):151–183

4.

Asher N, Lascarides A (2003) Logics of conversation. Cambridge University Press, Cambridge

5.

Asher N, Pogodalla S (2010) SDRT and continuation semantics. In: JSAI international symposium on artificial intelligence, Springer, New York, pp 3–15

6.

Asher N, Pustejovsky J (2006) A type composition logic for generative lexicon. J Cognit Sci 6:1–38

7.

Baker CL, Jara-Ettinger J, Saxe R, Tenenbaum JB (2017) Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nat Hum Behav 1(4):1–10

8.

Ballard DH (1981) Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn 13(2):111–122

9.

Barker C, Shan CC (2014) Continuations and natural language, vol 53. Oxford Studies in Theoretical Linguistics

10.

van Benthem JFAK (1991) Logic and the flow of information

11.

Bergen BK (2012) Louder than words: the new science of how the mind makes meaning. Basic Books

12.

Blackburn P, Bos J (2003) Computational semantics. Theor Int J Theory Hist Found Sci pp 27–45

13.

Cassell J, Stone M, Yan H (2000a) Coordination and context-dependence in the generation of embodied conversation. In: Proceedings of the first international conference on Natural language generation-Volume 14, ACL, pp 171–178

14.

Cassell J, Sullivan J, Churchill E, Prevost S (2000b) Embodied conversational agents. MIT Press, New York

15.

Chrisley R (2003) Embodied artificial intelligence. Artif Intell 149(1):131–150

16.

Clancey WJ (1993) Situated action: A neuropsychological interpretation response to vera and simon. Cogn Sci 17(1):87–116

17.

Clark HH, Brennan SE (1991) Grounding in communication. Perspect Soc Share Cognit 13(1991):127–149

18.

Cooper R (2005) Records and record types in semantic theory. J Logic Comput 15(2):99–112

19.

Cooper R (2017) Adapting type theory with records for natural language semantics. In: Modern perspectives in type-theoretical semantics, Springer, New York, pp 71–94

20.

Cooper R, Ginzburg J (2015) Type theory with records for natural language semantics. The handbook of contemporary semantic theory p 375

21.

Coventry K, Garrod SC (2005) Spatial prepositions and the functional geometric framework. Towards a classification of extra-geometric influences

22.

Craik KJW (1943) The nature of explanation. Cambridge University, Cambridge

23.

De Groote P (2001) Type raising, continuations, and classical logic. In: Proceedings of the thirteenth Amsterdam Colloquium, pp 97–101

24.

Dekker PJ (2012) Predicate logic with anaphora. In: Dynamic Semantics, Springer, New York, pp 7–47

25.

Dobnik S, Cooper R (2017) Interfacing language, spatial perception and cognition in type theory with records. J Lang Modell 5(2):273–301

26.

Dobnik S, Cooper R, Larsson S (2012) Modelling language, action, and perception in type theory with records. In: International workshop on constraint solving and language processing, Springer, New York, pp 70–91

27.

Dobnik S, Cooper R, Larsson S (2013) Modelling language, action, and perception in type theory with records. In: Constraint solving and language processing, Springer, New York, pp 70–91

28.

Evans V (2013) Language and time: a cognitive linguistics approach. Cambridge University Press, Cambridge

29.

Feldman J (2010) Embodied language, best-fit analysis, and formal compositionality. Phys Life Rev 7(4):385–410

30.

Fernando T (2009) Situations in LTL as strings. Inf Comput 207(10):980–999

31.

Fischer K (2011) How people talk with robots: designing dialog to reduce user uncertainty. AI Magn 32(4):31–38

32.

Foster ME (2007) Enhancing human–computer interaction with embodied conversational agents. In: International conference on universal access in human–computer interaction, Springer, New York, pp 828–837

33.

Gatsoulis Y, Alomari M, Burbridge C, Dondrup C, Duckworth P, Lightbody P, Hanheide M, Hawes N, Hogg D, Cohn A, et al. (2016) Qsrlib: a software library for online acquisition of qualitative spatial relations from video

34.

Gibson JJ (1977) The theory of affordances. Perceiving, acting, and knowing: toward an ecological psychology, pp 67–82

35.

Gibson JJ (1979) The ecological approach to visual perception. Psychology Press

36.

Ginzburg J (1996) Interrogatives: questions, facts and dialogue. The handbook of contemporary semantic theory. Blackwell, Oxford pp 359–423

37.

Ginzburg J, Fernández R (2010) Computational models of dialogue. The handbook of computational linguistics and natural language processing 57:1

38.

Goldman AI (1989) Interpretation psychologized. Mind Lang 4(3):161–185

39.

Gordon RM (1986) Folk psychology as simulation. Mind Lang 1(2):158–171

40.

Gregoromichelaki E, Kempson R, Howes C (2020) Actionism in syntax and semantics. Dial Percept pp 12–27

41.

Griffiths TL, Chater N, Kemp C, Perfors A, Tenenbaum JB (2010) Probabilistic models of cognition: exploring representations and inductive biases. Trends Cogn Sci 14(8):357–364

42.

Groenendijk J, Stokhof M (1991) Dynamic predicate logic. Linguist Philos pp 39–100

43.

Harel D (1984) Dynamic logic. In: Gabbay D, Guenthner F (eds) Handbook of philosophical logic, volume II: extensions of classical logic, Reidel, pp 497–604

44.

Harel D, Kozen D, Tiuryn J (2000) Dynamic logic, 1st edn. The MIT Press, New York

45.

Johnson M (1987) The body in the mind: the bodily basis of meaning, imagination, and reason. University of Chicago Press, Chicago

46.

Kamp H, Van Genabith J, Reyle U (2011) Discourse representation theory. In: Handbook of philosophical logic, Springer, New York, pp 125–394

47.

Kendon A (2004) Gesture: visible action as utterance. Cambridge University Press, Cambridge

48.

Kiela D, Bulat L, Vero AL, Clark S (2016) Virtual embodiment: A scalable long-term strategy for artificial intelligence research. arXiv preprint arXiv:1610.07432

49.

Klein E, Sag IA (1985) Type-driven translation. Linguist Philos 8(2):163–201

50.

Konrad K (2004) Minimal model generation. In: Model generation for natural language interpretation and analysis, Springer, New York, pp 55–56

51.

Kopp S, Wachsmuth I (2010) Gesture in embodied communication and human–computer interaction, vol 5934. Springer, New York

52.

Krishnaswamy N (2017) Monte-carlo simulation generation through operationalization of spatial primitives. PhD thesis, Brandeis University

53.

Krishnaswamy N, Pustejovsky J (2016a) Multimodal semantic simulations of linguistically underspecified motion events. In: Spatial Cognition X, Springer, New York, pp 177–197

54.

Krishnaswamy N, Pustejovsky J (2016b) VoxSim: a visual platform for modeling motion language. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics, ACL

55.

Krishnaswamy N, Pustejovsky J (2018) Deictic adaptation in a virtual environment. In: Spatial cognition XI, Springer, New York, pp 180–196

56.

Krishnaswamy N, Narayana P, Wang I, Rim K, Bangar R, Patil D, Mulay G, Ruiz J, Beveridge R, Draper B, Pustejovsky J (2017) Communicating and acting: Understanding gesture in simulation semantics. In: 12th International workshop on computational semantics

57.

Kruijff GJM, Lison P, Benjamin T, Jacobsson H, Zender H, Kruijff-Korbayová I, Hawes N (2010) Situated dialogue processing for human–robot interaction. In: Cognitive systems, Springer, pp 311–364

58.

Landragin F (2006) Visual perception, language and gesture: a model for their understanding in multimodal dialogue systems. Signal Process 86(12):3578–3595

59.

Lascarides A, Stone M (2006) Formal semantics for iconic gesture. In: Proceedings of the 10th workshop on the semantics and pragmatics of dialogue (BRANDIAL), pp 64–71

60.

Lascarides A, Stone M (2009) A formal semantic analysis of gesture. J Semant p ffp004

61.

Lücking A, Pfeiffer T, Rieser H (2015) Pointing and reference reconsidered. J Pragmat 77:56–79

62.

Mani I, Pustejovsky J (2012) Interpreting motion: grounded representations for spatial language. Oxford University Press, Oxford

63.

Marge M, Rudnicky AI (2013) Towards evaluating recovery strategies for situated grounding problems in human–robot dialogue. In: 2013 IEEE RO-MAN, IEEE, pp 340–341

64.

Marshall P, Hornecker E (2013) Theories of embodiment in hci. SAGE Handb Digit Technol Res 1:144–158

65.

McNeely-White DG, Ortega FR, Beveridge JR, Draper BA, Bangar R, Patil D, Pustejovsky J, Krishnaswamy N, Rim K, Ruiz J, Wang I (2019) User-aware shared perception for embodied agents. In: 2019 IEEE international conference on humanized computing and communication (HCC), IEEE, pp 46–51

66.

Miller GA, Johnson-Laird PN (1976) Language and perception. Belknap Press, Cambridge

67.

Muller P, Prévot L (2009) Grounding information in route explanation dialogues

68.

Narayana P, Krishnaswamy N, Wang I, Bangar R, Patil D, Mulay G, Rim K, Beveridge R, Ruiz J, Pustejovsky J, Draper B (2018) Cooperating with avatars through gesture, language and action. In: Intelligent systems conference (IntelliSys)

69.

Narayanan S (2010) Mind changes: a simulation semantics account of counterfactuals. Cognit Sci

70.

Naumann R (2001) Aspects of changes: a dynamic event semantics. J Semant 18:27–81

71.

Plaza J (2007) Logics of public communications. Synthese 158(2):165–179

72.

Pustejovsky J (1991) The syntax of event structure. Cognition 41(1–3):47–81

73.

Pustejovsky J (1995) The generative Lexicon. MIT Press, New York

74.

Pustejovsky J (2013) Dynamic event structure and habitat theory. In: Proceedings of the 6th international conference on generative approaches to the Lexicon (GL2013), ACL, pp 1–10

75.

Pustejovsky J (2018) From actions to events: communicating through language and gesture. Interact Stud 19(1–2):289–317

76.

Pustejovsky J, Batiukova O (2019) The lexicon. Cambridge University Press, Cambridge

77.

Pustejovsky J, Boguraev B (1993) Lexical knowledge representation and natural language processing. Artif Intell 63(1–2):193–223

78.

Pustejovsky J, Krishnaswamy N (2016) VoxML: a visualization modeling language. In: Proceedings of LREC

79.

Pustejovsky J, Krishnaswamy N (2020) Embodied human-computer interactions through situated grounding. In: IVA ’20: proceedings of the 20th international conference on intelligent virtual agents, ACM

80.

Pustejovsky J, Moszkowicz JL (2011) The qualitative spatial dynamics of motion in language. Spatial Cognit Comput 11(1):15–44

81.

Qing C, Goodman ND, Lassiter D (2016) A rational speech-act model of projective content. In: Proceedings of cognitive science, pp 1110–1115

82.

Randell D, Cui Z, Cohn A, Nebel B, Rich C, Swartout W (1992) A spatial logic based on regions and connection. In: KR’92. Principles of knowledge representation and reasoning: proceedings of the 3rd international conference, Morgan Kaufmann, San Mateo, pp 165–176

83.

Roy D (2005) Semiotic schemas: a framework for grounding language in action and perception. Artif Intell 167(1–2):170–205

84.

Schaffer S, Reithinger N (2019) Conversation is multimodal: thus conversational user interfaces should be as well. In: Proceedings of the 1st international conference on conversational user interfaces, pp 1–3

85.

Scheutz M, Cantrell R, Schermerhorn P (2011) Toward humanlike task-based dialogue processing for human robot interaction. AI Magn 32(4):77–84

86.

Schlenker P (2020) Gestural grammar. Nat Lang Linguist Theory pp 1–50

87.

Shapiro L (2014) The Routledge handbook of embodied cognition. Routledge, England

88.

Stalnaker R (2002) Common ground. Linguist Philos 25(5–6):701–721

89.

Tavares JMRS, Padilha AJMN (1995) A new approach for merging edge line segments. In: Proceedings RecPad’95, Aveiro

90.

Tellex S, Gopalan N, Kress-Gazit H, Matuszek C (2020) Robots that use language. Annu Rev Control Robot Auton Syst 3:25–55

91.

Tomasello M, Carpenter M (2007) Shared intentionality. Dev Sci 10(1):121–125

92.

Ullman TD, Goodman ND, Tenenbaum JB (2012) Theory learning as stochastic search in the language of thought. Cogn Dev 27(4):455–480

93.

Unger C (2011) Dynamic semantics as monadic computation. In: JSAI international symposium on artificial intelligence, Springer, New York, pp 68–81

94.

Van Benthem J (2011) Logical dynamics of information and interaction. Cambridge University Press, Cambridge

95.

Van Ditmarsch H, van Der Hoek W, Kooi B (2007) Dynamic epistemic logic, vol 337. Springer, New York

96.

Van Eijck J, Unger C (2010) Computational semantics with functional programming. Cambridge University Press, Cambridge

97.

Vera AH, Simon HA (1993) Situated action: a symbolic interpretation. Cognit Sci 17(1):7–48. https://doi.org/10.1016/S0364-0213(05)80008-4

98.

Wahlster W (2006) Dialogue systems go multimodal: The smartkom experience. In: SmartKom: foundations of multimodal dialogue systems, Springer, New York, pp 3–27

99.

Wang I, Narayana P, Patil D, Mulay G, Bangar R, Draper B, Beveridge R, Ruiz J (2017) EGGNOG: A continuous, multi-modal data set of naturally occurring gestures with ground truth labels. In: To appear in the Proceedings of the 12th IEEE international conference on automatic face & gesture recognition

100.

Weiser M (1999) The computer for the 21st century. ACM SIGMOBILE Mob Comput Commun Rev 3(3):3–11

101.

Williams T, Bussing M, Cabrol S, Boyle E, Tran N (2019) Mixed reality deictic gesture for multi-modal robot communication. In: 2019 14th ACM/IEEE international conference on human–robot interaction (HRI), IEEE, pp 191–201

102.

Winston ME, Chaffin R, Herrmann D (1987) A taxonomy of part-whole relations. Cognit Sci 11(4):417–444

Title: Embodied Human Computer Interaction
Authors: James Pustejovsky, Nikhil Krishnaswamy
Publication date: 16-09-2021
Publisher: Springer Berlin Heidelberg
Published in: KI - Künstliche Intelligenz / Issue 3-4/2021
Print ISSN: 0933-1875
Electronic ISSN: 1610-1987
DOI: https://doi.org/10.1007/s13218-021-00727-5
