In social interactions, one is continuously confronted with an intricate interplay of verbal and nonverbal behaviors, including hand-arm gestures, body movements, and facial expressions. All of these behaviors can be indicative of the other’s referential, communicative, or social intentions [1]. In this paper, we focus on hand-arm gestures. Interlocutors in social interaction incessantly and concurrently produce and perceive a variety of gestures. Coarsely, the generation of a hand-arm gesture consists of two steps: first, finding the proper gesture for an intention that is to be realized under the current context constraints; second, performing the gesture using one’s motor repertoire. Similarly, the recipient perceives and analyzes the other’s movement both at the motor and at the intention level. Accumulating evidence suggests that these two processes are not separate, but that recognizing and understanding a gesture is grounded in the perceiver’s own motor repertoire [2, 3]. In other words, a hand movement is understood, at least partially, by evoking the motor system of the observer.

This is evidenced by so-called motor resonances, which show that the motor and action (premotor) systems become activated during both performance and observation of bodily behavior [4–6]. One hypothesis is that these neural resonances reflect the involvement of the motor system in deriving predictions and evaluating hypotheses about the incoming observations. This integration of perception and action enables imitating or mimicking the observed behavior, either overtly or covertly, and thus forms an embodied basis for understanding other embodied agents [7], and for communication and intersubjectivity of intentional agents more generally (cf. simulation theory [8]). Hence, perception-action links (and the resulting resonances) are assumed to be effective at various levels of a hierarchical perceptual-motor system, from kinematic features to motor commands to goals and intentions [9], and these levels interact bi-directionally, bottom-up and top-down [10].
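To illustrate this predictive reading schematically (a sketch for illustration only, not the specific formulation of [9] or [10]): the resonance of a motor hypothesis $h_l$ at level $l$, given an observation $o$ and a hypothesis $h_{l+1}$ at the level above, can be written as

$$P(h_l \mid o, h_{l+1}) \propto \underbrace{P(o \mid h_l)}_{\text{bottom-up evidence}}\; \underbrace{P(h_l \mid h_{l+1})}_{\text{top-down prediction}},$$

where the likelihood $P(o \mid h_l)$ measures how well the incoming observation matches the hypothesis, and the prior $P(h_l \mid h_{l+1})$ encodes the expectation propagated down from goals and intentions.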
Further, a close perception-action integration can be assumed to support two important ingredients of social interaction: first, fast and often subconscious interpersonal coordination (e.g., alignment, mimicry, interactional synchrony) that leads to rapport [11] and social resonance [12] between interactants; second, social learning of behavior by means of imitation, which helps to acquire and interactively establish behaviors through the connected perceiving, processing, and reproducing of their pertinent features. All of these effects may also apply, at least to a certain extent, to the interaction between humans and embodied agents, be they physical robots or virtual characters (see [12] for a detailed discussion). For example, brain imaging studies [13, 14] have shown that artificial agents with sufficiently natural appearance and movements can evoke motor resonances in human observers.
Against this background, we aim for interactive embodied systems that are ultimately able to engage in social interactions, in a human-like manner, based on cognitively plausible mechanisms. A central ingredient is a computational model for the integrated perception and generation of hand-arm gestures. This model has to fulfill a number of requirements: (1) perceiving and generating behavior in a fast, robust, and incremental manner; (2) concurrent and mutually interacting perception and generation; (3) concurrent processing at different levels of motor abstraction, from movement trajectories to intentions; and (4) incremental construction of hierarchical knowledge structures through learning from observation and imitation.
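As a rough intuition for how requirements (1) and (3) interlock, the following toy Python sketch processes tracking frames one at a time while beliefs propagate bottom-up and top-down through a small three-level hierarchy. It is our own illustration under invented assumptions (all names, hypothesis counts, and coupling tables are made up) and does not reflect the model presented below:

```python
import numpy as np

class Level:
    """One level of motor abstraction holding a belief over its hypotheses."""

    def __init__(self, name, n_hypotheses):
        self.name = name
        self.belief = np.full(n_hypotheses, 1.0 / n_hypotheses)

    def update(self, likelihood, prior=None):
        """Fuse bottom-up likelihood with a prior: the top-down prediction
        if one is given, otherwise the level's previous belief."""
        posterior = likelihood * (prior if prior is not None else self.belief)
        self.belief = posterior / posterior.sum()
        return self.belief

rng = np.random.default_rng(0)
# Conditional tables coupling adjacent levels (rows: higher-level hypotheses).
traj_given_cmd = rng.dirichlet(np.ones(4), size=3)    # 3 commands x 4 trajectories
cmd_given_intent = rng.dirichlet(np.ones(3), size=2)  # 2 intentions x 3 commands

# Several levels of motor abstraction, from trajectories up to intentions.
trajectory = Level("trajectory", 4)
command = Level("motor_command", 3)
intent = Level("intent", 2)

for frame in range(5):            # incremental: one tracking frame at a time
    observation = rng.random(4)   # stand-in for real movement features
    # Top-down predictions serve as priors for the level below; updated
    # beliefs then serve as bottom-up evidence for the level above.
    b_traj = trajectory.update(observation, prior=command.belief @ traj_given_cmd)
    b_cmd = command.update(traj_given_cmd @ b_traj, prior=intent.belief @ cmd_given_intent)
    intent.update(cmd_given_intent @ b_cmd)
    print(f"frame {frame}: intent belief = {intent.belief.round(2)}")
```

Requirements (2) and (4), the coupling of perception with generation and the incremental construction of the knowledge structures themselves, are deliberately omitted from this sketch and are taken up in the following sections.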
In this paper, we present a cognitive computational model that has been devised and developed to meet the above-mentioned requirements for the domain of hand-arm gestures. Focusing on the motor aspects of gestures, it should also serve as a basis for future modeling of higher cognitive levels of social intentions. In the section "Shared Motor Knowledge Model", we introduce the Shared Motor Knowledge Model, which serves as a basis for integrating perception and action, both of which operate upon these knowledge structures by means of forward/inverse models. In "A Probabilistic Model of Motor Resonances", we present a probabilistic approach that simulates fast, incremental, and concurrent resonances and exploits these structures in both perceiving and generating behavior. The section "Perception-Action Integration" details how the integration of perception and action is achieved in this model and how this helps to model and cope with characteristics of nonverbal human social interaction. Results of applying this model to real-world data (marker-free gesture tracking) from a human-agent interaction scenario are reported in "Results". In the final section, we discuss our work in comparison with related work.