Studies presented in neuroscience (Pulvermüller et al.
2001; Hauk et al.
2004; Tettamanti et al.
2005; Buccino et al.
2005) and the behavioral sciences (Buccino et al.
2005; Scorolli and Borghi
2007) have demonstrated that language is embodied in perceptual and motor knowledge. According to this embodied perspective, language skills develop together with other cognitive capabilities, through the sensorimotor interaction of an agent with its environment. In this context, particular attention has been given to the representation of action words, i.e. verbs referring to actions such as pick, kick and lick. Electroencephalography (EEG) recordings have shown that the processing of action words causes differential activation along the motor strip in the brain, with the strongest activity occurring close to the cortical representation of the body parts (e.g. hands, legs, lips) primarily used to carry out the actions described by the processed verbs (Pulvermüller et al.
2001). Other studies have shown that action word meanings have correlates in the somatotopic activation of the motor and premotor cortex (Hauk et al.
2004). Moreover, transcranial magnetic stimulation (TMS) studies and behavioral experiments have shown that the processing of action-related sentences modulates the activity of the motor system (Buccino et al.
2005); depending on the effector used in the action described by the processed action word, different sectors of the motor system are activated (Buccino et al.
2005).
Psychological studies and theories on the embodiment of language have been proposed as well. According to the perceptual symbol systems (PSSs) theory, conceptualization requires the simulation of past experience (Barsalou
1999). For example, when thinking about an object, the neural patterns formed in the brain during earlier experience with it are reactivated. The neural underpinnings of this simulation may lie in wide neural circuits involving canonical and mirror neurons (Rizzolatti et al.
1996). In other studies performed in the field of language comprehension (Glenberg and Kaschak
2002), it has been observed that sentences are understood by creating a simulation of the actions that underlie them (Action-sentence Compatibility Effect).
In contrast to other forms of communication, language is a combinatorial system that permits the conveyance of new messages and concepts by combining words. A finite number of terms (i.e. the lexicon) can be combined and permuted according to specific structural rules (i.e. the grammar) in order to convey new meanings (Pinker
1994). Growing evidence has suggested that the human motor system is also hierarchically organized; that is, low-level motor primitives can be integrated and recombined in different action sequences in order to perform novel tasks (Mussa-Ivaldi and Bizzi
2000). Studies investigating how the brain accomplishes action organization have been presented in Grafton and Hamilton (
2007). The authors have argued that action organization is based on a hierarchical model comprising different levels of motor control: (i) the level of action intention, (ii) the level of the object-goal that realizes the intention, (iii) the kinematic level, which represents the movements required to achieve the goal, and (iv) the muscle level, which coordinates the activation of muscles to produce the movement. Moreover, in DeWolf and Eliasmith (
2011) the authors have presented the Neural Optimal Control Hierarchy (NOCH), a framework for biologically plausible models of neural motor control. Simulations of the NOCH framework suggest that integrating control theory with the basic anatomical elements and functions of the motor system can provide a unified account of a variety of motor system data. For the implementation of the motor behaviors performed by the robot, our work was inspired by the ‘schema theory’ proposed in Arbib and Érdi (
1998), according to which complex human behaviors are built through the hierarchical organization of the motor system, within which reusable motor primitives can be re-organized into different motor sequences. For example, when we want to drink a cup of coffee, we segment this complex action into a combination of low-level primitives, such as reaching for, grasping, and bringing the cup to the mouth. This theory has inspired many other studies on the hierarchical organization of the motor system. For example, in Mussa-Ivaldi and Bizzi (
2000) it has been suggested that low-level motor primitives can be integrated and recombined in different action sequences in order to perform novel tasks. The authors have proposed that modular primitives are combined in the spinal cord in order to build the internal representation of a limb movement.
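The schema-theoretic idea of re-organizing reusable motor primitives into different motor sequences can be sketched in a few lines; the primitive and schema names below are hypothetical illustrations, not part of the cited models.

```python
# A minimal sketch of schema theory: reusable motor primitives are
# composed into a higher-level behavior. The primitive names and the
# "drink" schema are illustrative, not taken from a cited implementation.

def reach(target):
    return f"reach({target})"

def grasp(target):
    return f"grasp({target})"

def bring_to_mouth(target):
    return f"bring_to_mouth({target})"

def make_schema(*primitives):
    """Compose low-level primitives into a reusable motor schema."""
    def schema(target):
        # Execute the primitives in sequence on the same target.
        return [p(target) for p in primitives]
    return schema

# "Drinking a cup of coffee" segmented into low-level primitives.
drink = make_schema(reach, grasp, bring_to_mouth)
print(drink("cup"))
```

The same primitives could be recombined into other schemas (e.g. a "place on table" sequence), which is the reuse property the theory emphasizes.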
2.1 Embodied abstract language and hierarchical categories
The representation of abstract concepts poses a challenge for grounded theories of cognition. Different scholars have claimed that embodiment plays an important role even in representing abstract concepts; theories based on “simulations” (Barsalou
1999), “metaphors” (Lakoff and Johnson
1980) and “actions” (Glenberg and Kaschak
2002) have been presented. In Barsalou (
1999) it has been proposed that some abstract concepts arise from simulation processes of internal and external states. In particular, abstract concepts require capturing complex multi-modal simulations of temporally extended events, with simulations of introspection being central (Barsalou
1999); indeed, introspection gives access to subjective experiences linked to abstract concepts (Wiemer-Hastings et al.
2001). Considering that abstract concepts contain more information about introspection and events, simulators for abstract words develop to represent categories of internal experience (Barsalou
2009). Hence, according to this approach, abstract concepts, differently from concrete ones, require the activation of situations and introspections. Another theory proposed on the embodiment of abstract language revolves around the concept of “metaphor”. According to this approach, there are image-schemas derived from sensorimotor experience that can be transferred to experience which is not truly sensorimotor in nature (Lakoff and Johnson
1980). Human beings have an extensive knowledge about their bodies (e.g. eating) and situations (e.g. verticality) that they can use to metaphorically ground abstract concepts (Barsalou
2008); for example, love can be understood as eating (e.g. “being consumed by a lover”), while an affective experience like happy/sad can be understood as verticality (e.g. “up/down”). The idea that embodiment plays an important role in representing abstract concepts has been supported by other scholars. For example, according to Glenberg and Kaschak (
2002), sentences including both concrete and abstract words are understood by creating a simulation of the actions that underlie them. Indeed, abstract concepts containing motor information can be represented by using modal symbols. Moreover, through behavioral and neurophysiological studies it has been shown that the comprehension of abstract words activates the motor system (Glenberg et al.
2008). Hence, according to these studies, abstract concepts, similarly to concrete ones, can be grounded in perception and action.
However, other scholars have suggested that abstract concepts are only partially grounded in sensorimotor experience. Indeed, according to the theory proposed in Dove (
2011), although most concepts require two types of semantic representations [i.e. (i) based on perception and motor knowledge, and (ii) based on language], abstract concepts tend to depend more on linguistic representations. According to the Language and Situated Simulation (LASS) theory presented in Barsalou et al. (
2008), both the sensorimotor and linguistic systems are activated during language processing. However, concrete and abstract concepts activate different brain areas depending on their contents; moreover, depending on the task to be performed (e.g. lexical decision vs. imagination task), the linguistic and sensorimotor areas are engaged to different degrees. For example, in lexical decision tasks the linguistic system provides a shortcut, as it allows responding immediately without necessarily accessing the sensorimotor information used for representing conceptual meaning (Borghi et al.
2014). Other scholars have proposed the “Words As social Tools” (WAT) theory (Borghi and Binkofski
2014), which accounts for how different kinds of abstract concepts and words (ACWs) are represented; words are regarded as tools that permit acting in the social world. Indeed, the acquisition of ACWs relies more on language and on the contribution that other people can provide to the clarification of word meanings. In Kousta et al. (
2011) the authors have claimed that words which refer to emotions should be categorized in a group distinct from concrete and abstract words. This proposal was motivated by the fact that concrete, abstract and emotion words received different ratings in terms of concreteness, imageability and context availability.
Given the current debate in the field, representing abstract concepts is proving to be an extremely complex task. Studies on children’s early vocabulary acquisition (McGhee-Bidlack
1991) have shown that, when children learn to speak, they first learn concrete nouns (e.g. object names) and only later abstract terms (e.g. verbs). While concrete terms refer to tangible entities characterized by a direct mapping to perceptual-cognitive information, abstract words referring to many events, situations and bodily states (Barsalou
1999; Wiemer-Hastings and Xu
2005) have weaker perceptual-cognitive constraints with the physical world. Hence, during the process of word meaning acquisition, the mapping of perceptual-cognitive information related to concrete concepts into the linguistic domain occurs earlier than the mapping of perceptual-cognitive information related to abstract concepts. However, the transition from highly concrete concepts to the abstract ones is gradual; that is, the categorization of concrete and abstract terms cannot be simply regarded as a dichotomy (Wiemer-Hastings et al.
2001); rather, there is a continuum of abstractness along which all words can be placed. The most influential theories of the learning and representation of categories/concepts are the Prototype Theory and the Exemplar Theory. According to the Prototype Theory, a concept is represented by weighted characteristic features that define a prototype, which is used for judging whether other items belong to the same category (Rosch and Mervis
1975). According to the Exemplar Theory, a concept is represented by the exemplars of the categories (i.e. a set of instances of it) stored in the memory. A new item is classified as a member of a category if it is sufficiently similar to one of the stored exemplars in that category (Nosofsky et al.
1992). In the context of the Exemplar Theory, the instantiation principle has been proposed (Heit and Barsalou
1996), according to which the representation of a superordinate concept evokes detailed information about its subordinate members (i.e. exemplars). In Murphy and Wisniewski (
1989) the authors conducted a categorization study showing that, when an object is placed in an inappropriate scene, there is more interference with the identification of exemplars of superordinate concepts than of basic-level concepts. According to the classical theory of categorization, words can be organized in hierarchically structured categories (Gallese and Lakoff
2005) along which the level of abstraction can vary considerably. For example, in the hierarchy of categories “furniture/chair/rocking chair”, “furniture” is a superordinate word (i.e. a generalization w.r.t. the concept related to the basic word “chair”) while “rocking chair” is a subordinate word (i.e. a specialization w.r.t. the concept related to the basic word “chair”). In this framework, basic and subordinate words (e.g. “chair”, “rocking chair”) refer to single entities and can be seen as more concrete than superordinate words (e.g. “furniture”), which refer to sets of entities that differ in shape and other perceptual characteristics (Borghi et al.
2011). Moreover, categories like “furniture”, for which there are no corresponding motor programs of interaction, represent general and abstract concepts.
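The exemplar-based account of categorization can be illustrated with a minimal sketch; the two-dimensional feature vectors and category labels are hypothetical, and the exponentially decaying similarity function is one common modeling choice, not a claim about the cited studies.

```python
import math

# Minimal sketch of Exemplar Theory: a new item is assigned to the
# category whose stored exemplars it resembles most. Feature vectors
# and category names are hypothetical illustrations.

def similarity(a, b, c=1.0):
    """Similarity decays exponentially with Euclidean distance."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return math.exp(-c * dist)

# Stored exemplars: category -> remembered instances.
exemplars = {
    "chair":     [(0.9, 0.1), (0.8, 0.2)],
    "furniture": [(0.5, 0.5), (0.4, 0.6)],
}

def classify(item):
    # Sum the similarity of the item to every exemplar of each category,
    # then pick the category with the highest total.
    scores = {cat: sum(similarity(item, e) for e in members)
              for cat, members in exemplars.items()}
    return max(scores, key=scores.get)

print(classify((0.85, 0.15)))  # → chair
```

A prototype-based variant would instead compare the item to one averaged feature vector per category, which is the core contrast between the two theories.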
Among the different lexical categories (i.e. noun, verb, adjective, adverb, etc.), abstract action words represent a class of terms distant from immediate perception that describe actions (i.e. verbs) with a general meaning (e.g. USE, MAKE) and which can refer to several events and situations (Barsalou
1999; Wiemer-Hastings et al.
Therefore, they cannot be directly linked to sensorimotor experience through a one-to-one mapping with their physical referents in the world. For example, the meaning of words like USE and MAKE is general and depends on the context in which they occur (Barsalou et al.
2003). In a scenario in which a person is interacting with a set of tools, the meaning of USE is specified by the particular tool employed during the interaction (e.g. USE [a] KNIFE, USE [a] BRUSH), while the meaning of MAKE depends on the outcome of interactions (e.g. MAKE [a] SLICE, MAKE [a] HOLE).
2.2 Goal of the study
In this work we present a model based on Recurrent Neural Networks (RNN) for the grounding of abstract action words (i.e. USE and MAKE) achieved through the hierarchical organization of words directly linked to perceptual and motor knowledge of a humanoid robot; indeed, building on our previous work (Cangelosi and Riga
2006; Stramandinoli et al.
2012) we attempt to extend the “grounding transfer mechanism” from sensorimotor experience to abstract concepts. Our proposal is that words that refer to objects and action primitives can be grounded in sensorimotor experience, while abstract action words require linguistic information as well. Linguistic information permits the creation of the semantic referents of terms that cannot be directly mapped onto their referents in the physical world (Stramandinoli et al.
2010,
2012; Stramandinoli
2014). The semantic referents of these words are formed by recalling and reusing the motor and perceptual knowledge directly grounded during previous experience of the robot with the environment. Words directly linked to sensorimotor experience, combined in hierarchical structures through language, permit the indirect grounding of abstract action words. We propose such a hierarchical organization of concepts as a possible account for the acquisition of abstract action words in cognitive robots.
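The indirect (“grounding transfer”) mechanism can be sketched as follows; the feature vectors and the averaging combination rule are hypothetical placeholders standing in for the robot’s learned sensorimotor representations and for the RNN, respectively.

```python
# Sketch of indirect grounding: basic words are bound to sensorimotor
# feature vectors acquired through direct experience; an abstract action
# word is grounded by combining the vectors of the already-grounded
# words that linguistically define it. The vectors and the averaging
# rule are hypothetical stand-ins, not the actual RNN model.

grounded = {
    "grasp": [1.0, 0.0, 0.0],
    "knife": [0.0, 1.0, 0.0],
    "cut":   [0.0, 0.0, 1.0],
}

def ground_indirectly(definition):
    """Ground a word from a sequence of already-grounded words."""
    vectors = [grounded[w] for w in definition]
    # Component-wise average of the constituent representations.
    return [sum(component) / len(vectors) for component in zip(*vectors)]

# "USE KNIFE" explained linguistically via grasp, knife and cut:
# the abstract term inherits sensorimotor content from its definition.
grounded["use knife"] = ground_indirectly(["grasp", "knife", "cut"])
print(grounded["use knife"])
```

Once grounded this way, the new entry can itself appear in further definitions, which is what allows the hierarchy of increasingly abstract terms described above.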
The aim of this work is twofold. On the one hand, the robotic platform is enabled to ground the meaning of abstract action words and scaffold more complex behaviors through the sensorimotor interaction with the environment; on the other hand, the proposed model permits the investigation of the relation between perceptual and motor categories, and the development of conceptual knowledge in a humanoid robot.