Children use non-verbal cues to learn new words from robots as well as people
Introduction
Social robots are an emerging technology with considerable potential to support children’s education as tutors and learning companions. Social robots share physical spaces with us and leverage our means of communication–e.g., speech, gestures, gaze, and facial expressions–in order to interact with us in more natural, intuitive ways. They have the potential to combine the general benefits of technology–such as scalability, customization, the easy addition of new content, and student-paced, adaptive software–with the embodied, social world. Prior research has shown that young children will not only treat social robots as companions and guides [1], [2], [3], but will also readily learn new information from them [4], [5], [6], [7], [8].
Given this potential, it behooves us to study the mechanisms by which children learn from social robots, as well as the similarities and differences between children’s learning from robots as compared to human partners. Some existing work investigates these differences. Kennedy et al. [5] examined the effects of a human tutor versus a humanoid robot tutor on learning prime number categorization with children aged 8–9 years. With both tutors, children’s scores on the math task improved from pretest to posttest, but the human tutor yielded a larger effect size than the robot tutor. Serholt et al. [9] compared the attitudes, success rate, and help-asking behaviors of children aged 11–15 years during a LEGO construction task with either a humanoid robot tutor or a human tutor. They found that while children with either tutor successfully completed the task, children were more likely to ask the human tutor for help, but were more eager to perform well with the robot tutor. These studies suggest that there may be important differences in how children treat human and robot tutors. However, both of these studies were performed with older children. There is growing interest in developing social robots as tutors and learning companions for younger children aged 3–6 years (e.g., [4], [10], [11], [6], [7]). How might these younger children respond? Furthermore, Kennedy et al. [5] point out that they did not constrain the human tutor’s social behavior. For some kinds of learning tasks, such as language learning, social cues may be very important [12], [13]. How do social cues impact children’s learning from robots versus from humans? Do children respond to social cues from humans and robots in the same way?
In the present study, we examined whether young children will attend to the same social cues from a robot as from a human partner during a word learning task, specifically gaze and bodily orientation toward a novel referent.
Infants and young children are adroit at following another person’s gaze, and that capacity makes an important contribution to early social cognition. For example, gaze following helps infants and young children to determine what object or event has triggered another’s emotion [14], [15]. Gaze following can also provide information about the goal of an agent’s ongoing action [16]. In addition, following a speaker’s gaze can provide information about his or her intended referent, facilitating the task of word learning. Baldwin [17], [18] demonstrated the key role of gaze following for language learning in a series of experiments with infants aged 19–20 months. When infants heard a novel label, they did not immediately associate it with the object that they were concurrently looking at or exploring. Instead, by following the speaker’s line of regard, they were able to determine what object the speaker was attending to and to link the novel name provided by the speaker with that referent. Recent findings have also shown that infants more readily associate names with novel objects if the speaker’s gaze is directed to an object that is presented in a distinctive and consistent spatial locus [19]. By implication, infants treat a speaker’s gaze direction as a major index of the particular target that is being named by the speaker within a shared space.
Granted the early importance of gaze following in human social interaction, investigators have begun to examine whether, and under what conditions, young children will follow a robot’s gaze. Meltzoff, Brooks, Shon and Rao [20] presented 18-month-old infants with a humanoid robot (HOAP-2, manufactured by Fujitsu Laboratories, Japan) that behaved in one of four ways. Infants in the social interaction group observed the robot as it interacted with an adult experimenter. In the course of the interaction, the robot answered the adult’s questions and the two parties engaged in mutual imitation. By contrast, infants in the three other groups observed an interaction in which the impression of contingent, two-way communication between robot and adult was eliminated, because the adult remained stationary (passive adult group), because the robot remained stationary (passive robot group), or because the gestures and utterances of the two parties were not aligned with each other (robot–adult mismatch group).
Following this observation period, infants’ tendency to follow the gaze and bodily orientation of the robot was assessed. As they faced one another, the robot turned through 45° to look at an object located on either the left or right side of the infant. Infants who had observed the robot engage in social interaction with the adult were likely to shift their gaze to match the target that the robot was looking at, whereas the other three groups responded unsystematically. By implication, having observed the robot’s capacity for contingent, social interaction, infants construed the robot as a partner or informant whose gaze signaled targets that were worth looking at.
Granted that infants can and do follow a robot’s gaze, it is natural to ask whether young children will make use of a robot’s line of regard when learning new words, as they do with human partners. More specifically, when a robot introduces a name, are children able to use its line of regard to determine which particular object the robot is naming and thereby learn the name of that object? To begin to answer this question, O’Connell et al. [21] presented 18-month-old infants with two learning trials involving pairs of novel objects. Infants heard a robot offer a name for one of the paired objects. In the coordinated labeling condition, the robot uttered a novel label only when both the infant and the robot were focused on the same novel object, whereas in the discrepant labeling condition, the robot uttered a novel label when focusing on a different object from the infant. Infants were subsequently tested to check whether they had associated the name with the appropriate object, notably the object that the robot had focused on. They were shown the two novel objects and asked a comprehension question (e.g., “Where is the dax?”).
Analysis of infants’ attention during object naming indicated that they adjusted their gaze appropriately depending on the gaze direction of the robot. Thus, in the discrepant labeling condition, infants were prone to shift their gaze so as to focus on the same toy as the robot, a coordination that was present by default in the coordinated labeling condition. Nevertheless, infants performed at chance in the comprehension test following both conditions. By contrast, in a follow-up study in which a human rather than a robot served as the speaker, infants not only adjusted their gaze, they also performed well in the comprehension test. Finally, in a third study, infants were re-tested with a robot, but before proceeding to the word learning phase, they were given an opportunity to watch a 60-s interaction in which the robot’s utterances and movements were contingent on the immediately preceding behavior of an adult. Despite this opportunity, infants continued to perform at chance in the comprehension test. Accordingly, O’Connell et al. [21] speculated that despite their tendency to follow the robot’s gaze, infants did not think of the robot as a reliable or conventional speaker from whom it is appropriate to learn new words.
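In a two-alternative comprehension test of this kind, chance corresponds to 50% correct. As a generic illustration of how chance-level performance can be checked–this is not the analysis reported by O’Connell et al. [21], and the counts below are hypothetical–a binomial test against 0.5 suffices:

```python
from scipy.stats import binomtest

# Generic sketch of a chance-level check for a two-alternative
# comprehension test (chance = 0.5). The counts are hypothetical,
# not data from O'Connell et al. [21].
n_correct, n_trials = 18, 32  # e.g., 16 infants x 2 comprehension trials
result = binomtest(n_correct, n_trials, p=0.5)
print(f"p = {result.pvalue:.3f}")  # a large p-value is consistent with chance
```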
Two aspects of the study by O’Connell et al. [21] may have led infants to fail to learn new names from the robot. First, it is unclear whether the infants perceived the robot as an interlocutor with whom they could interact. During the familiarization phase, infants had only a brief opportunity to observe the robot communicate with an adult. It moved independently (turned its head) and vocalized (said “hello” and “ooh”). However, this may not have been sufficient for infants to regard the robot as a speaker from whom they could acquire language. Prior research suggests that a speaker’s contingent responding to the learner plays a key role in early language acquisition. For example, Kuhl [12] found that although infants will readily learn to differentiate new phonemes when they are presented by a live and contingent interlocutor, they fail to do so if they simply observe a video of the same interlocutor engaged in a conversation that is not directed at them. A second concern with the study conducted by O’Connell et al. [21] is that they tested 18-month-olds. In a series of studies, Horst and Samuelson [22] showed that, even at 24 months, infants can use a speaker’s gaze to map a novel name onto the appropriate referent but display poor retention of that name on subsequent retention tests.
Accordingly, in the study to be reported, we made two changes aimed at giving the robot the best opportunity to serve as a teacher of language for young children. First, guided by previous research, we tested older children. Second, we sought to ensure that the robot would be perceived as a contingently responsive interlocutor for both the child and the experimenter in the context of an initial three-way conversation. We describe these two changes in more detail below.
In prior work, we investigated whether children aged 4–6 years displayed fast mapping when interacting with a robot or an adult [23]. The study was modeled on a procedure adopted by Markson and Bloom [24] in which preschoolers displayed stable retention. The procedure lends itself easily to either child–caregiver interaction or child–robot interaction. Children viewed a series of ten pictures of unusual animals, one picture at a time, with each interlocutor. For eight of the ten pictures, the interlocutor commented positively but uninformatively (e.g., “Cool animal!”); for the other two pictures, the interlocutor provided the name of the animal shown (e.g., “Look, a binturong! See the binturong?”). In this word-learning task, children learned equally well from the robot and the adult. Thus, they performed similarly in an immediate comprehension test and a retention test one week later, showing that they did remember the animal names.
In the present study, we created a similar but more challenging task that required children to attend to their interlocutor’s gaze direction and bodily orientation in order to identify the referents of the new words, a task that more closely mirrors everyday language learning. Each child was shown multiple pairs of pictures depicting unfamiliar animals. For selected pairs, the child’s interlocutor named one of the two animals. We asked if children would use the robot’s gaze and bodily orientation, just as they do with humans, to identify which of the two animals was being named.
To increase the likelihood that children would regard the robot as an interlocutor from whom they could learn new names, the robot engaged in a brief conversation before name learning began. First, the experimenter spoke directly to the robot to introduce the child to the robot—thereby showing the child that the robot was an interaction partner. The robot then introduced itself to the child. The experimenter asked if the child and robot were ready to look at some animals, to which the robot expressed interest. The robot then invited the child to look at the pictures. The same procedure was followed when the adult female was the interlocutor to maintain consistency across conditions.
The study was also designed to find out how spatially distinctive the non-verbal cues had to be in order for children to identify which referent the interlocutor intended. Half the children were presented with pairs of animal pictures that were side-by-side, whereas the other half were presented with pairs of animal pictures that were farther apart. This enabled us to assess whether the ease with which children could differentiate the target of their interlocutor’s naming would affect their learning from both a robot and a human. If children were to display a similar pattern of variation–depending on the spatial distinctiveness of the non-verbal cues–when learning from each interlocutor, this would strengthen the claim that children rely on such cues when learning new vocabulary, whether from a robot or a human.
Participants
Thirty-six children aged 2–5 years (22 female, 14 male) from two Boston-area, English-language preschools serving predominantly middle-class neighborhoods participated in the study (17 children from one school; 19 from the other). One 4-year-old girl and one 3-year-old boy were removed from analysis because they did not complete the study. The final sample thus included 34 children, 17 from each school, aged 2–5 years (21 female, 13 male), with a mean age of 3.69 years.
Recall
Fig. 2 shows the mean number of correct animal responses, near-miss responses (i.e., choices of the animal that had appeared together with the named animal on the learning trials), and incorrect animal responses as a function of condition. Overall, children produced a mean of 2.58 correct animal responses (43.0% of the time), a mean of 1.70 near-miss responses (28.3%), and a mean of 1.61 incorrect responses (26.8%).
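These percentages are consistent with six naming trials per child (e.g., 2.58 / 6 ≈ 43.0%); the trial count is our inference from the reported means rather than a figure stated in this excerpt. A minimal arithmetic check:

```python
# Verify that the reported percentages follow from the reported means,
# assuming six naming trials per child (inferred from the means,
# not stated in this excerpt).
means = {"correct": 2.58, "near miss": 1.70, "incorrect": 1.61}
n_trials = 6

for label, mean in means.items():
    print(f"{label}: {mean / n_trials:.1%}")  # 43.0%, 28.3%, 26.8%
```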
A two-way ANOVA with interlocutor (robot vs. human) and spatial condition (side-by-side vs. far apart) as factors was conducted on children’s correct responses.
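To illustrate how such an analysis can be carried out, a minimal sketch in Python with statsmodels follows. The balanced layout, the simulated scores, and the treatment of both factors as between-subjects are illustrative assumptions, not the authors’ data or code:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)

# Hypothetical per-child data: 34 children crossed over interlocutor
# (robot vs. human) and picture spacing (side-by-side vs. far apart).
# Scores are simulated; only the structure of the analysis is the point.
df = pd.DataFrame({
    "interlocutor": np.repeat(["robot", "human"], 17),
    "spacing": np.tile(["side_by_side", "far_apart"], 17),
    "correct": rng.integers(0, 7, size=34),  # correct responses out of 6
})

# Two-way ANOVA: main effects of each factor plus their interaction.
model = ols("correct ~ C(interlocutor) * C(spacing)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```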
Discussion
We asked whether preschool children ranging from 2 to 5 years would treat the robot like a human interlocutor by attending to its social cues, specifically its gaze direction and bodily orientation, when learning the referent of a new word. The pattern of results underlines important parallels between children’s learning from a robot as compared to a human interlocutor. Moreover, the pattern of results was stable across the age range tested. Two findings are especially noteworthy.
Conclusion
Given the potential of social robots as tutors and learning companions for children, it is important to explore the mechanisms by which children learn from robots, and how learning with this kind of social technology compares to learning with a human partner. In this study, we examined whether children will attend to the same social cues–specifically gaze direction and bodily orientation–from a robot as from a human partner during a word-learning task. Our results confirm that children are able to use these non-verbal cues to identify the referents of new words, learning from a robot much as they do from a human partner.
Acknowledgments
This research was supported by the National Science Foundation (NSF) under Grant 122886 and Graduate Research Fellowship Grant No. 1122374. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not represent the views of the NSF.
References
- et al., Infants’ ability to connect gaze and emotional expression to intentional action, Cognition (2002).
- et al., Social robots are psychological agents for infants: A test of gaze following, Neural Netw. (2010).
- et al., Do ABC eBooks boost engagement and learning in preschoolers? An experimental study comparing eBooks with paper ABC and storybook controls, Comput. Educ. (2015).
- et al., Multimodal child-robot interaction: Building social bonds, J. Hum.-Robot Interact. (2012).
- et al., Robovie, you’ll have to go into the closet now: Children’s social and moral relationships with a humanoid robot, Dev. Psychol. (2012).
- et al., Can a social robot stimulate science curiosity in classrooms?, Int. J. Soc. Robot. (2015).
- et al., Young children treat robots as informants, Top. Cogn. Sci. (2016).
- J. Kennedy, P. Baxter, E. Senft, T. Belpaeme, Heart vs hard drive: Children learn more from a human tutor than a social...
- et al., Sociable robot improves toddler vocabulary skills.
- et al., Children teach a care-receiving robot to promote their learning: Field experiments in a classroom for vocabulary learning, J. Hum.-Robot Interact. (2012).
- Towards a synthetic tutor assistant: The EASEL project and its architecture.
- Storytelling with robots: Effects of robot language level on children’s language learning.
- The interplay of robot language level with children’s language learning during storytelling.
- Is speech learning “gated” by the social brain?, Dev. Sci.
- Social mechanisms in early language acquisition: Understanding integrated brain systems supporting language.