Children use non-verbal cues to learn new words from robots as well as people
Introduction
Social robots are an emerging technology with considerable potential to support children’s education as tutors and learning companions. Social robots share physical spaces with us and leverage our means of communication–e.g., speech, gestures, gaze, and facial expressions–in order to interact with us in more natural, intuitive ways. They have the potential to combine the general benefits of technology–such as scalability, customization, the easy addition of new content, and student-paced, adaptive software–with the embodied, social world. Prior research has shown that young children will not only treat social robots as companions and guides [1], [2], [3], but will also readily learn new information from them [4], [5], [6], [7], [8].
Given this potential, it behooves us to study the mechanisms by which children learn from social robots, as well as the similarities and differences between children’s learning from robots as compared to human partners. Some existing work investigates these differences. Kennedy et al. [5] examined the effects of a human tutor versus a humanoid robot tutor on learning prime number categorization with children aged 8–9 years. With both tutors, children’s scores on the math task improved from pretest to posttest, but the human tutor yielded a larger effect size than the robot tutor. Serholt et al. [9] compared the attitudes, success rate, and help-asking behaviors of children aged 11–15 years during a LEGO construction task with either a humanoid robot tutor or a human tutor. They found that while children with either tutor successfully completed the task, children were more likely to ask the human tutor for help, but were more eager to perform well with the robot tutor. These studies suggest that there may be important differences in how children treat human and robot tutors. However, both of these studies were performed with older children. There is growing interest in developing social robots as tutors and learning companions for younger children aged 3–6 years (e.g., [4], [10], [11], [6], [7]). How might these younger children respond? Furthermore, Kennedy et al. [5] point out that they did not constrain the human tutor’s social behavior. For some kinds of learning tasks, such as language learning, social cues may be very important [12], [13]. How do social cues impact children’s learning from robots versus from humans? Do children respond to social cues from humans and robots in the same way?
In the present study, we examined whether young children will attend to the same social cues from a robot as from a human partner during a word learning task, specifically gaze and bodily orientation toward a novel referent.
Infants and young children are adroit at following another person’s gaze, and that capacity makes an important contribution to early social cognition. For example, gaze following helps infants and young children to determine what object or event has triggered another’s emotion [14], [15]. Gaze following can also provide information about the goal of an agent’s ongoing action [16]. In addition, following a speaker’s gaze can provide information about his or her intended referent, facilitating the task of word learning. Baldwin [17], [18] demonstrated the key role of gaze following for language learning in a series of experiments with infants aged 19–20 months. When infants heard a novel label, they did not immediately associate it with the object that they were concurrently looking at or exploring. Instead, by following the speaker’s line of regard, they were able to determine what object the speaker was attending to and to link the novel name provided by the speaker with that referent. Recent findings have also shown that infants more readily associate names with novel objects if the speaker’s gaze is directed to an object that is presented in a distinctive and consistent spatial locus [19]. By implication, infants treat a speaker’s gaze direction as a major index of the particular target that is being named by the speaker within a shared space.
Granted the early importance of gaze following in human social interaction, investigators have begun to examine whether, and under what conditions, young children will follow a robot’s gaze. Meltzoff, Brooks, Shon and Rao [20] presented 18-month-old infants with a humanoid robot (HOAP-2, manufactured by Fujitsu Laboratories, Japan) that behaved in one of four ways. Infants in the social interaction group observed the robot as it interacted with an adult experimenter. In the course of the interaction, the robot answered the adult’s questions and the two parties engaged in mutual imitation. By contrast, infants in the three other groups observed an interaction in which the impression of contingent, two-way communication between robot and adult was eliminated, because the adult remained stationary (passive adult group), because the robot remained stationary (passive robot group), or because the gestures and utterances of the two parties were not aligned with each other (robot–adult mismatch group).
Following this observation period, infants’ tendency to follow the gaze and bodily orientation of the robot was assessed. As they faced one another, the robot turned through 45° to look at an object located on either the left or right side of the infant. Infants who had observed the robot engage in social interaction with the adult were likely to shift their gaze to match the target that the robot was looking at, whereas the other three groups responded unsystematically. By implication, having observed the robot’s capacity for contingent, social interaction, infants construed the robot as a partner or informant whose gaze signaled targets that were worth looking at.
Granted that infants can and do follow a robot’s gaze, it is natural to ask whether young children will make use of a robot’s line of regard when learning new words, as they do with human partners. More specifically, when a robot introduces a name, are children able to use its line of regard to determine which particular object the robot is naming and thereby learn the name of that object? To begin to answer this question, O’Connell et al. [21] presented 18-month-old infants with two learning trials involving pairs of novel objects. Infants heard a robot offer a name for one of the paired objects. In the coordinated labeling condition, the robot uttered a novel label only when both the infant and the robot were focused on the same novel object, whereas in the discrepant labeling condition, the robot uttered a novel label when focusing on a different object from the infant. Infants were subsequently tested to check whether they had associated the name with the appropriate object, notably the object that the robot had focused on. They were shown the two novel objects and asked a comprehension question (e.g., “Where is the dax?”).
Analysis of infants’ attention during object naming indicated that they adjusted their gaze appropriately depending on the gaze direction of the robot. Thus, in the discrepant labeling condition, infants were prone to shift their gaze so as to focus on the same toy as the robot, a coordination that was present by default in the coordinated labeling condition. Nevertheless, infants performed at chance in the comprehension test following both conditions. By contrast, in a follow-up study in which a human rather than a robot served as the speaker, infants not only adjusted their gaze, they also performed well in the comprehension test. Finally, in a third study, infants were re-tested with a robot, but before proceeding to the word learning phase, they were given an opportunity to watch a 60-s interaction in which the robot’s utterances and movements were contingent on the immediately preceding behavior of an adult. Despite this opportunity, infants continued to perform at chance in the comprehension test. Accordingly, O’Connell et al. [21] speculated that despite their tendency to follow the robot’s gaze, infants did not think of the robot as a reliable or conventional speaker from whom it is appropriate to learn new words.
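In a two-alternative comprehension test of this kind, chance corresponds to 50% correct. As a generic illustration of how chance-level performance can be checked–this is not the analysis reported by O’Connell et al. [21], and the counts below are hypothetical–a binomial test against 0.5 suffices:

```python
from scipy.stats import binomtest

# Generic sketch of a chance-level check for a two-alternative
# comprehension test (chance = 0.5). The counts are hypothetical,
# not data from O'Connell et al. [21].
n_correct, n_trials = 18, 32  # e.g., 16 infants x 2 comprehension trials
result = binomtest(n_correct, n_trials, p=0.5)
print(f"p = {result.pvalue:.3f}")  # a large p-value is consistent with chance
```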
Two aspects of the study by O’Connell et al. [21] may have led infants to fail to learn new names from the robot. First, it is unclear whether the infants perceived the robot as an interlocutor with whom they could interact. During the familiarization phase, infants had only a brief opportunity to observe the robot communicate with an adult. It moved independently (turned its head) and vocalized (said “hello” and “ooh”). However, this may not have been sufficient for infants to regard the robot as a speaker from whom they could acquire language. Prior research suggests that a speaker’s contingent responding to the learner plays a key role in early language acquisition. For example, Kuhl [12] found that although infants will readily learn to differentiate new phonemes when they are presented by a live and contingent interlocutor, they fail to do so if they simply observe a video of the same interlocutor engaged in a conversation that is not directed at them. A second concern with the study conducted by O’Connell et al. [21] is that they tested 18-month-olds. In a series of studies, Horst and Samuelson [22] showed that, even at 24 months, infants can use a speaker’s gaze to map a novel name onto the appropriate referent but display poor retention of that name on subsequent retention tests.
Accordingly, in the study to be reported, we made two changes aimed at giving the robot the best opportunity to serve as a teacher of language for young children. First, guided by previous research, we tested older children. Second, we sought to ensure that the robot would be perceived as a contingently responsive interlocutor for both the child and the experimenter in the context of an initial three-way conversation. We describe these two changes in more detail below.
In prior work, we investigated whether children aged 4–6 years displayed fast mapping when interacting with a robot or an adult [23]. The study was modeled on a procedure adopted by Markson and Bloom [24] in which preschoolers displayed stable retention. The procedure lends itself easily to either child–caregiver interaction or child–robot interaction. Children viewed a series of ten pictures of unusual animals, one picture at a time, with each interlocutor. For eight of the ten pictures, the interlocutor commented positively but uninformatively (e.g., “Cool animal!”); for the other two pictures, the interlocutor provided the name of the animal shown (e.g., “Look, a binturong! See the binturong?”). In this word-learning task, children learned equally well from the robot and the adult. Thus, they performed similarly in an immediate comprehension test and a retention test one week later, showing that they did remember the animal names.
In the present study, we created a similar but more challenging task that required children to attend to their interlocutor’s gaze direction and bodily orientation in order to identify the referents of the new words, a task that more closely mirrors everyday language learning. Each child was shown multiple pairs of pictures depicting unfamiliar animals. For selected pairs, the child’s interlocutor named one of the two animals. We asked if children would use the robot’s gaze and bodily orientation, just as they do with humans, to identify which of the two animals was being named.
To increase the likelihood that children would regard the robot as an interlocutor from whom they could learn new names, the robot engaged in a brief conversation before name learning began. First, the experimenter spoke directly to the robot to introduce the child to the robot—thereby showing the child that the robot was an interaction partner. The robot then introduced itself to the child. The experimenter asked if the child and robot were ready to look at some animals, to which the robot expressed interest. The robot then invited the child to look at the pictures. The same procedure was followed when the adult female was the interlocutor to maintain consistency across conditions.
The study was also designed to find out how spatially distinctive the non-verbal cues had to be in order for children to identify which referent the interlocutor intended. Half the children were presented with pairs of animal pictures that were side-by-side, whereas the other half were presented with pairs of animal pictures that were farther apart. This enabled us to assess whether the ease with which children could differentiate the target of their interlocutor’s naming would affect their learning from both a robot and a human. If children were to display a similar pattern of variation–depending on the spatial distinctiveness of the non-verbal cues–when learning from each interlocutor, this would strengthen the claim that children rely on such cues when learning new vocabulary, whether from a robot or a human.
Participants
Thirty-six children aged 2–5 years (22 female, 14 male) from two Boston-area, English-language preschools serving predominantly middle-class neighborhoods participated in the study (17 children from one school; 19 from the other). One 4-year-old girl and one 3-year-old boy were removed from analysis because they did not complete the study. The final sample thus included 34 children, 17 from each school, aged 2–5 years (21 female, 13 male), with a mean age of 3.69 years.
Recall
Fig. 2 shows the mean number of correct animal responses, near-miss responses (i.e., choices of the animal that had appeared together with the named animal on the learning trials), and incorrect animal responses as a function of condition. Overall, children produced a mean of 2.58 correct animal responses (43.0% of the time), a mean of 1.70 near-miss responses (28.3%), and a mean of 1.61 incorrect responses (26.8%).
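These percentages are consistent with six naming trials per child (e.g., 2.58 / 6 ≈ 43.0%); the trial count is our inference from the reported means rather than a figure stated in this excerpt. A minimal arithmetic check:

```python
# Verify that the reported percentages follow from the reported means,
# assuming six naming trials per child (inferred from the means,
# not stated in this excerpt).
means = {"correct": 2.58, "near miss": 1.70, "incorrect": 1.61}
n_trials = 6

for label, mean in means.items():
    print(f"{label}: {mean / n_trials:.1%}")  # 43.0%, 28.3%, 26.8%
```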
A two-way ANOVA with interlocutor (robot vs. human) and spatial condition (side-by-side vs. far apart) as factors was conducted on children’s correct responses.
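To illustrate how such an analysis can be carried out, a minimal sketch in Python with statsmodels follows. The balanced layout, the simulated scores, and the treatment of both factors as between-subjects are illustrative assumptions, not the authors’ data or code:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)

# Hypothetical per-child data: 34 children crossed over interlocutor
# (robot vs. human) and picture spacing (side-by-side vs. far apart).
# Scores are simulated; only the structure of the analysis is the point.
df = pd.DataFrame({
    "interlocutor": np.repeat(["robot", "human"], 17),
    "spacing": np.tile(["side_by_side", "far_apart"], 17),
    "correct": rng.integers(0, 7, size=34),  # correct responses out of 6
})

# Two-way ANOVA: main effects of each factor plus their interaction.
model = ols("correct ~ C(interlocutor) * C(spacing)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```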
Discussion
We asked whether preschool children ranging from 2 to 5 years would treat the robot like a human interlocutor by attending to its social cues, specifically its gaze direction and bodily orientation, when learning the referent of a new word. The pattern of results underlines important parallels between children’s learning from a robot as compared to a human interlocutor. Moreover, the pattern of results was stable across the age range tested. Two findings are especially noteworthy.
Conclusion
Given the potential of social robots as tutors and learning companions for children, it is important to explore the mechanisms by which children learn from robots, and how learning with this kind of social technology compares to learning with a human partner. In this study, we examined whether children will attend to the same social cues–specifically gaze direction and bodily orientation–from a robot as from a human partner during a word-learning task. Our results confirm that children are able to use these non-verbal cues to identify the referents of new words, learning from a robot much as they do from a human partner.
Acknowledgments
This research was supported by the National Science Foundation (NSF) under Grant 122886 and Graduate Research Fellowship Grant No. 1122374. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not represent the views of the NSF.
References
- et al., Infants’ ability to connect gaze and emotional expression to intentional action, Cognition (2002).
- et al., Social robots are psychological agents for infants: A test of gaze following, Neural Netw. (2010).
- et al., Do ABC eBooks boost engagement and learning in preschoolers? An experimental study comparing eBooks with paper ABC and storybook controls, Comput. Educ. (2015).
- et al., Multimodal child-robot interaction: Building social bonds, J. Hum.-Robot Interact. (2012).
- et al., Robovie, you’ll have to go into the closet now: Children’s social and moral relationships with a humanoid robot, Dev. Psychol. (2012).
- et al., Can a social robot stimulate science curiosity in classrooms?, Int. J. Soc. Robot. (2015).
- et al., Young children treat robots as informants, Top. Cogn. Sci. (2016).
- J. Kennedy, P. Baxter, E. Senft, T. Belpaeme, Heart vs hard drive: Children learn more from a human tutor than a social...
- et al., Sociable robot improves toddler vocabulary skills.
- et al., Children teach a care-receiving robot to promote their learning: Field experiments in a classroom for vocabulary learning, J. Hum.-Robot Interact. (2012).
- Towards a synthetic tutor assistant: The EASEL project and its architecture.
- Storytelling with robots: Effects of robot language level on children’s language learning.
- The interplay of robot language level with children’s language learning during storytelling.
- Is speech learning “gated” by the social brain?, Dev. Sci.
- Social mechanisms in early language acquisition: Understanding integrated brain systems supporting language.