5.1 Implication
5.1.1 Error
The results indicate that learning with a RALL system can reduce speaking errors more than learning with a human tutor. First, the lexical and grammatical error rates were not significantly different between the human tutor and RALL system groups on the pre-test. The graph (Fig. 6a) also shows that the error rates of these groups in the pre-test are the same. Second, for both groups, the means were significantly lower on post-test A than on pre-test A, which indicates that both groups reduced their lexical/grammatical errors. However, in post-test A, the mean error rate of the RALL system group was significantly lower than that of the human tutor group. This indicates that the RALL system group reduced their errors more than the human tutor group did. Further, the graph shows a greater decrease for the RALL system group than for the human tutor group.
The mean error rates of the RALL system group were significantly lower than those of the human tutor group in post-tests B and C. The effect sizes were 0.897 and 1.422 for post-tests B and C, respectively. We consider these effect sizes to be generally large; these results indicate that learning with a RALL system reduces utterance errors more than learning with human tutors, even in advanced tests.
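To help interpret the effect sizes reported above, the following is a minimal sketch of one common effect-size measure, Cohen's d with a pooled standard deviation. The text does not state which measure was used, so the choice of Cohen's d is an assumption, and the function name and the sample values are hypothetical illustrations, not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation.

    Assumption: the paper does not name its effect-size measure; Cohen's d is
    used here only as an illustration.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical error rates (not taken from the study).
human_tutor_errors = [0.30, 0.25, 0.35, 0.28, 0.32]
rall_system_errors = [0.20, 0.22, 0.18, 0.25, 0.21]
d = cohens_d(human_tutor_errors, rall_system_errors)
print(f"Cohen's d = {d:.3f}")
```

Under the conventional rule of thumb, values of d around 0.8 or above are regarded as large, which is consistent with describing 0.897 and 1.422 as large effects.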
5.1.2 Fluency
For fluency, we reached the same conclusion as for error. The results indicated that learning with a RALL system may improve speech fluency compared to learning with a human tutor. In the pre-test, there was no significant difference in the number of words uttered per second between the human and RALL system groups. The graph shows that the human tutor group was slightly more scattered than the RALL system group; however, the averages were almost identical. In both groups, the number of words uttered per second was significantly higher in post-test A than in the pre-test. Speech fluency improved in both tutor groups. However, in post-test A, the number of words uttered per second of the RALL system group was significantly higher than that of the human tutor group. This suggests that the RALL system group improved their fluency more than the human tutor group. Further, the graph shows that the RALL system group exhibited a greater increase in fluency than the human tutor group.
In post-tests B and C, there was no significant difference in the number of words uttered per second between the RALL system and human tutor groups. Thus, we cannot conclude that learning with a RALL system improved speech fluency more than learning with a human tutor in advanced tests. However, the graph shows that the RALL system group tended to speak faster than the human tutor group in both post-tests B and C. We believe that these data suggest that RALL systems can improve speech fluency more than human tutors in applied situations.
5.1.3 Rhythm
There was no clear evidence of a difference in the outcomes between human and RALL systems regarding speaking rhythm. The results showed an interaction between the tutor factor and the pre-post factor; however, there were no significant differences between the groups in post-test A. Further, the graph shows that both groups exhibited similar levels of rhythmic goodness. The RALL system group showed a significant difference between pre-test and post-test A, whereas the human tutor group did not. However, these results should not be interpreted to indicate a larger effect on the RALL system group. This is because, at the time of the pre-test, the mean good rate of the RALL system group was significantly lower than that of the human tutor group. It is appropriate to interpret this as a greater growth potential for the RALL system group. In light of the above, it is difficult to conclude that there is a difference in outcomes between the human and RALL system groups.
However, some results suggest that RALL systems are likely to lead to higher outcomes. Although the results of post-tests B and C showed no significant differences between the human tutor and RALL system groups, the graph consistently shows that the RALL system group had a higher mean and smaller variance than the human tutor group. Considering that the RALL system group had a lower mean in the pre-test, this suggests that the RALL system group may have achieved higher learning outcomes than the human tutor group.
5.1.4 Pronunciation, Complexity, and Task Achievement
No differences were found in the outcomes of the human and RALL system groups regarding pronunciation, complexity, or task achievement. Therefore, we discuss the other characteristics observed in the graphs.
Pronunciation. Some characteristics suggest that learning with the RALL system may be more effective than learning with human tutors. For example, the human tutor group had greater variance in post-test A than in the pre-test, and some participants' scores decreased in post-test A. In contrast, in the RALL system group, the variance remained the same between the pre-test and post-test A, and most participants improved their scores.
Complexity. Several interesting features were observed. First, in post-test A, the number of words per AS-unit of the RALL system group converged around a certain point. A similar phenomenon was observed in post-test B, in which the conversational situation was the same. We believe that this phenomenon may be attributed to most participants using, in the post-tests, the expressions they had learned. The RALL system may be more effective than human tutors in fostering a steady output of learned expressions in the test. However, for some participants who already produced somewhat complex sentences, RALL systems may have prevented the improvement of their speaking skills in terms of complexity. For example, one participant in the RALL system group uttered complex sentences in the pre-test but converged on sentences using learned expressions in post-test A. In contrast, two participants in the human tutor group uttered sentences with greater complexity in post-tests B and C. Lessons with human tutors may include chitchat and the tutors’ empirical judgment in teaching applied expressions. We speculate that these features of human-led lessons may have led to the complex utterances in post-tests B and C.
Task Achievement. Task achievement scores improved in both groups; however, there was no effect of the tutor factor, and no features that would indicate a difference between the two groups were found in the graphs. Note that the task achievement scores were relatively high even in the pre-test. To obtain stable scores for other items such as lexical/grammatical error and fluency, the task difficulty must be set at a level that participants can respond to without much trouble. The fact that task achievement was high suggests that the difficulty levels of the pre-test and post-tests were set appropriately.
5.1.5 Summary
The findings for each measurement item are summarized as follows.
- Error rates improved with both the robot and the human tutors. Learning with the RALL system improved error rates more than learning with the human tutors.
- Fluency improved with both the robot and the human tutors. Learning with the RALL system improved fluency more than learning with the human tutors.
- Rhythm improved with both the robot and the human tutors. We could not conclude that their outcomes differed. However, some data suggested that learning with the RALL system may be more effective than learning with the human tutors.
- Pronunciation did not improve with either the robot or the human tutors. We found no differences in the outcomes between the robot and human tutors. However, learning with the RALL system tended to improve pronunciation more consistently than learning with human tutors.
- Complexity improved with both the robot and the human tutors. We found no differences in the outcomes between the robot and human tutors. However, some data suggested that learning with human tutors may have been more effective in eliciting more complex utterances in advanced tests.
- Task achievement improved with both the robot and the human tutors. We found no differences in the outcomes between the robot and human tutors. The raw data also did not reveal any notable characteristics.
5.2 Why did the RALL System Produce Better Outcomes of Error and Fluency than Human Tutors?
Exercises with the RALL system involved many repetitions of vocalizing expressions (especially shadowing). Such exercises encourage the consolidation of practiced expressions in memory. In the “Role play with two answer choices” and “Flashcards practice” exercises, participants repeated the same expressions. In addition, in the “Role play” exercise, the participants vocalized the expressions they had learned in the previous exercises from memory without looking at the sentences. We believe that this basic training may have helped the participants retain the expressions in memory, enabling them to recall the exact expressions quickly during the post-tests.
In contrast, learning with human tutors contained fewer repetitions of vocalizing expressions and fewer exercises for committing the practiced expressions to memory than learning with the robot tutor. The repetition of vocal exercises and the memorization of expressions constitute basic, individualized training. It is not cost effective to conduct such training with human tutors. Basic training can be performed alone, whereas practical communication can be practiced only with a human tutor. Therefore, learning with human tutors may have involved less basic training and more communication with participants. Although such communicative training improves error rates and fluency to some extent, it may not promote memory retention as much as the basic training provided by the robot tutor.
Furthermore, the participants’ tension and social anxiety may explain the difference in effectiveness between the RALL system and the human tutors. Participants in the human tutor group may have felt that they were constantly being evaluated in some way by the human tutors. For example, participants may have felt that the human tutors thought that their grammar was poor or their pronunciation was bad. On the other hand, the participants in the RALL system group would not have felt such tension or social anxiety, because the robot instructors did not change their facial expressions, tone of voice, or other behaviors in response to the participants’ speech in any way. As a result, participants in the RALL system group were able to focus on speaking English, which may have contributed to fewer vocabulary errors and increased fluency.
Based on the above discussion, we believe the RALL system may have been able to make the participants remember the learned expressions better than the human tutor.
5.3 Did the Better Error Rates and Fluency Outcomes Occur Because of RALL Systems?
The differences in the results should not be attributed simply to differences in attributes inherent to the tutors, such as appearance and voice quality, but rather to differences in their overall nature, including aspects of competence, such as what exercises they were able to provide.
It would be difficult for human tutors to perform the same exercises as RALL systems. The repeated practice of memorizing expressions is a boring exercise for both students and human tutors. Students may feel that it is a waste to assign a (costly) tutor to something they can do alone, and they may feel uncomfortable about making intelligent human tutors go through boring exercises. Further, human tutors also want to engage in communicative exercises because they are proud of their interactive tutoring skills. Therefore, in learning with human tutors, neither students nor tutors were motivated to perform the same exercises as in learning with the robot tutor.
Further, it is difficult for students to perform memory consolidation exercises alone. As discussed previously, this type of exercise requires patience. Students who train alone using smartphones or PCs are tempted to stop halfway. According to previous RALL studies, RALL systems can increase student compliance [6]. In other words, they are likely to reduce the urge to stop the exercise midstream. In this study, participants were asked to study in the laboratory, and we were unable to test the effect of suppressing the urge to stop the exercises. Nevertheless, we believe that the presence of the robot may contribute to strengthening the will to continue with the exercises.
Thus, we believe that the better error rates and fluency outcomes were likely brought about by exercises that promoted memory retention, and that such exercises worked well because of the robot tutor.
5.4 Application and Limitation
The extent to which the findings of this study can be applied is discussed in terms of language type, participant demographics, learning content, robot type, and AI technology.
Language. This study dealt with English. The findings of this study are likely applicable to speaking practice in other languages such as Chinese, French, and Spanish. The retention of basic phrases in memory through repeated utterances is basic training, regardless of language type. We believe that in other languages as well, learning with a RALL system would be more effective for basic training than learning with human tutors.
Participant Attributes. This study employed university students whose native language is Japanese. The findings of this study are probably applicable to learners other than university students, such as children, middle-aged people, and elderly people. The exercises conducted in this study were simple and could be practiced easily by both children and the elderly once they become familiar with them. However, the findings of this study are not applicable to people who can already create complex sentences in English. Because these people would achieve high scores even before learning, it would be difficult to find differences in the outcomes of studying with each tutor.
Learning Materials. In this study, we created learning materials that emphasized role play for speaking practice. These learning materials may have had a considerable impact on the present results because they maximized the advantages of the RALL system over human tutors. If the learning materials had been based on free talk, the results for error rates and fluency might have been different. Therefore, the findings of this study are limited to the use of learning materials that emphasize role play, including the repeated practice of basic expressions.
Robot Type. This study used a table-top robot called “CommU”. Because previous studies have not reported consistent findings regarding robot appearance and learning effectiveness, it is unclear whether other types of robots would produce results similar to those in this study. We speculate that lifelike robots such as Nao, Pepper, and Tega could produce results similar to those of this study. As one of the implications of this study is the effectiveness of repeated practice through role play, it is important that RALL systems make students feel that the system is monitoring them and can behave as a partner in role play.
Audio Variation. Because it was difficult to match participants’ and human tutors’ schedules, participants received lessons from more than one human tutor. According to Barcroft et al., audio variation has a positive effect on second language vocabulary learning [3]. In this regard, the participants in the human tutor group may have benefited more than the participants in the RALL system group, who heard only two different voices (one for the robot and one for the tablet). To discuss such effects in depth, it is necessary to experiment with different types of human voices in the RALL system under controlled conditions.
Physical Presence. In this experiment, physical presence could not be controlled between the RALL system and human tutor conditions. In the RALL system condition, the robot was in front of the participants, whereas the human tutor was online with a video display. Previous HRI studies have shown that the physical body of a robot has positive effects on interaction. Given these findings, the difference between the human tutor and RALL system conditions might have been smaller if the human tutor had been in front of the participants. It should be added, however, that even if such a result were obtained, it would not make the findings of this study any less meaningful. This is because physical face-to-face learning between a human tutor and a learner is extremely costly, and actual learning mostly takes place online. In this sense, the findings of this study provide useful insights into the actual situation.
AI Techniques. In this study, the robot behavior is based on classical scenario-based techniques and does not use newer techniques such as large language models or personalization. Even with these newer techniques, the findings of this study would still be useful. If new technologies are used, RALL systems can provide adaptive instruction that is more similar to that of human tutors. However, RALL systems are still machines. It is likely that students will find it easier to request repeated practice from RALL systems than from human tutors.
If technology develops further and robots and humans become almost indistinguishable, students may become uncomfortable with robots. In the future, it may be necessary to change the level of humanness and intelligence perceived by students between partner robots for basic and advanced practices.
Combination of Human and RALL Systems. We believe that a combination of human tutors and RALL systems will produce the best results for English learning. However, the role of the RALL system will increase as AI and robot technology advances. At the time this study was conducted, it was appropriate for the RALL system to provide basic training, such as repeated utterances of key phrases. This is because the technologies needed to accurately recognize non-native speakers’ speech, to synthesize speech like a native speaker, and to understand the intent of the learner’s utterances are not yet sufficient. Instead, human tutors should conduct classes and open-ended dialogues that proceed interactively according to the learner’s situation. When the above technologies are sufficiently developed, RALL systems will be able to conduct the interactive lessons and open-ended dialogues that human tutors have been conducting until then. Robots will be able to take over most of the exercises in English conversation learning. However, this does not mean that human tutors will be unnecessary. If the purpose of learning English conversation is to communicate with English-speaking people, then real human communication practice will still be necessary. Some English conversation learners may feel nervous or anxious about communicating with others in a language with which they are unfamiliar. Practicing communication with real people will be indispensable for overcoming such nervousness and anxiety.