
2015 | Original Paper | Book Chapter

Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech

Authors: Jianfeng Xu, Yuki Nagai, Shinya Takayama, Shigeyuki Sakazawa

Published in: Transactions on Computational Collective Intelligence XX

Publisher: Springer International Publishing


Abstract

Embodied agents have great potential in education, where they promise to maximize learners' learning gains and enjoyment. In many educational applications, multimodal representation of embodied agents is a powerful approach to obtaining these benefits, but it requires accurate synchronization of gesture and speech. To this end, we investigate the key issues in synchronization through a preliminary case study, which serves as a practical guideline for our algorithm design, and propose a two-step synchronization method. The case study reveals that two issues, duration and timing, play an important role in synchronizing gesture with speech. Treating synchronization as a motion synthesis problem rather than as the behavior scheduling problem addressed by conventional methods, we employ a motion graph technique with constraints on gesture structure for coarse synchronization in the first step, and refine the result by shifting and scaling the gesture in the second step. Subjective evaluation demonstrates that the proposed method achieves more accurate synchronization, with respect to both duration and timing, and higher motion quality than state-of-the-art methods.
Furthermore, we have implemented the proposed synchronization method in an authoring tool for education applications. Experiments conducted at a university demonstrate that our system makes the creation of attractive animations easier and faster than manual creation of equal quality, requiring only about 10 % of the operation time, and that embodied agents are effective in education applications.
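The second refinement step can be illustrated with a minimal sketch in Python (the function and variable names are hypothetical; the paper's actual implementation may differ). Given the stroke interval of a selected gesture and the target interval of the corresponding speech segment, the gesture's keyframe times are shifted and uniformly scaled so that the stroke lands on the speech:

```python
# Minimal sketch of the second-step refinement (shift + scale).
# All names are illustrative, not the authors' actual API.

def refine_gesture_timing(keyframes, stroke_start, stroke_end,
                          speech_start, speech_end):
    """Shift and uniformly scale keyframe times (in seconds) so the
    gesture stroke [stroke_start, stroke_end] aligns with the target
    speech interval [speech_start, speech_end]."""
    scale = (speech_end - speech_start) / (stroke_end - stroke_start)
    return [(speech_start + (t - stroke_start) * scale, pose)
            for t, pose in keyframes]

# Example: a stroke at 0.4-0.9 s is mapped onto a stressed word at 0.55-1.0 s.
keyframes = [(0.0, "rest"), (0.4, "stroke_begin"), (0.9, "stroke_end"), (1.2, "rest")]
aligned = refine_gesture_timing(keyframes, 0.4, 0.9, 0.55, 1.00)
```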


Footnotes
1
A growth point is assumed to be a minimal psychological unit, with a special focus on speech-gesture synchrony and co-expressivity.
 
2
The t-test is most commonly applied to determine whether two sets of data are significantly different from each other when the samples follow a normal distribution.
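As a minimal illustration (assuming SciPy is available; the scores below are invented for the example and are not from the paper's evaluation), an independent two-sample t-test can be computed as:

```python
# Independent two-sample t-test with SciPy; scores are illustrative only.
from scipy import stats

scores_proposed = [4.2, 4.5, 3.9, 4.8, 4.1]  # e.g., ratings of the proposed method
scores_baseline = [3.1, 3.4, 2.9, 3.6, 3.2]  # e.g., ratings of a baseline

t_stat, p_value = stats.ttest_ind(scores_proposed, scores_baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```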
 
3
A lower p value indicates stronger evidence against the null hypothesis, i.e., it is less likely that the observed difference arose by chance.
 
4
In this paper, the name MikuMikuDance has two meanings. First, it may refer to the authoring tool used in Sect. 3.1. Second, it may refer to the specifications for the mesh model, motion, and other animation data.
 
Metadata
Title
Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech
Authors
Jianfeng Xu
Yuki Nagai
Shinya Takayama
Shigeyuki Sakazawa
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-27543-7_1