
2015 | Original Paper | Book Chapter

Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech

Authors: Jianfeng Xu, Yuki Nagai, Shinya Takayama, Shigeyuki Sakazawa

Published in: Transactions on Computational Collective Intelligence XX

Publisher: Springer International Publishing


Abstract

Embodied agents have great potential in education, where they promise to maximize learners' learning gains and enjoyment. In many educational applications, multimodal representation of embodied agents is a powerful approach to obtaining these benefits, but it requires accurate synchronization of gesture and speech. To this end, we investigate the key issues in synchronization through a preliminary case study, which serves as a practical guideline for our algorithm design, and propose a two-step synchronization method. The case study reveals that two issues, duration and timing, play an important role in synchronizing gesture with speech. Treating synchronization as a motion synthesis problem rather than as the behavior scheduling problem addressed by conventional methods, we employ a motion graph technique with constraints on gesture structure for coarse synchronization in the first step, and refine the result by shifting and scaling the gesture in the second step. Subjective evaluation demonstrates that the proposed method achieves more accurate synchronization, with respect to both duration and timing, and higher motion quality than state-of-the-art methods.
Furthermore, we have implemented the proposed synchronization method in an authoring tool for education applications. Experiments conducted at a university demonstrate that our system makes the creation of attractive animations easier and faster than manual creation of equal quality, requiring only about 10 % of the operation time, and that embodied agents are effective in education applications.
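The second refinement step can be illustrated with a minimal sketch in Python (the function and variable names are hypothetical; the paper's actual implementation may differ). Given the stroke interval of a selected gesture and the target interval of the corresponding speech segment, the gesture's keyframe times are shifted and uniformly scaled so that the stroke lands on the speech:

```python
# Minimal sketch of the second-step refinement (shift + scale).
# All names are illustrative, not the authors' actual API.

def refine_gesture_timing(keyframes, stroke_start, stroke_end,
                          speech_start, speech_end):
    """Shift and uniformly scale keyframe times (in seconds) so the
    gesture stroke [stroke_start, stroke_end] aligns with the target
    speech interval [speech_start, speech_end]."""
    scale = (speech_end - speech_start) / (stroke_end - stroke_start)
    return [(speech_start + (t - stroke_start) * scale, pose)
            for t, pose in keyframes]

# Example: a stroke at 0.4-0.9 s is mapped onto a stressed word at 0.55-1.0 s.
keyframes = [(0.0, "rest"), (0.4, "stroke_begin"), (0.9, "stroke_end"), (1.2, "rest")]
aligned = refine_gesture_timing(keyframes, 0.4, 0.9, 0.55, 1.00)
```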


Footnotes
1
A growth point is assumed to be a minimal psychological unit, with a special focus on speech-gesture synchrony and co-expressivity.
 
2
The t-test is most commonly applied to determine whether two sets of data are significantly different from each other when the samples follow a normal distribution.
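As a minimal illustration (assuming SciPy is available; the scores below are invented for the example and are not from the paper's evaluation), an independent two-sample t-test can be computed as:

```python
# Independent two-sample t-test with SciPy; scores are illustrative only.
from scipy import stats

scores_proposed = [4.2, 4.5, 3.9, 4.8, 4.1]  # e.g., ratings of the proposed method
scores_baseline = [3.1, 3.4, 2.9, 3.6, 3.2]  # e.g., ratings of a baseline

t_stat, p_value = stats.ttest_ind(scores_proposed, scores_baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```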
 
3
A lower p value indicates stronger evidence against the null hypothesis, i.e., it is less likely that the observed difference arose by chance.
 
4
In this paper, the name MikuMikuDance has two meanings. First, it may refer to the authoring tool used in Sect. 3.1. Second, it may refer to the specifications for the mesh model, motion, and other animation data.
 
Metadata
Title
Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech
Authors
Jianfeng Xu
Yuki Nagai
Shinya Takayama
Shigeyuki Sakazawa
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-27543-7_1