Journal on Multimodal User Interfaces

18-02-2021 | Original Paper

Predicting multimodal presentation skills based on instance weighting domain adaptation

Authors: Yutaro Yagi, Shogo Okada, Shota Shiobara, Sota Sugimura

Published in: Journal on Multimodal User Interfaces | Issue 1/2022

Abstract

Presentation skills assessment is one of the central challenges of multimodal modeling. Presentation skills are composed of verbal and nonverbal skill components, but because people demonstrate their presentation skills in a variety of manners, the observed multimodal features vary widely. Because of these differences in features, the prediction accuracy of the skills often degrades when the test samples are drawn from a distribution different from that of the training samples. In machine learning theory, this problem, in which the training (source) data are biased, is known as instance selection bias or covariate shift. To address it, this paper presents an instance weighting adaptation method that estimates the presentation skills of each participant from multimodal (verbal and nonverbal) features. For this purpose, we collect a novel multimodal presentation dataset that includes audio signals, body motion sensor data, and the text of the speech content for participants observed in 58 presentation sessions. The dataset also includes verbal and nonverbal presentation skill ratings assessed by two external experts from a human resources department. To estimate the presentation skills, we extract multimodal features such as spoken utterances, acoustic features, and the amount of body motion. We propose two approaches, early fusion and late fusion, for regression models based on multimodal instance weighting adaptation. The experimental results show that the early fusion regression model with instance weighting adaptation achieved a Pearson correlation of \(\rho =0.39\) in predicting the clarity of presentation goal elements. In the best case, instance weighting adaptation improved the accuracy (correlation coefficient) from \(-0.34\) to \(+0.35\).
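The general idea of instance weighting under covariate shift can be sketched as follows. This is a minimal sketch on synthetic data, not the authors' implementation. It assumes that the density ratio w(x) = p_test(x) / p_train(x) is estimated with unconstrained least-squares importance fitting (uLSIF), which is one of several possible estimators (the abstract does not say which one the paper uses). It also assumes that early fusion means concatenating per-modality feature vectors and that the regressor is an importance-weighted ridge regression evaluated by Pearson correlation. The function names, the two "modalities", and the values of sigma and lam are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 * sigma^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def ulsif_weights(x_train, x_test, sigma=1.0, lam=0.1):
    """Estimate w(x) = p_test(x) / p_train(x) at the training samples with uLSIF.

    Model: w(x) = sum_l alpha_l * K(x, c_l), using the test samples as centres c_l.
    alpha minimises 0.5 * a^T H a - h^T a + 0.5 * lam * a^T a, whose closed-form
    solution is (H + lam * I)^{-1} h; negative entries are then clipped to zero.
    sigma and lam are fixed (hypothetical) values here; in practice they would be
    chosen by cross-validation.
    """
    phi_train = gaussian_kernel(x_train, x_test, sigma)  # (n_train, n_centres)
    phi_test = gaussian_kernel(x_test, x_test, sigma)    # (n_test, n_centres)
    h_mat = phi_train.T @ phi_train / len(x_train)
    h_vec = phi_test.mean(axis=0)
    alpha = np.linalg.solve(h_mat + lam * np.eye(len(x_test)), h_vec)
    alpha = np.maximum(alpha, 0.0)  # density ratios cannot be negative
    return phi_train @ alpha


def early_fusion(*modalities):
    """Concatenate per-modality feature matrices (one row per presentation)."""
    return np.hstack(modalities)


def weighted_ridge(x, y, weights, lam=1e-3):
    """Minimise sum_i w_i * (y_i - [x_i, 1] @ beta)^2 + lam * ||beta||^2."""
    design = np.hstack([x, np.ones((len(x), 1))])  # append an intercept column
    gram = design.T @ (weights[:, None] * design) + lam * np.eye(design.shape[1])
    return np.linalg.solve(gram, design.T @ (weights * y))


def predict(beta, x):
    return np.hstack([x, np.ones((len(x), 1))]) @ beta


# Synthetic stand-in data: two hypothetical modalities (e.g. acoustic and
# body-motion features); test presentations are shifted relative to training ones.
rng = np.random.default_rng(0)


def make_split(n, mean, scale):
    acoustic = rng.normal(mean, scale, (n, 2))
    motion = rng.normal(mean, scale, (n, 1))
    x = early_fusion(acoustic, motion)
    # Nonlinear score, so a linear model fitted on the wrong region is biased.
    y = np.sin(x[:, 0]) + 0.5 * x[:, 1] - 0.3 * x[:, 2] ** 2 + rng.normal(0.0, 0.1, n)
    return x, y


x_train, y_train = make_split(40, mean=0.0, scale=1.0)
x_test, y_test = make_split(20, mean=0.8, scale=0.6)

w = ulsif_weights(x_train, x_test, sigma=1.0, lam=0.1)
beta_iw = weighted_ridge(x_train, y_train, w)
beta_plain = weighted_ridge(x_train, y_train, np.ones(len(x_train)))

for name, beta in [("unweighted", beta_plain), ("instance-weighted", beta_iw)]:
    rho = np.corrcoef(predict(beta, x_test), y_test)[0, 1]
    print(f"{name}: Pearson rho on test = {rho:.2f}")
```

Training samples that resemble the test distribution receive larger weights, so the regressor concentrates its fit on the region where it will be evaluated. The printed correlations depend only on the synthetic data and say nothing about the results reported in the paper.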

The spoken content in the presentations includes private information related to the company and the presenters, so the dataset is not publicly available owing to privacy policies.

After the program, the lecturers give the attendees feedback comments covering both the strong points of their presentations and the points to be improved.

https://www.audeering.com/opensmile/.

https://github.com/TadasBaltrusaitis/OpenFace.

Title: Predicting multimodal presentation skills based on instance weighting domain adaptation
Authors: Yutaro Yagi, Shogo Okada, Shota Shiobara, Sota Sugimura
Publication date: 18-02-2021
Publisher: Springer International Publishing
Published in: Journal on Multimodal User Interfaces / Issue 1/2022
Print ISSN: 1783-7677
Electronic ISSN: 1783-8738
DOI: https://doi.org/10.1007/s12193-021-00367-x

Other articles of this Issue 1/2022

Combining audio and visual displays to highlight temporal and spatial seismic patterns

A gaze-based interactive system to explore artwork imagery

Interactive exploration of a hierarchical spider web structure with sound

A wearable virtual touch system for IVIS in cars

RFID-based tangible and touch tabletop for dual reality in crisis management context

SoundSight: a mobile sensory substitution device that sonifies colour, distance, and temperature
