DOI: 10.1145/2818346.2820765
Research article

Evaluating Speech, Face, Emotion and Body Movement Time-series Features for Automated Multimodal Presentation Scoring

Published: 09 November 2015

ABSTRACT

We analyze how fusing features obtained from different multimodal data streams, such as speech, face, body movement, and emotion tracks, can be applied to the scoring of multimodal presentations. We compute both time-aggregated and time-series-based features from these data streams. The former are statistical functionals and other cumulative features computed over the entire time series; the latter, dubbed histograms of co-occurrences, capture how often different prototypical body postures or facial configurations co-occur within different time lags of each other over the evolution of the multimodal, multivariate time series. We examine the relative utility of these features, along with curated speech-stream features, in predicting human-rated scores of multiple aspects of presentation proficiency. We find that different modalities are useful in predicting different aspects, even outperforming a naive human inter-rater agreement baseline for a subset of the aspects analyzed.
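To make the two feature families concrete, the following is a minimal Python sketch of how they could be computed, assuming the multimodal tracks have already been quantized into sequences of prototype labels (e.g., cluster IDs of body-posture or facial configurations). The abstract does not specify the paper's exact formulation, so the functionals, lag values, and normalization below are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the two feature families described above. It assumes
# the multimodal time series has already been quantized into a sequence of
# prototype labels; the prototype count, lags, and normalization are
# illustrative assumptions, not the paper's exact formulation.
import numpy as np


def time_aggregated_features(track: np.ndarray) -> np.ndarray:
    """Statistical functionals computed over an entire univariate track."""
    return np.array([
        track.mean(),
        track.std(),
        track.min(),
        track.max(),
        np.percentile(track, 50),  # median
    ])


def cooccurrence_histogram(labels: np.ndarray, n_prototypes: int,
                           lags=(1, 5, 10)) -> np.ndarray:
    """Histogram of co-occurrences: for each lag, count how often prototype
    i at time t co-occurs with prototype j at time t + lag."""
    feats = []
    for lag in lags:
        hist = np.zeros((n_prototypes, n_prototypes))
        for a, b in zip(labels[:-lag], labels[lag:]):
            hist[a, b] += 1
        hist /= max(hist.sum(), 1.0)  # normalize for clip-length invariance
        feats.append(hist.ravel())
    return np.concatenate(feats)


# Usage: a 200-frame track quantized into 8 posture prototypes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 8, size=200)
print(time_aggregated_features(labels).shape)   # (5,)
print(cooccurrence_histogram(labels, 8).shape)  # (192,) = 3 lags x 8 x 8
```

Features of this kind, computed per modality, could then be concatenated and passed to a model trained against the human-rated proficiency scores.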

Published in

ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
November 2015, 678 pages
ISBN: 9781450339124
DOI: 10.1145/2818346

      Copyright © 2015 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ICMI '15 paper acceptance rate: 52 of 127 submissions, 41%
Overall acceptance rate: 453 of 1,080 submissions, 42%
