
2023 | OriginalPaper | Chapter

AI-Based Visualization of Voice Characteristics in Lecture Videos’ Captions

Authors: Tim Schlippe, Katrin Fritsche, Ying Sun, Matthias Wölfel

Published in: Artificial Intelligence in Education Technologies: New Development and Innovative Practices

Publisher: Springer Nature Singapore


More and more educational institutions are making lecture videos available online. Since more than 100 empirical studies document that captioning a video improves comprehension of, attention to, and memory for the video [1], it makes sense to provide those lecture videos with captions. However, studies also show that in human communication the words themselves contribute only 7% to making a message clear, whereas how we say those words (tone, intonation, and verbal pace) contributes 38% [2]. Consequently, in this paper we address the question of whether an AI-based visualization of voice characteristics in captions helps students further improve their watching and learning experience with lecture videos. For the AI-based visualization of the speaker’s voice characteristics in the captions we use the WaveFont technology [3–5], which processes the voice signal and intuitively displays loudness, speed, and pauses in the subtitle font. In our survey of 48 students, a significant majority of the participants preferred the WaveFont captions for watching lecture videos in every surveyed category: visualization of voice characteristics, understanding the content, following the content, linguistic understanding, and identifying important words.
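The idea of displaying voice characteristics in the caption font can be illustrated with a minimal sketch. The abstract states only that loudness, speed, and pauses are shown in the subtitle font; the concrete mapping used here (loudness to font weight, speaking rate to font width, pause length to the gap after a word), the value ranges, and all names such as `WordProsody` and `style_word` are assumptions made for illustration and are not taken from the WaveFont implementation.

```python
from dataclasses import dataclass


@dataclass
class WordProsody:
    """Per-word measurements, assumed to come from a prior speech analysis step."""
    text: str
    loudness_db: float    # mean level of the word (hypothetical scale, -40 to -10)
    duration_s: float     # spoken duration of the word in seconds
    pause_after_s: float  # silence between this word and the next one


def scale(value: float, low: float, high: float) -> float:
    """Clamp value to [low, high] and map it linearly onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(max((value - low) / (high - low), 0.0), 1.0)


def style_word(word: WordProsody) -> dict:
    """Map the prosody of one word to illustrative font attributes."""
    if word.duration_s <= 0:
        raise ValueError("duration_s must be positive")
    chars_per_s = len(word.text) / word.duration_s

    # Assumed mapping: louder speech -> heavier weight (300 to 900).
    weight = round(300 + 600 * scale(word.loudness_db, -40.0, -10.0))
    # Assumed mapping: faster speech -> narrower glyphs (125% down to 75%).
    width_pct = round(125 - 50 * scale(chars_per_s, 5.0, 25.0))
    # Assumed mapping: longer pause -> wider gap after the word (0.25 to 1.25 em).
    gap_em = round(0.25 + 1.0 * scale(word.pause_after_s, 0.0, 1.0), 2)

    return {"text": word.text, "weight": weight,
            "width_pct": width_pct, "gap_em": gap_em}


if __name__ == "__main__":
    caption = [
        WordProsody("this", -30.0, 0.20, 0.05),
        WordProsody("matters", -12.0, 0.70, 0.60),
    ]
    for w in caption:
        print(style_word(w))
```

A renderer could translate these attributes into whatever the caption format supports, for example a variable font's weight and width axes. The linear, clamped mapping is chosen only for simplicity; a real system would likely normalize per speaker and smooth values across neighbouring words.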


In this research we refer to interlingual translation as subtitles and transcription in the same language as captions.
BMBF funding number: 16DHB3006; running time 1.1.2020–31.12.2022.
Gernsbacher, M.A.: Video captions benefit everyone. Policy Insights Behav. Brain Sci. 2(1), 195–202 (2015)
Wölfel, M., Schlippe, T., Stitz, A.: Voice driven type design. In: International Conference on Speech Technology and Human-Computer Dialog (SpeD), Bucharest, Romania (2015)
Schlippe, T., Wölfel, M., Stitz, A.: Generation of text from an audio speech signal. US Patent 10043519B2 (2018)
Schlippe, T., Alessai, S., El-Taweel, G., Wölfel, M., Zaghouani, W.: Visualizing voice characteristics with type design in closed captions for Arabic. In: International Conference on Cyberworlds (CW 2020), Caen, France (2020)
Correia, A.P., Liu, C., Xu, F.: Evaluating videoconferencing systems for the quality of the educational experience. Distance Educ. 41(4), 429–452 (2020)
Koravuna, S., Surepally, U.K.: Educational gamification and artificial intelligence for promoting digital literacy. Association for Computing Machinery, New York, NY, USA (2020)
Rakhmanov, O., Schlippe, T.: Sentiment analysis for Hausa: Classifying students’ comments. In: The 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022), Marseille, France (2022)
Libbrecht, P., Declerck, T., Schlippe, T., Mandl, T., Schiffner, D.: NLP for student and teacher: Concept for an AI based information literacy tutoring system. In: The 29th ACM International Conference on Information and Knowledge Management (CIKM2020), Galway, Ireland (2020)
Sawatzki, J., Schlippe, T., Benner-Wickner, M.: Deep learning techniques for automatic short answer grading: Predicting scores for English and German answers. In: The 2nd International Conference on Artificial Intelligence in Education Technology (AIET 2021), Wuhan, China (2021)
Schlippe, T., Sawatzki, J.: Cross-lingual automatic short answer grading. In: The 2nd International Conference on Artificial Intelligence in Education Technology (AIET 2021), Wuhan, China (2021)
Bothmer, K., Schlippe, T.: Investigating natural language processing techniques for a recommendation system to support employers, job seekers and educational institutions. In: The 23rd International Conference on Artificial Intelligence in Education (AIED) (2022)
Bothmer, K., Schlippe, T.: Skill Scanner: Connecting and supporting employers, job seekers and educational institutions with an AI-based recommendation system. In: Proceedings of The Learning Ideas Conference 2022 (15th Annual Conference), New York, 15–17 June (2022)
Wölfel, M.: Towards the automatic generation of pedagogical conversational agents from lecture slides. In: International Conference on Multimedia Technology and Enhanced Learning (2021)
Ou, C., Joyner, D.A., Goel, A.K.: Designing and developing video lessons for online learning: A seven-principle model. Online Learn. 23(2), 82–104 (2019)
Perego, E., Del Missier, F., Porta, M., Mosconi, M.: The cognitive effectiveness of subtitle processing. Media Psychol. 13, 243–272 (2010)
Linebarger, D.L.: Learning to read from television: The effects of using captions and narration. J. Educ. Psychol. 93, 288–298 (2001)
Bowe, F.G., Kaufman, A.: Captioned Media: Teacher Perceptions of Potential Value for Students with No Hearing Impairments: A National Survey of Special Educators. Described and Captioned Media Program, Spartanburg, SC (2001)
Alfayez, Z.H.: Designing educational videos for university websites based on students’ preferences. Online Learn. 25(2), 280–298 (2021)
Persson, J.R., Wattengård, E., Lilledahl, M.B.: The effect of captions and written text on viewing behavior in educational videos. Int. J. Math Sci. Technol. Educ. 7(1), 124–147 (2019)
Brown, A., et al.: Dynamic subtitles: The user experience. In: TVX (2015)
Fox, W.: Integrated titles: An improved viewing experience. In: Eyetracking and Applied Linguistics (2016)
Ohene-Djan, J., Wright, J., Combie-Smith, K.: Emotional subtitles: A system and potential applications for deaf and hearing impaired people. In: CVHI (2007)
Rashid, R., Aitken, J., Fels, D.: Expressing emotions using animated text captions. Web Design for Dyslexics: Accessibility of Arabic Content (2006)
Bessemans, A., Renckens, M., Bormans, K., Nuyts, E., Larson, K.: Visual prosody supports reading aloud expressively. Visible Lang. 53, 28–49 (2019)
Gernsbacher, M.: Video captions benefit everyone. Policy Insights Behav. Brain Sci. 2, 195–202 (2015)
El-Taweel, G.: Conveying emotions in Arabic SDH: The case of pride and prejudice. Master thesis, Hamad Bin Khalifa University (2016)
de Lacerda Pataca, C., Costa, P.D.P.: Speech modulated typography: Towards an affective representation model. In: International Conference on Intelligent User Interfaces, pp. 139–143 (2020)
de Lacerda Pataca, C., Dornhofer Paro Costa, P.: Hidden bawls, whispers, and yelps: Can text be made to sound more than just its words? (2022). arXiv:2202.10631
Bringhurst, R.: The elements of typographic style, vol. 3.2, pp. 55–56. Hartley and Marks Publishers (2008)
Unger, G.: Wie man’s liest, pp. 63–65. Niggli Verlag (2006)
Rayner, S.G.: Cognitive styles and learning styles. In: Wright, J.D. (ed.) International Encyclopedia of Social and Behavioral Sciences, vol. 4, 2nd edn, pp. 110–117. Elsevier, Oxford (2015)