Published in: Neural Computing and Applications 5/2018

19.12.2017 | Special Issue: Neural Computing in Next Generation Virtual Reality Technology

A novel approach to automatic detection of presentation slides in educational videos

Authors: Baoquan Zhao, Shujin Lin, Xin Qi, Ruomei Wang, Xiaonan Luo

Abstract

Recent work on learning and teaching methodology has experimented with virtual reality (VR)-based presentation forms to create immersive learning and training environments. The quality of such educational VR applications relies not only on the virtual models but also on 2D presentation materials such as text, diagrams and figures. However, manually designing or searching for these educational resources is both labor-intensive and time-consuming. In this paper, we introduce a new automatic algorithm to detect and extract presentation slides in educational videos, which can provide abundant resources for creating slide-based immersive presentation environments. The proposed approach involves five core components: shot boundary detection, training instance collection, shot classification, slide region detection and slide transition detection. We conducted a comparative experiment to evaluate the performance of the proposed method. The results indicate that, compared with a peer method, the proposed method improves the average precision of slide detection from 81.6% to 92.6% and the average recall from 74.7% to 86.3%. Using the detected slides, a content analyzer can then extract reusable elements for developing VR-based educational applications.
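
Shot boundary detection is the first of the five components, but the abstract does not say which visual cue the authors use. As a hedged illustration only, the sketch below implements a common baseline for hard cuts. It compares the normalized grey-level histograms of consecutive frames and declares a boundary when their distance exceeds a threshold. Frames are assumed to be non-empty 2D lists of 8-bit grey values. The function names, the 16-bin quantization and the 0.5 threshold are hypothetical choices, not values from the paper.

```python
from typing import List, Sequence

# Assumption: a frame is a non-empty 2D grid of 8-bit grey levels (0-255).
Frame = Sequence[Sequence[int]]


def grey_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized grey-level histogram of a single frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[value * bins // 256] += 1  # map 0-255 onto bins 0..bins-1
            total += 1
    return [c / total for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(
    frames: Sequence[Frame], threshold: float = 0.5, bins: int = 16
) -> List[int]:
    """Return indices i where a hard cut is declared between frame i-1 and frame i.

    Hypothetical baseline: gradual transitions (fades, dissolves) are not handled.
    """
    hists = [grey_histogram(f, bins) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]


if __name__ == "__main__":
    dark = [[10] * 4 for _ in range(4)]
    bright = [[240] * 4 for _ in range(4)]
    print(detect_shot_boundaries([dark, dark, bright, bright]))  # [2]
```

Histogram comparison ignores small motion within a shot, which makes it a popular baseline. However, it misses gradual transitions and can miss cuts between shots with similar colour distributions.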
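
Slide transition detection, the last component, can also be sketched with a simple baseline. The abstract does not describe the authors' detector, so this is an assumption-laden illustration. Within a shot already classified as a slide shot, each pair of consecutive frames is scored by the fraction of pixels whose grey level changes by more than a tolerance. The sketch then applies Otsu's method, a standard technique that picks the cut maximizing between-class variance, to the 1-D list of scores to separate "transition" scores from "static" ones. Otsu's criterion always splits a sample in two, so a hypothetical `min_change` floor keeps noise from being flagged when a shot contains no transition. All names and default values here are illustrative.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # assumption: 2D grid of grey levels (0-255)


def change_ratio(prev: Frame, curr: Frame, tol: int = 16) -> float:
    """Fraction of pixels whose grey level changes by more than `tol`."""
    changed = total = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > tol:
                changed += 1
    return changed / total if total else 0.0


def otsu_threshold(values: Sequence[float]) -> float:
    """Otsu's criterion on a non-empty 1-D sample: maximize between-class variance."""
    xs = sorted(values)
    n = len(xs)
    best_t, best_var = xs[0], -1.0
    for k in range(1, n):
        if xs[k] == xs[k - 1]:
            continue  # only cut between distinct values
        low, high = xs[:k], xs[k:]
        w0, w1 = k / n, (n - k) / n
        m0, m1 = sum(low) / k, sum(high) / (n - k)
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, (xs[k - 1] + xs[k]) / 2
    return best_t


def detect_slide_transitions(
    frames: Sequence[Frame], tol: int = 16, min_change: float = 0.05
) -> List[int]:
    """Return indices i where frame i appears to show a new slide (hypothetical baseline)."""
    if len(frames) < 2:
        return []
    scores = [change_ratio(frames[i - 1], frames[i], tol) for i in range(1, len(frames))]
    # The floor prevents Otsu from splitting pure noise into two classes.
    threshold = max(otsu_threshold(scores), min_change)
    return [i for i, s in enumerate(scores, start=1) if s > threshold]


if __name__ == "__main__":
    slide_a = [[255] * 4 for _ in range(4)]
    slide_b = [[0] * 4 for _ in range(2)] + [[255] * 4 for _ in range(2)]
    print(detect_slide_transitions([slide_a, slide_a, slide_a, slide_b, slide_b]))  # [3]
```

The pixel tolerance `tol` and the `min_change` floor are illustrative. Lecture videos with moving pointers, animations or heavy compression noise would need tuning or a more robust cue.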
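
The abstract reports detection quality as average precision and recall, where precision = TP / (TP + FP) and recall = TP / (TP + FN). Relative to the peer method, this is an absolute gain of 11.0 percentage points in precision (81.6% to 92.6%) and 11.6 points in recall (74.7% to 86.3%). The abstract does not describe the evaluation protocol, so the sketch below assumes one. Detected and ground-truth slide transitions are treated as frame indices. A detection counts as a true positive if it lies within a hypothetical `tolerance` of a ground-truth transition that has not yet been matched (greedy one-to-one matching).

```python
from typing import Sequence, Tuple


def precision_recall(
    detected: Sequence[int], ground_truth: Sequence[int], tolerance: int = 0
) -> Tuple[float, float]:
    """Greedy one-to-one matching of detected to ground-truth frame indices."""
    unmatched = sorted(ground_truth)
    tp = 0
    for d in sorted(detected):
        for g in unmatched:
            if abs(d - g) <= tolerance:
                unmatched.remove(g)  # each ground-truth event can be matched once
                tp += 1
                break  # stop iterating immediately after mutating the list
    fp = len(detected) - tp  # detections with no matching ground truth
    fn = len(ground_truth) - tp  # ground-truth events that were missed
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall([10, 52, 90, 130], [11, 50, 131, 200, 250], tolerance=2)
    print(p, r)  # 0.75 0.6
```

Greedy matching can be suboptimal when detections crowd together. Averaging per-video scores (macro average) and pooling all counts (micro average) can also give different numbers. The abstract says only "on average", so the authors' choice is not stated here.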

Metadata
Publisher: Springer London
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-017-3276-1
