Published in: International Journal on Document Analysis and Recognition (IJDAR) 3/2019

15-06-2019 | Special Issue Paper

Generalized framework for summarization of fixed-camera lecture videos by detecting and binarizing handwritten content

Authors: Bhargava Urala Kota, Kenny Davila, Alexander Stone, Srirangaraj Setlur, Venu Govindaraju

Abstract

We propose a framework to extract and binarize handwritten content in lecture videos. The extracted content could be used to index video collections, powering content-based search and navigation within lecture videos for students and educators around the world. A deep learning pipeline first detects handwritten text, formulae and sketches, and then binarizes the detected content. We exploit the spatio-temporal structure of the binarized detections to compute associativity information for content across all video frames; this information is then used to segment the video. Experiments compare key components of our framework against existing methods, both in isolation and in terms of their impact on overall performance. We evaluate our framework on the publicly available AccessMath lecture video dataset, obtaining an f-measure of 94.32% for binary connected components. Code for the framework (including trained weights) and for summarization will be released.
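
The abstract reports an f-measure over binary connected components but does not state how components are matched. As a rough illustration only, the Python sketch below scores a predicted binary mask against a ground-truth mask at the connected-component level: a component counts as correct when at least a fraction `min_overlap` of its pixels are foreground in the other mask. The 8-connectivity, the 0.5 threshold, and the helper names `label_components`, `covered_fraction` and `cc_f_measure` are assumptions made for this example, not the evaluation protocol used by the authors.

```python
from collections import deque


def label_components(mask):
    """Group foreground pixels (truthy values) of a 2-D binary mask into
    connected components using 8-connectivity (an assumed choice).
    Returns a list of sets of (row, col) coordinates."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            component = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(component)
    return components


def covered_fraction(component, other_mask):
    """Fraction of a component's pixels that are foreground in other_mask."""
    hits = sum(1 for (y, x) in component if other_mask[y][x])
    return hits / len(component)


def cc_f_measure(pred, gt, min_overlap=0.5):
    """Hypothetical component-level (precision, recall, f-measure) of pred
    against gt. Both masks must have the same shape."""
    pred_ccs = label_components(pred)
    gt_ccs = label_components(gt)
    # Simplifying choice: if either mask has no components, report zeros.
    if not pred_ccs or not gt_ccs:
        return 0.0, 0.0, 0.0
    # A predicted component is a true positive if enough of it lies on ground truth.
    precision = sum(covered_fraction(cc, gt) >= min_overlap for cc in pred_ccs) / len(pred_ccs)
    # A ground-truth component is recalled if enough of it was predicted.
    recall = sum(covered_fraction(cc, pred) >= min_overlap for cc in gt_ccs) / len(gt_ccs)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = [[1, 1, 0, 0, 0],
          [1, 1, 0, 0, 0],
          [0, 0, 0, 1, 1],
          [0, 0, 0, 1, 1]]
    pred = [[1, 1, 0, 0, 1],
            [1, 0, 0, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0]]
    print(cc_f_measure(pred, gt))  # (0.5, 0.5, 0.5)
```

Because precision is computed over predicted components and recall over ground-truth components, fragmented or merged strokes affect the two terms differently; a one-to-one matching rule between components would be a stricter alternative.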

Metadata
Title: Generalized framework for summarization of fixed-camera lecture videos by detecting and binarizing handwritten content
Authors: Bhargava Urala Kota, Kenny Davila, Alexander Stone, Srirangaraj Setlur, Venu Govindaraju
Publication date: 15-06-2019
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 3/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-019-00327-y
