2021 | Original Paper | Book Chapter

Skeleton-Based Methods for Speaker Action Classification on Lecture Videos

Authors: Fei Xu, Kenny Davila, Srirangaraj Setlur, Venu Govindaraju

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing

Abstract

The volume of online lecture videos is growing at a frenetic pace. This has led to an increased focus on automated lecture video analysis methods that make these resources more accessible. Such methods consider multiple information channels, including the actions of the lecture speaker. In this work, we analyze two methods that use spatio-temporal features of the speaker skeleton for action classification in lecture videos. The first is the AM Pose model, which is based on Random Forests with motion-based features. The second is a state-of-the-art action classifier based on a two-stream adaptive graph convolutional network (2S-AGCN) that uses features of both the joints and bones of the speaker skeleton. Each video is divided into fixed-length temporal segments, and the speaker skeleton is estimated on every frame in order to build a representation of each segment for classification. Our experiments used the AccessMath dataset and a novel extension of it, which will be publicly released. We compared four state-of-the-art pose estimators: OpenPose, Deep High Resolution, AlphaPose, and Detectron2. We found that AlphaPose is the most robust to the encoding noise found in online videos. We also observed that 2S-AGCN outperforms the AM Pose model when the right domain adaptations are applied.
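To make the segment-building step concrete, below is a minimal Python sketch of one way to split a lecture video into fixed-length segments and stack per-frame skeleton estimates into an array suitable for a skeleton-based classifier. The estimate_pose function, the segment length, the joint count, and the output layout are illustrative assumptions, not the paper's exact configuration; any per-frame 2D pose estimator (e.g., AlphaPose or OpenPose) could fill that role.

import numpy as np

def build_segment_features(frames, estimate_pose, segment_len=60, n_joints=17):
    """Split a frame sequence into fixed-length segments and stack the
    per-frame 2D joint estimates into one (T, J, 2) array per segment."""
    segments = []
    for start in range(0, len(frames) - segment_len + 1, segment_len):
        joints = np.zeros((segment_len, n_joints, 2), dtype=np.float32)
        for t, frame in enumerate(frames[start:start + segment_len]):
            # Assumed to return an (n_joints, 2) array of (x, y) joint
            # coordinates, or None when no speaker is detected.
            pose = estimate_pose(frame)
            if pose is not None:
                joints[t] = pose  # frames with no detection stay zero-filled
        segments.append(joints)
    return segments  # each (segment_len, n_joints, 2) array feeds a classifier

A Random Forest model such as AM Pose would reduce these arrays to hand-crafted motion features, while a graph model such as 2S-AGCN would consume the joint coordinates (and bone vectors derived from pairs of joints) directly.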


References
1.
Xu, F., Davila, K., Setlur, S., Govindaraju, V.: Content extraction from lecture video via speaker action classification based on pose information. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1047–1054. IEEE (2019)
2.
Davila, K., Agarwal, A., Gaborski, R., Zanibbi, R., Ludi, S.: AccessMath: indexing and retrieving video segments containing math expressions based on visual similarity. In: 2013 Western New York Image Processing Workshop (WNYIPW), pp. 14–17. IEEE (2013)
3.
Shi, L., Zhang, Y., Cheng, J., Lu, H.: Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: 2019 Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12018–12027. IEEE/CVF (2019)
4.
Cao, Z., Hidalgo, G., Simon, T., Wei, S., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008 (2018)
5.
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: 2019 Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5693–5703. IEEE/CVF (2019)
7.
Fang, H.-S., Xie, S., Tai, Y.-W., Lu, C.: RMPE: regional multi-person pose estimation. In: 2017 International Conference on Computer Vision (ICCV), pp. 2353–2362. IEEE/CVF (2017)
8.
Chen, Y., Tian, Y., He, M.: Monocular human pose estimation: a survey of deep learning-based methods. Computer Vision and Image Understanding, 102897 (2020)
9.
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2(1–2), 83–97 (1955)
10.
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 International Conference on Computer Vision (ICCV), pp. 2961–2969. IEEE/CVF (2017)
11.
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems, pp. 2017–2025 (2015)
13.
Nguyen, T.V., Song, Z., Yan, S.: STAP: spatial-temporal attention-aware pooling for action recognition. IEEE Trans. Circuits Syst. Video Technol. 25(1), 77–86 (2014)
14.
Yi, Y., Zheng, Z., Lin, M.: Realistic action recognition with salient foreground trajectories. Expert Syst. Appl. 75, 44–55 (2017)
15.
Wang, P., Li, W., Li, C., Hou, Y.: Action recognition based on joint trajectory maps with convolutional neural networks. Knowl.-Based Syst. 158, 43–53 (2018)
16.
Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) (2018)
17.
Ma, D., Xie, B., Agam, G.: A machine learning based lecture video segmentation and indexing algorithm. In: Document Recognition and Retrieval XXI, vol. 9021, p. 90210V. International Society for Optics and Photonics (2014)
18.
Davila, K., Zanibbi, R.: Whiteboard video summarization via spatio-temporal conflict minimization. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 355–362. IEEE (2017)
19.
Davila, K., Zanibbi, R.: Visual search engine for handwritten and typeset math in lecture videos and LaTeX notes. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 50–55. IEEE (2018)
20.
Kota, B.U., Davila, K., Stone, A., Setlur, S., Govindaraju, V.: Generalized framework for summarization of fixed-camera lecture videos by detecting and binarizing handwritten content. Int. J. Doc. Anal. Recogn. (IJDAR) 22(3), 221–233 (2019)
21.
Soares, E.R., Barrére, E.: An optimization model for temporal video lecture segmentation using word2vec and acoustic features. In: 25th Brazilian Symposium on Multimedia and the Web, pp. 513–520 (2019)
22.
Shah, R.R., Yu, Y., Shaikh, A.D., Zimmermann, R.: TRACE: linguistic-based approach for automatic lecture video segmentation leveraging Wikipedia texts. In: 2015 IEEE International Symposium on Multimedia (ISM), pp. 217–220. IEEE (2015)
23.
Yang, H., Meinel, C.: Content based lecture video retrieval using speech and video text information. IEEE Trans. Learn. Technol. 7(2), 142–154 (2014)
24.
Radha, N.: Video retrieval using speech and text in video. In: 2016 International Conference on Inventive Computation Technologies (ICICT), vol. 2, pp. 1–6. IEEE (2016)
Metadata
Title
Skeleton-Based Methods for Speaker Action Classification on Lecture Videos
Authors
Fei Xu
Kenny Davila
Srirangaraj Setlur
Venu Govindaraju
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-68799-1_18
