2016 | Original Paper | Book Chapter

Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks

Authors: Yanghao Li, Cuiling Lan, Junliang Xing, Wenjun Zeng, Chunfeng Yuan, Jiaying Liu

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

Human action recognition from well-segmented 3D skeleton data has been intensively studied and is attracting increasing attention. Online action detection goes one step further and is more challenging: it identifies the action type and localizes the action's position on the fly from untrimmed streaming data. In this paper, we study the problem of online action detection from streaming skeleton data. We propose a multi-task, end-to-end Joint Classification-Regression Recurrent Neural Network to better exploit action type and temporal localization information. By employing a joint classification and regression optimization objective, the network automatically localizes the start and end points of actions more accurately. Specifically, by leveraging a deep Long Short-Term Memory (LSTM) subnetwork, the proposed model automatically captures complex long-range temporal dynamics, which naturally avoids the typical sliding-window design and thus ensures high computational efficiency. Furthermore, the regression subtask makes it possible to forecast an action before it occurs. To evaluate the proposed model, we build a large streaming video dataset with annotations. Experimental results on our dataset and the public G3D dataset both demonstrate promising performance of our scheme.
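
The abstract does not give the form of the joint objective, so the following is only a minimal sketch of what a combined classification and regression loss over a frame sequence could look like. It assumes a per-frame cross-entropy term plus a weighted squared-error term; the function names, the weight `lam`, and the shape of the regression targets are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for a single frame."""
    return -math.log(max(probs[label], 1e-12))

def squared_error(pred, target):
    """Sum of squared differences between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def joint_loss(class_probs, labels, reg_preds, reg_targets, lam=1.0):
    """Hypothetical joint objective: mean per-frame classification loss
    plus lam times the mean per-frame regression loss.
    The weighting scheme is an assumption, not the paper's formulation."""
    n = len(labels)
    cls = sum(cross_entropy(p, y) for p, y in zip(class_probs, labels)) / n
    reg = sum(squared_error(r, t) for r, t in zip(reg_preds, reg_targets)) / n
    return cls + lam * reg

# Toy two-frame sequence with two classes and two regression outputs per frame
# (for example, confidences that a frame is near an action start or end).
class_probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
reg_preds = [[0.0, 0.0], [1.0, 0.0]]
reg_targets = [[0.0, 0.0], [1.0, 0.0]]
loss = joint_loss(class_probs, labels, reg_preds, reg_targets, lam=0.5)
```

In this toy case the regression predictions match their targets exactly, so the total reduces to the classification term alone. In a real network both terms would be computed from the outputs of a shared recurrent layer and minimized together.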

Metadata
Title
Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks
Authors
Yanghao Li
Cuiling Lan
Junliang Xing
Wenjun Zeng
Chunfeng Yuan
Jiaying Liu
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46478-7_13