
2018 | OriginalPaper | Chapter

Graph Distillation for Action Detection with Privileged Modalities

Authors: Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. Common methods in transfer learning do not take advantage of the extra modalities potentially available in the source domain. On the other hand, previous work on multimodal learning only focuses on a single domain or task and does not handle the modality discrepancy between training and testing. In this work, we propose a method termed graph distillation that incorporates rich privileged information from a large-scale multimodal dataset in the source domain, and improves the learning in the target domain where training data and modalities are scarce. We evaluate our approach on action classification and detection tasks in multimodal videos, and show that our model outperforms the state-of-the-art by a large margin on the NTU RGB+D and PKU-MMD benchmarks. The code is released at http://alan.vision/eccv18_graph/.
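To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch (not the authors' released implementation) of a graph-style distillation objective: each modality acts as a student that is pulled toward the soft predictions of every other modality, with a nonnegative edge weight controlling how strongly each teacher modality influences each student. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def graph_distillation_loss(logits_by_modality, edge_weights):
    """Toy graph-distillation objective.

    Each modality k (student) is encouraged to match the soft
    predictions of every other modality j (teacher), weighted by the
    graph edge edge_weights[j, k].

    logits_by_modality: dict of modality name -> (batch, classes) logits
    edge_weights: (K, K) nonnegative array; rows index teachers,
                  columns index students (modalities in sorted-name order)
    """
    names = sorted(logits_by_modality)
    probs = [softmax(logits_by_modality[n]) for n in names]
    K = len(names)
    loss = 0.0
    for j in range(K):        # teacher modality
        for k in range(K):    # student modality
            if j == k:
                continue
            # Cross-entropy of the student's predictions against the
            # teacher's soft targets, averaged over the batch.
            ce = -(probs[j] * np.log(probs[k] + 1e-12)).sum(axis=-1).mean()
            loss += edge_weights[j, k] * ce
    return loss

# Example: two modalities (e.g. RGB and depth), uniform graph weights.
rng = np.random.default_rng(0)
logits = {"rgb": rng.normal(size=(4, 5)), "depth": rng.normal(size=(4, 5))}
loss = graph_distillation_loss(logits, np.ones((2, 2)))
```

In the paper's setting the edge weights would be learned rather than fixed, so that informative privileged modalities in the source domain teach the modalities that remain observable in the target domain.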


DOI: https://doi.org/10.1007/978-3-030-01264-9_11
