2023 | Original Paper | Book Chapter

Pose Driven Deep Appearance Feature Learning for Action Classification

Authors: Rejeti Hima Sameer, S. Rambabu, P. V. V. Kishore, D. Anil Kumar, M. Suneetha

Published in: International Conference on Innovative Computing and Communications

Publisher: Springer Nature Singapore

Abstract

In this work, we propose to learn the fusion process between the dominant skeletal features and the RGB features. This contrasts with previous fusion methods, which simply fused these multimodal features without learning a fusion process that exploits the semantic relationship between them. We propose a gated feature fusion (GFF) of multimodal feature data that applies attention to the appearance stream of RGB data using the temporal skeletal data. First, features are extracted from the RGB and skeletal frames using CNN models. Next, the gated fusion network fuses the features from the pose and appearance domains using temporal convolutions, which are then combined into a latent subspace. Finally, the latent subspace features are classified using fully connected layers with the combined loss embeddings. The proposed architecture outperforms state-of-the-art models on RGB-D action datasets.
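To make the gating idea concrete, the sketch below illustrates a generic gated fusion step in NumPy: a sigmoid gate computed from the pose (skeletal) features modulates a convex combination of the pose and RGB appearance features. This is a minimal illustration under assumed shapes and a standard gate formulation, not the paper's actual architecture; the function name, dimensions, and gate parameters (`w_gate`, `b_gate`) are assumptions, and the real model additionally uses CNN feature extractors and temporal convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_feature_fusion(rgb_feat, pose_feat, w_gate, b_gate):
    """Fuse RGB and pose feature vectors with a pose-driven gate.

    gate  = sigmoid(W @ pose + b)          -- attention from the skeletal stream
    fused = gate * rgb + (1 - gate) * pose -- convex combination of the streams
    """
    gate = sigmoid(w_gate @ pose_feat + b_gate)
    return gate * rgb_feat + (1.0 - gate) * pose_feat

# Toy example with assumed 8-dimensional features.
rng = np.random.default_rng(0)
d = 8
rgb = rng.standard_normal(d)    # stand-in for CNN appearance features
pose = rng.standard_normal(d)   # stand-in for CNN skeletal features
W = 0.1 * rng.standard_normal((d, d))
b = np.zeros(d)

fused = gated_feature_fusion(rgb, pose, W, b)
```

Because the gate is elementwise in (0, 1), each fused component lies between the corresponding RGB and pose components, so the pose stream controls how much appearance information passes through per dimension.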


Metadata
Title
Pose Driven Deep Appearance Feature Learning for Action Classification
Authors
Rejeti Hima Sameer
S. Rambabu
P. V. V. Kishore
D. Anil Kumar
M. Suneetha
Copyright year
2023
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-19-2535-1_8
