2023 | Original Paper | Book Chapter

Pose Driven Deep Appearance Feature Learning for Action Classification

Authors: Rejeti Hima Sameer, S. Rambabu, P. V. V. Kishore, D. Anil Kumar, M. Suneetha

Published in: International Conference on Innovative Computing and Communications

Publisher: Springer Nature Singapore

Abstract

In this work, we propose to learn the fusion process between the dominant skeletal features and the RGB features. This contrasts with previous fusion methods, which simply fused these multimodal features without learning a fusion process that exploits the semantic relationship between them. We propose a gated feature fusion (GFF) of multimodal feature data that applies attention to the appearance stream of RGB data using the temporal skeletal data. First, features are extracted from the RGB and skeletal frames using CNN models. Next, the gated fusion network fuses the features from the pose and appearance domains using temporal convolutions, which are then combined into a latent subspace. Finally, the latent subspace features are classified using fully connected layers with the combined loss embeddings. The proposed architecture outperforms state-of-the-art models on RGB-D action datasets.
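To make the gating idea concrete, the sketch below illustrates a generic gated fusion step in NumPy: a sigmoid gate computed from the pose (skeletal) features modulates a convex combination of the pose and RGB appearance features. This is a minimal illustration under assumed shapes and a standard gate formulation, not the paper's actual architecture; the function name, dimensions, and gate parameters (`w_gate`, `b_gate`) are assumptions, and the real model additionally uses CNN feature extractors and temporal convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_feature_fusion(rgb_feat, pose_feat, w_gate, b_gate):
    """Fuse RGB and pose feature vectors with a pose-driven gate.

    gate  = sigmoid(W @ pose + b)          -- attention from the skeletal stream
    fused = gate * rgb + (1 - gate) * pose -- convex combination of the streams
    """
    gate = sigmoid(w_gate @ pose_feat + b_gate)
    return gate * rgb_feat + (1.0 - gate) * pose_feat

# Toy example with assumed 8-dimensional features.
rng = np.random.default_rng(0)
d = 8
rgb = rng.standard_normal(d)    # stand-in for CNN appearance features
pose = rng.standard_normal(d)   # stand-in for CNN skeletal features
W = 0.1 * rng.standard_normal((d, d))
b = np.zeros(d)

fused = gated_feature_fusion(rgb, pose, W, b)
```

Because the gate is elementwise in (0, 1), each fused component lies between the corresponding RGB and pose components, so the pose stream controls how much appearance information passes through per dimension.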


Metadata
Title
Pose Driven Deep Appearance Feature Learning for Action Classification
Authors
Rejeti Hima Sameer
S. Rambabu
P. V. V. Kishore
D. Anil Kumar
M. Suneetha
Copyright year
2023
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-19-2535-1_8
