23.08.2024

A Hybrid Convolutional and Graph Neural Network for Human Action Detection in Static Images

Authors: Xinbiao Lu, Hao Xing

Published in: Circuits, Systems, and Signal Processing | Issue 12/2024

Abstract

Human action detection in static images is an active and challenging area of computer vision. Given the limited features available in a single image, achieving precise detection requires full use of the image's intrinsic features, as well as the integration of methods from other fields to generate additional features. In this paper, we propose a novel dual-pathway model for action detection. The main pathway employs a convolutional neural network to extract image features and predict the probability of the image belonging to each action class. Meanwhile, the auxiliary pathway uses a pose estimation algorithm to obtain human key points and their connection information, from which a graphical human model is constructed for each image. These graphical models are then converted into graph data and fed into a graph neural network for feature extraction and probability prediction. Finally, a fully connected neural network proposed by us fuses the probability vectors generated by the two pathways, learning a weight for each action class in each vector to enable their fusion. Transfer learning is also used in our model to improve its training speed and detection accuracy. Experimental results on three challenging datasets (Stanford40, PPMI and MPII) illustrate the superiority of the proposed method.
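To make the fusion step concrete, below is a minimal PyTorch sketch of the dual-pathway idea as described in the abstract. It is an illustration under assumptions, not the authors' implementation: the names FusionHead and num_classes, the per-class-per-pathway weighting scheme, and the random stand-in probability vectors are all hypothetical.

```python
# Minimal sketch of fusing per-class probability vectors from two pathways.
# Assumption: "learns the weight of each action class in each vector" is read
# as one learnable weight per class and per pathway; the paper may differ.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuses CNN-pathway and GNN-pathway probability vectors (hypothetical)."""

    def __init__(self, num_classes: int):
        super().__init__()
        # One learnable weight per action class, for each pathway.
        self.w_cnn = nn.Parameter(torch.ones(num_classes))
        self.w_gnn = nn.Parameter(torch.ones(num_classes))

    def forward(self, p_cnn: torch.Tensor, p_gnn: torch.Tensor) -> torch.Tensor:
        # p_cnn, p_gnn: (batch, num_classes) probability vectors.
        fused = self.w_cnn * p_cnn + self.w_gnn * p_gnn
        # Renormalize the weighted combination into a probability vector.
        return torch.softmax(fused, dim=-1)

# Usage with stand-ins for the two pathways' outputs.
num_classes = 40  # e.g. Stanford40 has 40 action classes
p_cnn = torch.softmax(torch.randn(2, num_classes), dim=-1)  # CNN pathway
p_gnn = torch.softmax(torch.randn(2, num_classes), dim=-1)  # GNN pathway
head = FusionHead(num_classes)
print(head(p_cnn, p_gnn).shape)  # torch.Size([2, 40])
```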

Metadata
Title
A Hybrid Convolutional and Graph Neural Network for Human Action Detection in Static Images
Authors
Xinbiao Lu
Hao Xing
Publication date
23.08.2024
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 12/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-024-02815-x