A comparative review of graph convolutional networks for human skeleton-based action recognition

Authors: Liqi Feng, Yaqin Zhao, Wenxuan Zhao, Jiaxi Tang

Published in: Artificial Intelligence Review, Issue 5/2022

Publication date: 27-11-2021

Abstract

Human action recognition is one of the most active research topics, and many review papers have covered the multi-modality of the data, the selection of feature vectors, and the strengths and weaknesses of classification networks. With the continued development of relational networks, graph convolutional networks (GCNs) have been applied to many fields, including human action recognition. Although GCNs have demonstrated strong performance for human action recognition, few surveys review GCN-based methods. In this review, we give a detailed introduction to the structure of graph convolutional networks and to the data modalities used for human action recognition, and we focus on the application of GCNs to this task. Most importantly, we conduct experiments on five benchmark datasets to compare the performance of seven state-of-the-art GCN-based algorithms for human skeleton-based action recognition. The five datasets cover different scales (large-scale vs. small-scale) and different types of actions (single-person, human-object interaction, and two-person interaction) in order to explore the applicable scope of graph networks. We adopt frequently used performance metrics such as accuracy, number of network parameters, and loss. In particular, we analyze the impact of multi-stream fusion strategies on the performance of human action recognition schemes. To the best of our knowledge, this is the first survey of human action recognition strategies related to GCNs, and the first thorough experimental comparison of GCN-based human action recognition techniques across multiple datasets.
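To make the ideas in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of the spatial graph convolution used by ST-GCN-style skeleton models, together with score-level fusion of two streams such as joints and bones. The toy 5-joint skeleton, layer sizes, and equal fusion weights are illustrative assumptions.

```python
# Minimal sketch of a spatial graph convolution over skeleton data, plus
# score-level multi-stream fusion. Shapes and the adjacency matrix are
# illustrative assumptions, not the code of any surveyed paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialGraphConv(nn.Module):
    """Mix joint features along the skeleton graph with a normalized
    adjacency matrix, then project channels with a 1x1 convolution."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        # Symmetrically normalize A + I so each joint aggregates itself
        # and its neighbours without changing the feature scale.
        a_hat = adjacency + torch.eye(adjacency.size(0))
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        self.register_buffer("norm_adj", d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :])
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = torch.einsum("nctv,vw->nctw", x, self.norm_adj)  # aggregate over joints
        return F.relu(self.proj(x))


def fuse_streams(joint_scores, bone_scores, weights=(0.5, 0.5)):
    """Score-level fusion: weighted sum of per-stream softmax outputs."""
    return weights[0] * F.softmax(joint_scores, dim=-1) + weights[1] * F.softmax(bone_scores, dim=-1)


if __name__ == "__main__":
    # Toy skeleton: 5 joints in a chain, 3-channel (x, y, z) input, 16 frames.
    adjacency = torch.zeros(5, 5)
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
        adjacency[i, j] = adjacency[j, i] = 1.0
    layer = SpatialGraphConv(in_channels=3, out_channels=8, adjacency=adjacency)
    x = torch.randn(2, 3, 16, 5)              # (batch, channels, frames, joints)
    print(layer(x).shape)                     # torch.Size([2, 8, 16, 5])
    fused = fuse_streams(torch.randn(2, 60), torch.randn(2, 60))
    print(fused.argmax(dim=-1))               # predicted class per sample
```

In practice, the GCN-based recognizers compared in this review differ mainly in how the adjacency matrix is defined or learned and in how many streams (e.g. joint, bone, and motion features) are fused at the score level, which is why multi-stream fusion is analyzed separately in the experiments.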


Metadata

Title: A comparative review of graph convolutional networks for human skeleton-based action recognition
Authors: Liqi Feng, Yaqin Zhao, Wenxuan Zhao, Jiaxi Tang
Publication date: 27-11-2021
Publisher: Springer Netherlands
Published in: Artificial Intelligence Review, Issue 5/2022
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI: https://doi.org/10.1007/s10462-021-10107-y
