Published in: Neural Computing and Applications 5/2023

20-10-2022 | Original Article

Toward human activity recognition: a survey

Authors: Gulshan Saleem, Usama Ijaz Bajwa, Rana Hammad Raza

Abstract

Human activity recognition (HAR) is a complex and multifaceted problem, and the research community has reported numerous approaches to it. Alongside these approaches, several surveys have examined HAR trends across different environments and applications. HAR underpins a variety of technology-dependent daily-life systems, such as human–computer interaction, security surveillance, video surveillance, healthcare monitoring, robotics, content-based information retrieval, and monitoring systems. Because of rapid technological advancement, HAR trends change quickly and call for an up-to-date, broader perspective. This study offers an HAR taxonomy covering online/offline HAR, multimodal/unimodal HAR, and handcrafted feature-based and learning-based approaches. It also presents the multidisciplinary nature of HAR, including application areas, activity types, task complexities, benchmark datasets, and methods. The study further provides a comparative analysis of state-of-the-art HAR methods and a discussion of popular datasets. The selected studies are categorized using the taxonomy and analyzed with respect to attributes such as activity complexity, dataset size, and recognition rate. This comparative analysis also highlights domain challenges and open research directions for HAR researchers.


Literature
1.
go back to reference Beddiar DR, Nini B, Sabokrou M, Hadid A (2020) Vision-based human activity recognition: a survey. Multimed Tools Appl 79(41):30509–30555CrossRef Beddiar DR, Nini B, Sabokrou M, Hadid A (2020) Vision-based human activity recognition: a survey. Multimed Tools Appl 79(41):30509–30555CrossRef
2.
go back to reference Huang S-C (2010) An advanced motion detection algorithm with video quality analysis for video surveillance systems. IEEE Trans Circuits Syst Video Technol 21(1):1–14CrossRef Huang S-C (2010) An advanced motion detection algorithm with video quality analysis for video surveillance systems. IEEE Trans Circuits Syst Video Technol 21(1):1–14CrossRef
3.
go back to reference Cheng F-C, Huang S-C, Ruan S-J (2010) "Scene analysis for object detection in advanced surveillance systems using Laplacian distribution model. IEEE Trans Syst Man Cybern Part C 41(5):589–598CrossRef Cheng F-C, Huang S-C, Ruan S-J (2010) "Scene analysis for object detection in advanced surveillance systems using Laplacian distribution model. IEEE Trans Syst Man Cybern Part C 41(5):589–598CrossRef
4.
go back to reference Oral M, Deniz U (2007) Centre of mass model–a novel approach to background modelling for segmentation of moving objects. Image Vis Comput 25(8):1365–1376CrossRef Oral M, Deniz U (2007) Centre of mass model–a novel approach to background modelling for segmentation of moving objects. Image Vis Comput 25(8):1365–1376CrossRef
5.
go back to reference Yilmaz A, Li X, Shah M (2004) Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans Pattern Anal Mach Intell 26(11):1531–1536CrossRef Yilmaz A, Li X, Shah M (2004) Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans Pattern Anal Mach Intell 26(11):1531–1536CrossRef
6.
go back to reference Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J Image Video Process 2008:1–10CrossRef Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J Image Video Process 2008:1–10CrossRef
7.
go back to reference Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: 2010 IEEE computer society conference on computer vision and pattern recognition, IEEE, pp 2544–2550 Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: 2010 IEEE computer society conference on computer vision and pattern recognition, IEEE, pp 2544–2550
8.
go back to reference Cucchiara R, Grana C, Piccardi M, Prati A (2003) Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell 25(10):1337–1342CrossRef Cucchiara R, Grana C, Piccardi M, Prati A (2003) Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell 25(10):1337–1342CrossRef
9.
go back to reference Denman S, Fookes C, Sridharan S (2009) Improved simultaneous computation of motion detection and optical flow for object tracking. In: 2009 Digital image computing: techniques and applications, IEEE, pp 175–182 Denman S, Fookes C, Sridharan S (2009) Improved simultaneous computation of motion detection and optical flow for object tracking. In: 2009 Digital image computing: techniques and applications, IEEE, pp 175–182
10.
go back to reference Ince S, Konrad J (2008) Occlusion-aware optical flow estimation. IEEE Trans Image Process 17(8):1443–1451CrossRef Ince S, Konrad J (2008) Occlusion-aware optical flow estimation. IEEE Trans Image Process 17(8):1443–1451CrossRef
11.
go back to reference Morris BT, Trivedi MM (2008) A survey of vision-based trajectory learning and analysis for surveillance. IEEE Trans Circuits Syst Video Technol 18(8):1114–1127CrossRef Morris BT, Trivedi MM (2008) A survey of vision-based trajectory learning and analysis for surveillance. IEEE Trans Circuits Syst Video Technol 18(8):1114–1127CrossRef
12.
go back to reference Laptev I (2005) On space-time interest points. Int J Comput Vision 64(2–3):107–123CrossRef Laptev I (2005) On space-time interest points. Int J Comput Vision 64(2–3):107–123CrossRef
13.
go back to reference Blunsom P (2004) Maximum entropy markov models for semantic role labelling. Proc Australasian Lang Technol Workshop 2004:109–116 Blunsom P (2004) Maximum entropy markov models for semantic role labelling. Proc Australasian Lang Technol Workshop 2004:109–116
14.
go back to reference Nunez JC, Cabido R, Pantrigo JJ, Montemayor AS, Velez JF (2018) Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recogn 76:80–94CrossRef Nunez JC, Cabido R, Pantrigo JJ, Montemayor AS, Velez JF (2018) Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recogn 76:80–94CrossRef
15.
go back to reference Chen X, Guo H, Wang G, Zhang L (2017) Motion feature augmented recurrent neural network for skeleton-based dynamic hand gesture recognition. In: 2017 IEEE international conference on image processing (ICIP), IEEE, pp 2881–2885 Chen X, Guo H, Wang G, Zhang L (2017) Motion feature augmented recurrent neural network for skeleton-based dynamic hand gesture recognition. In: 2017 IEEE international conference on image processing (ICIP), IEEE, pp 2881–2885
16.
go back to reference Li C, Hou Y, Wang P, Li W (2017) Joint distance maps based action recognition with convolutional neural networks. IEEE Signal Process Lett 24(5):624–628CrossRef Li C, Hou Y, Wang P, Li W (2017) Joint distance maps based action recognition with convolutional neural networks. IEEE Signal Process Lett 24(5):624–628CrossRef
17.
go back to reference Kerber F, Puhl M, Krüger A (2017) User-independent real-time hand gesture recognition based on surface electromyography. In: Proceedings of the 19th international conference on human-computer interaction with mobile devices and services, pp 1–7 Kerber F, Puhl M, Krüger A (2017) User-independent real-time hand gesture recognition based on surface electromyography. In: Proceedings of the 19th international conference on human-computer interaction with mobile devices and services, pp 1–7
18.
go back to reference Vishwakarma S, Agrawal A (2013) A survey on activity recognition and behavior understanding in video surveillance. Vis Comput 29(10):983–1009CrossRef Vishwakarma S, Agrawal A (2013) A survey on activity recognition and behavior understanding in video surveillance. Vis Comput 29(10):983–1009CrossRef
19.
go back to reference Zhen X, Shao L, Maybank S, Chellappa R (2016) Handcrafted vs. learned representations for human action recognition. Image Vis Comput 55(2):39–41CrossRef Zhen X, Shao L, Maybank S, Chellappa R (2016) Handcrafted vs. learned representations for human action recognition. Image Vis Comput 55(2):39–41CrossRef
20.
go back to reference Sargano AB, Angelov P, Habib Z (2017) A comprehensive review on handcrafted and learning-based action representation approaches for human activity recognition. Appl Sci 7(1):110CrossRef Sargano AB, Angelov P, Habib Z (2017) A comprehensive review on handcrafted and learning-based action representation approaches for human activity recognition. Appl Sci 7(1):110CrossRef
21.
go back to reference Ke S-R, Thuc HLU, Lee Y-J, Hwang J-N, Yoo J-H, Choi K-H (2013) A review on video-based human activity recognition. Computers 2(2):88–131CrossRef Ke S-R, Thuc HLU, Lee Y-J, Hwang J-N, Yoo J-H, Choi K-H (2013) A review on video-based human activity recognition. Computers 2(2):88–131CrossRef
22.
go back to reference Cheng G, Wan Y, Saudagar A, Namuduri K, Buckles B (2015) Advances in human action recognition: a survey. arXiv preprint arXiv:1501.05964 Cheng G, Wan Y, Saudagar A, Namuduri K, Buckles B (2015) Advances in human action recognition: a survey. arXiv preprint arXiv:​1501.​05964
23.
go back to reference Dawn DD, Shaikh SH (2016) A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector. Vis Comput 32(3):289–306CrossRef Dawn DD, Shaikh SH (2016) A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector. Vis Comput 32(3):289–306CrossRef
24.
go back to reference Vrigkas M, Nikou C, Kakadiaris IA (2015) A review of human activity recognition methods. Front Robot AI 2:28CrossRef Vrigkas M, Nikou C, Kakadiaris IA (2015) A review of human activity recognition methods. Front Robot AI 2:28CrossRef
25.
go back to reference Herath S, Harandi M, Porikli F (2017) Going deeper into action recognition: a survey. Image Vis Comput 60:4–21CrossRef Herath S, Harandi M, Porikli F (2017) Going deeper into action recognition: a survey. Image Vis Comput 60:4–21CrossRef
26.
go back to reference Jegham I, Khalifa AB, Alouani I, Mahjoub MA (2020) Vision-based human action recognition: an overview and real world challenges. Forensic Sci Int Digit Invest 32:200901CrossRef Jegham I, Khalifa AB, Alouani I, Mahjoub MA (2020) Vision-based human action recognition: an overview and real world challenges. Forensic Sci Int Digit Invest 32:200901CrossRef
27.
go back to reference Wang Z et al (2019) A survey on human behavior recognition using channel state information. IEEE Access 7:155986–156024CrossRef Wang Z et al (2019) A survey on human behavior recognition using channel state information. IEEE Access 7:155986–156024CrossRef
28.
go back to reference Rodríguez-Moreno I, Martínez-Otzeta JM, Sierra B, Rodriguez I, Jauregi E (2019) Video activity recognition: state-of-the-art. Sensors 19(14):3160CrossRef Rodríguez-Moreno I, Martínez-Otzeta JM, Sierra B, Rodriguez I, Jauregi E (2019) Video activity recognition: state-of-the-art. Sensors 19(14):3160CrossRef
29.
go back to reference Liu J, Liu H, Chen Y, Wang Y, Wang C (2019) Wireless sensing for human activity: a survey. IEEE Commun Surv Tutor 22(3):1629–1645CrossRef Liu J, Liu H, Chen Y, Wang Y, Wang C (2019) Wireless sensing for human activity: a survey. IEEE Commun Surv Tutor 22(3):1629–1645CrossRef
30.
go back to reference Dang LM, Min K, Wang H, Piran MJ, Lee CH, Moon H (2020) Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recogn 108:107561CrossRef Dang LM, Min K, Wang H, Piran MJ, Lee CH, Moon H (2020) Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recogn 108:107561CrossRef
31.
go back to reference Chaurasia SK, Reddy S (2022) State-of-the-art survey on activity recognition and classification using smartphones and wearable sensors. Multimedia Tools Appl 81(1):1077–1108CrossRef Chaurasia SK, Reddy S (2022) State-of-the-art survey on activity recognition and classification using smartphones and wearable sensors. Multimedia Tools Appl 81(1):1077–1108CrossRef
32.
go back to reference Yao G, Lei T, Zhong J (2019) A review of convolutional-neural-network-based action recognition. Pattern Recogn Lett 118:14–22CrossRef Yao G, Lei T, Zhong J (2019) A review of convolutional-neural-network-based action recognition. Pattern Recogn Lett 118:14–22CrossRef
33.
go back to reference Zhang H-B et al (2019) A comprehensive survey of vision-based human action recognition methods. Sensors 19(5):1005CrossRef Zhang H-B et al (2019) A comprehensive survey of vision-based human action recognition methods. Sensors 19(5):1005CrossRef
34.
go back to reference Das B, Saha A (2021) A survey on current trends in human action recognition. In: Advances in medical physics and healthcare engineering, Springer, pp 443–453 Das B, Saha A (2021) A survey on current trends in human action recognition. In: Advances in medical physics and healthcare engineering, Springer, pp 443–453
35.
go back to reference Gupta N, Gupta SK, Pathak RK, Jain V, Rashidi P, Suri JS (2022) Human activity recognition in artificial intelligence framework: a narrative review. Artif Intell Rev 3:1–54 Gupta N, Gupta SK, Pathak RK, Jain V, Rashidi P, Suri JS (2022) Human activity recognition in artificial intelligence framework: a narrative review. Artif Intell Rev 3:1–54
36.
go back to reference Zhu F, Shao L, Xie J, Fang Y (2016) From handcrafted to learned representations for human action recognition: a survey. Image Vis Comput 55:42–52CrossRef Zhu F, Shao L, Xie J, Fang Y (2016) From handcrafted to learned representations for human action recognition: a survey. Image Vis Comput 55:42–52CrossRef
37.
go back to reference Tripathi RK, Jalal AS, Agrawal SC (2018) Suspicious human activity recognition: a review. Artif Intell Rev 50(2):283–339CrossRef Tripathi RK, Jalal AS, Agrawal SC (2018) Suspicious human activity recognition: a review. Artif Intell Rev 50(2):283–339CrossRef
38.
go back to reference Chaquet JM, Carmona EJ, Fernández-Caballero A (2013) A survey of video datasets for human action and activity recognition. Comput Vis Image Underst 117(6):633–659CrossRef Chaquet JM, Carmona EJ, Fernández-Caballero A (2013) A survey of video datasets for human action and activity recognition. Comput Vis Image Underst 117(6):633–659CrossRef
39.
go back to reference Zhang J, Li W, Ogunbona PO, Wang P, Tang C (2016) RGB-D-based action recognition datasets: a survey. Pattern Recogn 60:86–105CrossRef Zhang J, Li W, Ogunbona PO, Wang P, Tang C (2016) RGB-D-based action recognition datasets: a survey. Pattern Recogn 60:86–105CrossRef
40.
go back to reference Singh T, Vishwakarma DK (2019) Video benchmarks of human action datasets: a review. Artif Intell Rev 52(2):1107–1154CrossRef Singh T, Vishwakarma DK (2019) Video benchmarks of human action datasets: a review. Artif Intell Rev 52(2):1107–1154CrossRef
41.
go back to reference Wang J, Nie X, Xia Y, Wu Y, Zhu S-C (2014) Cross-view action modeling, learning and recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2649–2656 Wang J, Nie X, Xia Y, Wu Y, Zhu S-C (2014) Cross-view action modeling, learning and recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2649–2656
42.
go back to reference Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol. 3: IEEE, pp 32–36 Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol. 3: IEEE, pp 32–36
43.
go back to reference Gorelick L, Blank M, Shechtman E, Irani M, Basri R (2007) Actions as space-time shapes. IEEE Trans Pattern Anal Mach Intell 29(12):2247–2253CrossRef Gorelick L, Blank M, Shechtman E, Irani M, Basri R (2007) Actions as space-time shapes. IEEE Trans Pattern Anal Mach Intell 29(12):2247–2253CrossRef
44.
go back to reference Xia L, Chen C-C, Aggarwal JK (2012) View invariant human action recognition using histograms of 3d joints. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops, IEEE, pp 20–27 Xia L, Chen C-C, Aggarwal JK (2012) View invariant human action recognition using histograms of 3d joints. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops, IEEE, pp 20–27
45.
go back to reference Soomro K, Zamir AR, Shah M (2012) A dataset of 101 human action classes from videos in the wild. Center Res Comput Vis 2:666 Soomro K, Zamir AR, Shah M (2012) A dataset of 101 human action classes from videos in the wild. Center Res Comput Vis 2:666
46.
go back to reference Rahmani A, Mahmood A, Huynh D, Mian A (2014) Action classification with locality-constrained linear coding. In: 2014 22nd international conference on pattern recognition, IEEE, pp 3511–3516 Rahmani A, Mahmood A, Huynh D, Mian A (2014) Action classification with locality-constrained linear coding. In: 2014 22nd international conference on pattern recognition, IEEE, pp 3511–3516
47.
go back to reference Weinland D, Ronfard R, Boyer E (2006) Free viewpoint action recognition using motion history volumes. Comput Vis Image Underst 104(2–3):249–257CrossRef Weinland D, Ronfard R, Boyer E (2006) Free viewpoint action recognition using motion history volumes. Comput Vis Image Underst 104(2–3):249–257CrossRef
48.
go back to reference Niebles JC, Chen C-W, Fei-Fei L (2010) Modeling temporal structure of decomposable motion segments for activity classification. European conference on computer vision. Springer, Berlin, pp 392–405 Niebles JC, Chen C-W, Fei-Fei L (2010) Modeling temporal structure of decomposable motion segments for activity classification. European conference on computer vision. Springer, Berlin, pp 392–405
49.
go back to reference Marszalek M, Laptev I, Schmid C (2009) Actions in context. In: 2009 IEEE conference on computer vision and pattern recognition, IEEE, pp 2929–2936 Marszalek M, Laptev I, Schmid C (2009) Actions in context. In: 2009 IEEE conference on computer vision and pattern recognition, IEEE, pp 2929–2936
50.
go back to reference Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24(5):971–981CrossRef Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24(5):971–981CrossRef
51.
go back to reference Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 1725–1732 Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 1725–1732
52.
go back to reference Heilbron FC, Escorcia V, Ghanem B, Niebles JC (2015) Activitynet: A large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 961–970 Heilbron FC, Escorcia V, Ghanem B, Niebles JC (2015) Activitynet: A large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 961–970
54.
go back to reference Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: 2011 international conference on computer vision, IEEE, pp 2556–2563 Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: 2011 international conference on computer vision, IEEE, pp 2556–2563
55.
go back to reference Yu S, Tan D, Tan T (2006) A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In: 18th international conference on pattern recognition (ICPR'06), vol 4: IEEE, pp 441–444 Yu S, Tan D, Tan T (2006) A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In: 18th international conference on pattern recognition (ICPR'06), vol 4: IEEE, pp 441–444
56.
go back to reference Gu C et al. (2018) Ava: a video dataset of spatio-temporally localized atomic visual actions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6047–6056 Gu C et al. (2018) Ava: a video dataset of spatio-temporally localized atomic visual actions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6047–6056
57.
go back to reference Sultani W, Chen C, Shah M (2018) Real-world anomaly detection in surveillance videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6479–6488 Sultani W, Chen C, Shah M (2018) Real-world anomaly detection in surveillance videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6479–6488
58.
go back to reference Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3d points. In: 2010 IEEE computer society conference on computer vision and pattern recognition-workshops, IEEE, pp 9–14 Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3d points. In: 2010 IEEE computer society conference on computer vision and pattern recognition-workshops, IEEE, pp 9–14
59.
go back to reference Berclaz J, Fleuret F, Turetken E, Fua P (2011) Multiple object tracking using k-shortest paths optimization. IEEE Trans Pattern Anal Mach Intell 33(9):1806–1819CrossRef Berclaz J, Fleuret F, Turetken E, Fua P (2011) Multiple object tracking using k-shortest paths optimization. IEEE Trans Pattern Anal Mach Intell 33(9):1806–1819CrossRef
60.
go back to reference Hu J-F, Zheng W-S, Ma L, Wang G, Lai J (2016) Real-time RGB-D activity prediction by soft regression. European Conference on Computer Vision. Springer, Berlin, pp 280–296 Hu J-F, Zheng W-S, Ma L, Wang G, Lai J (2016) Real-time RGB-D activity prediction by soft regression. European Conference on Computer Vision. Springer, Berlin, pp 280–296
61.
go back to reference Sung J, Ponce C, Selman B, Saxena A (2012) Unstructured human activity detection from rgbd images. In: 2012 IEEE international conference on robotics and automation, IEEE, pp 842–849 Sung J, Ponce C, Selman B, Saxena A (2012) Unstructured human activity detection from rgbd images. In: 2012 IEEE international conference on robotics and automation, IEEE, pp 842–849
62.
go back to reference Koppula HS, Gupta R, Saxena A (2013) Learning human activities and object affordances from rgb-d videos. Int J Robot Res 32(8):951–970CrossRef Koppula HS, Gupta R, Saxena A (2013) Learning human activities and object affordances from rgb-d videos. Int J Robot Res 32(8):951–970CrossRef
63.
go back to reference Chen C, Jafari R, Kehtarnavaz N (2015) UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: 2015 IEEE international conference on image processing (ICIP), IEEE, pp 168–172 Chen C, Jafari R, Kehtarnavaz N (2015) UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: 2015 IEEE international conference on image processing (ICIP), IEEE, pp 168–172
64.
go back to reference Ni B, Wang G, Moulin P (2011) Rgbd-hudaact: A color-depth video database for human daily activity recognition. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), IEEE, pp 1147–1153 Ni B, Wang G, Moulin P (2011) Rgbd-hudaact: A color-depth video database for human daily activity recognition. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), IEEE, pp 1147–1153
65.
go back to reference Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R (2013) Berkeley mhad: a comprehensive multimodal human action database. In: 2013 IEEE workshop on applications of computer vision (WACV), IEEE, pp 53–60 Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R (2013) Berkeley mhad: a comprehensive multimodal human action database. In: 2013 IEEE workshop on applications of computer vision (WACV), IEEE, pp 53–60
66.
go back to reference Wolf C et al (2014) Evaluation of video activity localizations integrating quality and quantity measurements. Comput Vis Image Underst 127:14–30CrossRef Wolf C et al (2014) Evaluation of video activity localizations integrating quality and quantity measurements. Comput Vis Image Underst 127:14–30CrossRef
67.
go back to reference Bloom V, Argyriou V, Makris D (2014) G3di: A gaming interaction dataset with a real time detection and evaluation framework. European conference on computer vision. Springer, Berlin, pp 698–712 Bloom V, Argyriou V, Makris D (2014) G3di: A gaming interaction dataset with a real time detection and evaluation framework. European conference on computer vision. Springer, Berlin, pp 698–712
68.
go back to reference Shahroudy A, Liu J, Ng T-T, Wang G (2016) Ntu rgb+ d: a large scale dataset for 3d human activity analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1010–1019 Shahroudy A, Liu J, Ng T-T, Wang G (2016) Ntu rgb+ d: a large scale dataset for 3d human activity analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1010–1019
69.
go back to reference Van Gemeren C, Tan RT, Poppe R, Veltkamp RC (2014) Dyadic interaction detection from pose and flow. International Workshop on Human Behavior Understanding. Springer, Berlin, pp 101–115CrossRef Van Gemeren C, Tan RT, Poppe R, Veltkamp RC (2014) Dyadic interaction detection from pose and flow. International Workshop on Human Behavior Understanding. Springer, Berlin, pp 101–115CrossRef
70.
go back to reference Jalal A, Kim Y-H, Kim Y-J, Kamal S, Kim D (2017) Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recogn 61:295–308CrossRef Jalal A, Kim Y-H, Kim Y-J, Kamal S, Kim D (2017) Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recogn 61:295–308CrossRef
71.
go back to reference Lin J, Gan C, Han S (2019) Tsm: temporal shift module for efficient video understanding. In: Proceedings of the IEEE international conference on computer vision, pp 7083–7093 Lin J, Gan C, Han S (2019) Tsm: temporal shift module for efficient video understanding. In: Proceedings of the IEEE international conference on computer vision, pp 7083–7093
72.
go back to reference Soomro K, Idrees H, Shah M (2016) Predicting the where and what of actors and actions through online action localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2648–2657 Soomro K, Idrees H, Shah M (2016) Predicting the where and what of actors and actions through online action localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2648–2657
73.
go back to reference Singh G, Saha S, Sapienza M, Torr PH, Cuzzolin F (2017) Online real-time multiple spatiotemporal action localisation and prediction. In: Proceedings of the IEEE international conference on computer vision, pp 3637–3646 Singh G, Saha S, Sapienza M, Torr PH, Cuzzolin F (2017) Online real-time multiple spatiotemporal action localisation and prediction. In: Proceedings of the IEEE international conference on computer vision, pp 3637–3646
74.
go back to reference Zolfaghari M, Singh K, Brox T (2018) Eco: efficient convolutional network for online video understanding. In: Proceedings of the European conference on computer vision (ECCV), pp 695–712 Zolfaghari M, Singh K, Brox T (2018) Eco: efficient convolutional network for online video understanding. In: Proceedings of the European conference on computer vision (ECCV), pp 695–712
75.
go back to reference Xu M, Gao M, Chen Y-T, Davis LS, Crandall DJ (2019) Temporal recurrent networks for online action detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5532–5541 Xu M, Gao M, Chen Y-T, Davis LS, Crandall DJ (2019) Temporal recurrent networks for online action detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5532–5541
76.
go back to reference Gao M, Zhou Y, Xu R, Socher R, Xiong C (2020) WOAD: weakly supervised online action detection in untrimmed videos. arXiv preprint arXiv:2006.03732 Gao M, Zhou Y, Xu R, Socher R, Xiong C (2020) WOAD: weakly supervised online action detection in untrimmed videos. arXiv preprint arXiv:​2006.​03732
77.
go back to reference Ye Y, Li K, Qi G-J, Hua KA (2015) Temporal order-preserving dynamic quantization for human action recognition from multimodal sensor streams. In: Proceedings of the 5th ACM on international conference on multimedia retrieval, pp 99–106 Ye Y, Li K, Qi G-J, Hua KA (2015) Temporal order-preserving dynamic quantization for human action recognition from multimodal sensor streams. In: Proceedings of the 5th ACM on international conference on multimedia retrieval, pp 99–106
78.
go back to reference Vrigkas M, Nikou C, Kakadiadis IA (2014) Classifying behavioral attributes using conditional random fields. Hellenic conference on artificial intelligence. Springer, Berlin, pp 95–104 Vrigkas M, Nikou C, Kakadiadis IA (2014) Classifying behavioral attributes using conditional random fields. Hellenic conference on artificial intelligence. Springer, Berlin, pp 95–104
79.
go back to reference Shahroudy A, Ng T-T, Yang Q, Wang G (2015) Multimodal multipart learning for action recognition in depth videos. IEEE Trans Pattern Anal Mach Intell 38(10):2123–2129CrossRef Shahroudy A, Ng T-T, Yang Q, Wang G (2015) Multimodal multipart learning for action recognition in depth videos. IEEE Trans Pattern Anal Mach Intell 38(10):2123–2129CrossRef
80.
go back to reference Wu Z, Jiang Y-G, Wang X, Ye H, Xue X, Wang J (2015) Fusing multi-stream deep networks for video classification. arXiv preprint arXiv:1509.06086 Wu Z, Jiang Y-G, Wang X, Ye H, Xue X, Wang J (2015) Fusing multi-stream deep networks for video classification. arXiv preprint arXiv:​1509.​06086
81.
82.
go back to reference Zhang C, Tian Y, Guo X, Liu J (2018) DAAL: deep activation-based attribute learning for action recognition in depth videos. Comput Vis Image Underst 167:37–49CrossRef Zhang C, Tian Y, Guo X, Liu J (2018) DAAL: deep activation-based attribute learning for action recognition in depth videos. Comput Vis Image Underst 167:37–49CrossRef
83.
go back to reference Franco A, Magnani A, Maio D (2020) A multimodal approach for human activity recognition based on skeleton and RGB data. Pattern Recogn Lett 131:293–299CrossRef Franco A, Magnani A, Maio D (2020) A multimodal approach for human activity recognition based on skeleton and RGB data. Pattern Recogn Lett 131:293–299CrossRef
84.
go back to reference Bobick AF, Davis JW (2001) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 23(3):257–267CrossRef Bobick AF, Davis JW (2001) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 23(3):257–267CrossRef
85.
go back to reference Hu Y, Cao L, Lv F, Yan S, Gong Y, Huang TS (2009) Action detection in complex scenes with spatial and temporal ambiguities. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 128–135 Hu Y, Cao L, Lv F, Yan S, Gong Y, Huang TS (2009) Action detection in complex scenes with spatial and temporal ambiguities. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 128–135
86.
go back to reference Roh M-C, Shin H-K, Lee S-W (2010) View-independent human action recognition with volume motion template on single stereo camera. Pattern Recogn Lett 31(7):639–647CrossRef Roh M-C, Shin H-K, Lee S-W (2010) View-independent human action recognition with volume motion template on single stereo camera. Pattern Recogn Lett 31(7):639–647CrossRef
87.
go back to reference Qian H, Mao Y, Xiang W, Wang Z (2010) Recognition of human activities using SVM multi-class classifier. Pattern Recogn Lett 31(2):100–111CrossRef Qian H, Mao Y, Xiang W, Wang Z (2010) Recognition of human activities using SVM multi-class classifier. Pattern Recogn Lett 31(2):100–111CrossRef
88.
go back to reference Kim W, Lee J, Kim M, Oh D, Kim C (2010) Human action recognition using ordinal measure of accumulated motion. EURASIP J Adv Signal Process 2010(1):1–11CrossRef Kim W, Lee J, Kim M, Oh D, Kim C (2010) Human action recognition using ordinal measure of accumulated motion. EURASIP J Adv Signal Process 2010(1):1–11CrossRef
89.
go back to reference Ijsselmuiden J, Stiefelhagen R (2010) Towards high-level human activity recognition through computer vision and temporal logic. Annual conference on artificial intelligence. Springer, Berlin, pp 426–435 Ijsselmuiden J, Stiefelhagen R (2010) Towards high-level human activity recognition through computer vision and temporal logic. Annual conference on artificial intelligence. Springer, Berlin, pp 426–435
90.
go back to reference Fang C-H, Chen J-C, Tseng C-C, Lien J-JJ (2009) Human action recognition using spatio-temporal classification. Asian conference on computer vision. Springer, Berlin, pp 98–109 Fang C-H, Chen J-C, Tseng C-C, Lien J-JJ (2009) Human action recognition using spatio-temporal classification. Asian conference on computer vision. Springer, Berlin, pp 98–109
91.
go back to reference Ziaeefard M, Ebrahimnezhad H (2010) Hierarchical human action recognition by normalized-polar histogram. In: 2010 20th international conference on pattern recognition, IEEE, pp 3720–3723 Ziaeefard M, Ebrahimnezhad H (2010) Hierarchical human action recognition by normalized-polar histogram. In: 2010 20th international conference on pattern recognition, IEEE, pp 3720–3723
92.
go back to reference Wang Y, Mori G (2009) Human action recognition by semilatent topic models. IEEE Trans Pattern Anal Mach Intell 31(10):1762–1774CrossRef Wang Y, Mori G (2009) Human action recognition by semilatent topic models. IEEE Trans Pattern Anal Mach Intell 31(10):1762–1774CrossRef
93.
go back to reference Guo K, Ishwar P, Konrad J (2009) Action recognition in video by covariance matching of silhouette tunnels. In: 2009 XXII Brazilian symposium on computer graphics and image processing, IEEE, pp 299–306 Guo K, Ishwar P, Konrad J (2009) Action recognition in video by covariance matching of silhouette tunnels. In: 2009 XXII Brazilian symposium on computer graphics and image processing, IEEE, pp 299–306
94.
go back to reference Kim T-K, Cipolla R (2008) Canonical correlation analysis of video volume tensors for action categorization and detection. IEEE Trans Pattern Anal Mach Intell 31(8):1415–1428 Kim T-K, Cipolla R (2008) Canonical correlation analysis of video volume tensors for action categorization and detection. IEEE Trans Pattern Anal Mach Intell 31(8):1415–1428
95.
go back to reference Messing R, Pal C, Kautz H (2009) Activity recognition using the velocity histories of tracked keypoints. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 104–111 Messing R, Pal C, Kautz H (2009) Activity recognition using the velocity histories of tracked keypoints. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 104–111
96.
go back to reference Wang H, Kläser A, Schmid C, Liu C-L (2011) Action recognition by dense trajectories. In: CVPR 2011, IEEE, pp 3169–3176 Wang H, Kläser A, Schmid C, Liu C-L (2011) Action recognition by dense trajectories. In: CVPR 2011, IEEE, pp 3169–3176
97.
go back to reference Dollár P, Rabaud V, Cottrell G, Belongie S (2005) Behavior recognition via sparse spatio-temporal features. In: 2005 IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance, IEEE, pp 65–72 Dollár P, Rabaud V, Cottrell G, Belongie S (2005) Behavior recognition via sparse spatio-temporal features. In: 2005 IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance, IEEE, pp 65–72
98.
go back to reference Jones S, Shao L, Zhang J, Liu Y (2012) Relevance feedback for real-world human action retrieval. Pattern Recogn Lett 33(4):446–452CrossRef Jones S, Shao L, Zhang J, Liu Y (2012) Relevance feedback for real-world human action retrieval. Pattern Recogn Lett 33(4):446–452CrossRef
99.
go back to reference Gilbert A, Illingworth J, Bowden R (2009) Fast realistic multi-action recognition using mined dense spatio-temporal features. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 925–931 Gilbert A, Illingworth J, Bowden R (2009) Fast realistic multi-action recognition using mined dense spatio-temporal features. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 925–931
100.
go back to reference Sadek S, Al-Hamadi A, Michaelis B, Sayed U (2011) An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity. EURASIP J Adv Signal Process 2011(1):540375CrossRef Sadek S, Al-Hamadi A, Michaelis B, Sayed U (2011) An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity. EURASIP J Adv Signal Process 2011(1):540375CrossRef
101.
go back to reference Ikizler-Cinbis N, Sclaroff S (2010) Object, scene and actions: Combining multiple features for human action recognition. European conference on computer vision. Springer, Berlin, pp 494–507 Ikizler-Cinbis N, Sclaroff S (2010) Object, scene and actions: Combining multiple features for human action recognition. European conference on computer vision. Springer, Berlin, pp 494–507
102.
go back to reference Minhas R, Baradarani A, Seifzadeh S, Wu QJ (2010) Human action recognition using extreme learning machine based on visual vocabularies. Neurocomputing 73(10–12):1906–1917CrossRef Minhas R, Baradarani A, Seifzadeh S, Wu QJ (2010) Human action recognition using extreme learning machine based on visual vocabularies. Neurocomputing 73(10–12):1906–1917CrossRef
103.
go back to reference Darrell T, Pentland A (1993) Space-time gestures. In: Proceedings of IEEE conference on computer vision and pattern recognition, IEEE, pp 335–340 Darrell T, Pentland A (1993) Space-time gestures. In: Proceedings of IEEE conference on computer vision and pattern recognition, IEEE, pp 335–340
104.
go back to reference Gavrila DM, Davis LS (1996) 3-D model-based tracking of humans in action: a multi-view approach. In: Proceedings cvpr ieee computer society conference on computer vision and pattern recognition, IEEE, pp 73–80 Gavrila DM, Davis LS (1996) 3-D model-based tracking of humans in action: a multi-view approach. In: Proceedings cvpr ieee computer society conference on computer vision and pattern recognition, IEEE, pp 73–80
105.
go back to reference Veeraraghavan A, Chellappa R, Roy-Chowdhury AK (2006) The function space of an activity. In: 2006 IEEE Computer society conference on computer vision and pattern recognition (CVPR'06), vol 1: IEEE, pp 959–968 Veeraraghavan A, Chellappa R, Roy-Chowdhury AK (2006) The function space of an activity. In: 2006 IEEE Computer society conference on computer vision and pattern recognition (CVPR'06), vol 1: IEEE, pp 959–968
106.
go back to reference Yacoob Y, Black MJ (1999) Parameterized modeling and recognition of activities. Comput Vis Image Underst 73(2):232–247CrossRef Yacoob Y, Black MJ (1999) Parameterized modeling and recognition of activities. Comput Vis Image Underst 73(2):232–247CrossRef
107.
go back to reference Efros AA, Berg AC, Mori G, Malik J (2003) Recognizing action at a distance. In: Null, IEEE, p 726 Efros AA, Berg AC, Mori G, Malik J (2003) Recognizing action at a distance. In: Null, IEEE, p 726
108.
go back to reference Lublinerman R, Ozay N, Zarpalas D, Camps O (2006) Activity recognition from silhouettes using linear systems and model (in) validation techniques. In: 18th international conference on pattern recognition (ICPR'06), vol 1: IEEE, pp 347–350 Lublinerman R, Ozay N, Zarpalas D, Camps O (2006) Activity recognition from silhouettes using linear systems and model (in) validation techniques. In: 18th international conference on pattern recognition (ICPR'06), vol 1: IEEE, pp 347–350
109.
go back to reference Jiang H, Drew MS, Li Z-N (2006) Successive convex matching for action detection. In: 2006 IEEE Computer society conference on computer vision and pattern recognition (CVPR'06), vol 2: IEEE, pp 1646–1653 Jiang H, Drew MS, Li Z-N (2006) Successive convex matching for action detection. In: 2006 IEEE Computer society conference on computer vision and pattern recognition (CVPR'06), vol 2: IEEE, pp 1646–1653
110.
go back to reference Lin Z, Jiang Z, Davis LS (2009) Recognizing actions by shape-motion prototype trees. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 444–451 Lin Z, Jiang Z, Davis LS (2009) Recognizing actions by shape-motion prototype trees. In: 2009 IEEE 12th international conference on computer vision, IEEE, pp 444–451
111.
go back to reference Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential images using hidden markov model. CVPR 92:379–385 Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential images using hidden markov model. CVPR 92:379–385
112.
go back to reference Starner T, Pentland A (1997) Real-time american sign language recognition from video using hidden Markov models. In: Motion-based recognition, Springer, pp 227–243 Starner T, Pentland A (1997) Real-time american sign language recognition from video using hidden Markov models. In: Motion-based recognition, Springer, pp 227–243
113.
go back to reference Vogler C, Metaxas D (1999) Parallel hidden Markov models for American sign language recognition. In: Proceedings of the seventh IEEE international conference on computer vision, vol 1: IEEE, pp 116–122 Vogler C, Metaxas D (1999) Parallel hidden Markov models for American sign language recognition. In: Proceedings of the seventh IEEE international conference on computer vision, vol 1: IEEE, pp 116–122
114.
go back to reference Bobick AF, Wilson AD (1997) A state-based approach to the representation and recognition of gesture. IEEE Trans Pattern Anal Mach Intell 19(12):1325–1337CrossRef Bobick AF, Wilson AD (1997) A state-based approach to the representation and recognition of gesture. IEEE Trans Pattern Anal Mach Intell 19(12):1325–1337CrossRef
115.
go back to reference Oliver NM, Rosario B, Pentland AP (2000) A Bayesian computer vision system for modeling human interactions. IEEE Trans Pattern Anal Mach Intell 22(8):831–843CrossRef Oliver NM, Rosario B, Pentland AP (2000) A Bayesian computer vision system for modeling human interactions. IEEE Trans Pattern Anal Mach Intell 22(8):831–843CrossRef
116.
go back to reference Park S, Aggarwal JK (2004) A hierarchical Bayesian network for event recognition of human actions and interactions. Multimedia Syst 10(2):164–179CrossRef Park S, Aggarwal JK (2004) A hierarchical Bayesian network for event recognition of human actions and interactions. Multimedia Syst 10(2):164–179CrossRef
117.
go back to reference Natarajan P, Nevatia R (2007) Coupled hidden semi markov models for activity recognition. In: 2007 IEEE workshop on motion and video computing (WMVC'07), IEEE, pp 10–10 Natarajan P, Nevatia R (2007) Coupled hidden semi markov models for activity recognition. In: 2007 IEEE workshop on motion and video computing (WMVC'07), IEEE, pp 10–10
118.
go back to reference Gupta A, Davis LS (2007) Objects in action: An approach for combining action understanding and object perception. In: 2007 IEEE conference on computer vision and pattern recognition, IEEE, pp 1–8 Gupta A, Davis LS (2007) Objects in action: An approach for combining action understanding and object perception. In: 2007 IEEE conference on computer vision and pattern recognition, IEEE, pp 1–8
119.
go back to reference Moore DJ, Essa IA, Hayes MH (1999) Exploiting human actions and object context for recognition tasks. In: Proceedings of the seventh IEEE international conference on computer vision, vol 1: IEEE, pp 80–86 Moore DJ, Essa IA, Hayes MH (1999) Exploiting human actions and object context for recognition tasks. In: Proceedings of the seventh IEEE international conference on computer vision, vol 1: IEEE, pp 80–86
120.
go back to reference Yu E, Aggarwal JK (2009) Human action recognition with extremities as semantic posture representation. In: 2009 IEEE computer society conference on computer vision and pattern recognition workshops, IEEE, pp 1–8 Yu E, Aggarwal JK (2009) Human action recognition with extremities as semantic posture representation. In: 2009 IEEE computer society conference on computer vision and pattern recognition workshops, IEEE, pp 1–8
121.
go back to reference Kellokumpu V, Zhao G, Pietikäinen M (2011) Recognition of human actions using texture descriptors. Mach Vis Appl 22(5):767–780CrossRef Kellokumpu V, Zhao G, Pietikäinen M (2011) Recognition of human actions using texture descriptors. Mach Vis Appl 22(5):767–780CrossRef
122.
go back to reference Shi Q, Cheng L, Wang L, Smola A (2011) Human action segmentation and recognition using discriminative semi-Markov models. Int J Comput Vision 93(1):22–32MATHCrossRef Shi Q, Cheng L, Wang L, Smola A (2011) Human action segmentation and recognition using discriminative semi-Markov models. Int J Comput Vision 93(1):22–32MATHCrossRef
123.
go back to reference Wang L, Suter D (2007) Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. In: 2007 IEEE conference on computer vision and pattern recognition, IEEE, pp 1–8 Wang L, Suter D (2007) Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. In: 2007 IEEE conference on computer vision and pattern recognition, IEEE, pp 1–8
124.
go back to reference Rahman SA, Cho S-Y, Leung M (2012) Recognising human actions by analysing negative spaces. IET Comput Vision 6(3):197–213CrossRef Rahman SA, Cho S-Y, Leung M (2012) Recognising human actions by analysing negative spaces. IET Comput Vision 6(3):197–213CrossRef
125.
go back to reference Vishwakarma DK, Kapoor R (2015) Hybrid classifier based human activity recognition using the silhouette and cells. Expert Syst Appl 42(20):6957–6965CrossRef Vishwakarma DK, Kapoor R (2015) Hybrid classifier based human activity recognition using the silhouette and cells. Expert Syst Appl 42(20):6957–6965CrossRef
126.
go back to reference Junejo IN, Junejo KN, Al Aghbari Z (2014) Silhouette-based human action recognition using SAX-Shapes. The Visual Comput 30(3):259–269CrossRef Junejo IN, Junejo KN, Al Aghbari Z (2014) Silhouette-based human action recognition using SAX-Shapes. The Visual Comput 30(3):259–269CrossRef
127.
go back to reference Chaaraoui AA, Climent-Pérez P, Flórez-Revuelta F (2013) Silhouette-based human action recognition using sequences of key poses. Pattern Recogn Lett 34(15):1799–1807CrossRef Chaaraoui AA, Climent-Pérez P, Flórez-Revuelta F (2013) Silhouette-based human action recognition using sequences of key poses. Pattern Recogn Lett 34(15):1799–1807CrossRef
128.
go back to reference Chaaraoui AA, Flórez-Revuelta F (2014) A low-dimensional radial silhouette-based feature for fast human action recognition fusing multiple views. Int Schol Res Notices 2014:6666 Chaaraoui AA, Flórez-Revuelta F (2014) A low-dimensional radial silhouette-based feature for fast human action recognition fusing multiple views. Int Schol Res Notices 2014:6666
129.
go back to reference Cheema S, Eweiwi A, Thurau C, Bauckhage C (2011) Action recognition by learning discriminative key poses. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), IEEE, pp 1302–1309 Cheema S, Eweiwi A, Thurau C, Bauckhage C (2011) Action recognition by learning discriminative key poses. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), IEEE, pp 1302–1309
130.
go back to reference Chun S, Lee C-S (2016) Human action recognition using histogram of motion intensity and direction from multiple views. IET Comput Vision 10(4):250–257CrossRef Chun S, Lee C-S (2016) Human action recognition using histogram of motion intensity and direction from multiple views. IET Comput Vision 10(4):250–257CrossRef
131.
go back to reference Murtaza F, Yousaf MH, Velastin SA (2016) Multi-view human action recognition using 2D motion templates based on MHIs and their HOG description. IET Comput Vision 10(7):758–767CrossRef Murtaza F, Yousaf MH, Velastin SA (2016) Multi-view human action recognition using 2D motion templates based on MHIs and their HOG description. IET Comput Vision 10(7):758–767CrossRef
132.
go back to reference Ladjailia A, Bouchrika I, Merouani HF, Harrati N, Mahfouf Z (2020) Human activity recognition via optical flow: decomposing activities into basic actions. Neural Comput Appl 32(21):16387–16400CrossRef Ladjailia A, Bouchrika I, Merouani HF, Harrati N, Mahfouf Z (2020) Human activity recognition via optical flow: decomposing activities into basic actions. Neural Comput Appl 32(21):16387–16400CrossRef
133.
go back to reference Ahmad M, Lee S-W (2006) HMM-based human action recognition using multiview image sequences. In: 18th international conference on pattern recognition (ICPR'06), vol 1: IEEE, pp 263–266 Ahmad M, Lee S-W (2006) HMM-based human action recognition using multiview image sequences. In: 18th international conference on pattern recognition (ICPR'06), vol 1: IEEE, pp 263–266
134.
go back to reference Pehlivan S, Forsyth DA (2014) Recognizing activities in multiple views with fusion of frame judgments. Image Vis Comput 32(4):237–249CrossRef Pehlivan S, Forsyth DA (2014) Recognizing activities in multiple views with fusion of frame judgments. Image Vis Comput 32(4):237–249CrossRef
135.
go back to reference Jiang Z, Lin Z, Davis L (2012) Recognizing human actions by learning and matching shape-motion prototype trees. IEEE Trans Pattern Anal Mach Intell 34(3):533–547CrossRef Jiang Z, Lin Z, Davis L (2012) Recognizing human actions by learning and matching shape-motion prototype trees. IEEE Trans Pattern Anal Mach Intell 34(3):533–547CrossRef
136.
go back to reference Eweiwi A, Cheema S, Thurau C, Bauckhage C (2011) Temporal key poses for human action recognition. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), IEEE, pp 1310–1317 Eweiwi A, Cheema S, Thurau C, Bauckhage C (2011) Temporal key poses for human action recognition. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), IEEE, pp 1310–1317
137.
go back to reference Shi Y, Huang Y, Minnen D, Bobick A, Essa I (2004) Propagation networks for recognition of partially ordered sequential action. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, CVPR 2004, vol. 2: IEEE, pp II–II Shi Y, Huang Y, Minnen D, Bobick A, Essa I (2004) Propagation networks for recognition of partially ordered sequential action. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, CVPR 2004, vol. 2: IEEE, pp II–II
138.
go back to reference Yin J, Meng Y (2010) Human activity recognition in video using a hierarchical probabilistic latent model. In: 2010 IEEE computer society conference on computer vision and pattern recognition-workshops, IEEE, pp 15–20 Yin J, Meng Y (2010) Human activity recognition in video using a hierarchical probabilistic latent model. In: 2010 IEEE computer society conference on computer vision and pattern recognition-workshops, IEEE, pp 15–20
139.
go back to reference Mauthner T, Roth PM, Bischof H (2010) Temporal feature weighting for prototype-based action recognition. Asian conference on computer vision. Springer, Berlin, pp 566–579 Mauthner T, Roth PM, Bischof H (2010) Temporal feature weighting for prototype-based action recognition. Asian conference on computer vision. Springer, Berlin, pp 566–579
140.
go back to reference Han L, Wu X, Liang W, Hou G, Jia Y (2010) Discriminative human action recognition in the learned hierarchical manifold space. Image Vis Comput 28(5):836–849CrossRef Han L, Wu X, Liang W, Hou G, Jia Y (2010) Discriminative human action recognition in the learned hierarchical manifold space. Image Vis Comput 28(5):836–849CrossRef
141.
go back to reference Zeng Z, Ji Q (2010) Knowledge based activity recognition with dynamic bayesian network. European conference on computer vision. Springer, Berlin, pp 532–546 Zeng Z, Ji Q (2010) Knowledge based activity recognition with dynamic bayesian network. European conference on computer vision. Springer, Berlin, pp 532–546
142.
go back to reference Minnen D, Essa I, Starner T (2003) Expectation grammars: leveraging high-level expectations for activity recognition. In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings, vol 2: IEEE, pp II–II Minnen D, Essa I, Starner T (2003) Expectation grammars: leveraging high-level expectations for activity recognition. In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings, vol 2: IEEE, pp II–II
143.
go back to reference Moore D, Essa I (2002) Recognizing multitasked activities from video using stochastic context-free grammar. In: AAAI/IAAI, pp 770–776 Moore D, Essa I (2002) Recognizing multitasked activities from video using stochastic context-free grammar. In: AAAI/IAAI, pp 770–776
144.
go back to reference Kitani KM, Sato Y, Sugimoto A (2008) Recovering the basic structure of human activities from noisy video-based symbol strings. Int J Pattern Recognit Artif Intell 22(08):1621–1646CrossRef Kitani KM, Sato Y, Sugimoto A (2008) Recovering the basic structure of human activities from noisy video-based symbol strings. Int J Pattern Recognit Artif Intell 22(08):1621–1646CrossRef
145.
go back to reference Wang L, Wang Y, Gao W (2011) Mining layered grammar rules for action recognition. Int J Comput Vision 93(2):162–182MATHCrossRef Wang L, Wang Y, Gao W (2011) Mining layered grammar rules for action recognition. Int J Comput Vision 93(2):162–182MATHCrossRef
146.
go back to reference Nevatia R, Hobbs J, Bolles B (2004) An ontology for video event representation. In: 2004 Conference on computer vision and pattern recognition workshop, IEEE, pp 119–119 Nevatia R, Hobbs J, Bolles B (2004) An ontology for video event representation. In: 2004 Conference on computer vision and pattern recognition workshop, IEEE, pp 119–119
147.
Ryoo MS, Aggarwal JK (2006) Recognition of composite human activities through context-free grammar based representation. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06), vol 2, IEEE, pp 1709–1718
148.
Pinhanez CS, Bobick AF (1998) Human action detection using PNF propagation of temporal constraints. In: Proceedings of the 1998 IEEE computer society conference on computer vision and pattern recognition (Cat. No. 98CB36231), IEEE, pp 898–904
149.
Ghanem N, De Menthon D, Doermann D, Davis L (2004) Representation and recognition of events in surveillance video using Petri nets. In: 2004 conference on computer vision and pattern recognition workshop, IEEE, pp 112–112
150.
Intille SS, Bobick AF (1999) A framework for recognizing multi-agent action from visual evidence. In: AAAI/IAAI 99, pp 518–525
151.
Siskind JM (2001) Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic. J Artif Intell Res 15:31–90
152.
Tran SD, Davis LS (2008) Event modeling and recognition using Markov logic networks. European conference on computer vision. Springer, Berlin, pp 610–623
153.
Morariu VI, Davis LS (2011) Multi-agent event recognition in structured scenarios. In: CVPR 2011, IEEE, pp 3289–3296
154.
Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of the IEEE international conference on computer vision, pp 3551–3558
155.
Kang L, Ye P, Li Y, Doermann D (2014) Convolutional neural networks for no-reference image quality assessment. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1733–1740
156.
157.
Shao L, Ji L, Liu Y, Zhang J (2012) Human action segmentation and recognition via motion and shape analysis. Pattern Recogn Lett 33(4):438–445
158.
Marčelja S (1980) Mathematical description of the responses of simple cortical cells. JOSA 70(11):1297–1300
159.
Burrus CS, Gopinath RA, Guo H (1998) Introduction to wavelets and wavelet transforms: a primer. Prentice Hall, Upper Saddle River
160.
161.
Guha T, Ward RK (2011) Learning sparse representations for human action recognition. IEEE Trans Pattern Anal Mach Intell 34(8):1576–1588
162.
Zheng J, Jiang Z, Phillips PJ, Chellappa R (2012) Cross-view action recognition via a transferable dictionary pair. BMVC 1:7
163.
Zhu F, Shao L (2014) Weakly-supervised cross-domain dictionary learning for visual recognition. Int J Comput Vision 109(1–2):42–59
164.
Kim H-J, Lee JS, Yang H-S (2007) Human action recognition using a modified convolutional neural network. International symposium on neural networks. Springer, Berlin, pp 715–723
165.
Jones JP, Palmer LA (1987) An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J Neurophysiol 58(6):1233–1258
166.
Kim H-J, Lee J, Yang H-S (2006) A weighted FMM neural network and its application to face detection. International conference on neural information processing. Springer, Berlin, pp 177–186
167.
Jhuang H, Serre T, Wolf L, Poggio T (2007) A biologically inspired system for action recognition. In: 2007 IEEE 11th international conference on computer vision, IEEE, pp 1–8
168.
Shao L, Liu L, Li X (2013) Feature learning for image classification via multiobjective genetic programming. IEEE Trans Neural Netw Learn Syst 25(7):1359–1371
169.
Taylor GW, Hinton GE, Roweis ST (2007) Modeling human motion using binary latent variables. In: Advances in neural information processing systems, pp 1345–1352
170.
Baum LE, Petrie T (1966) Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat 37(6):1554–1563
171.
Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
172.
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
173.
Le QV, Zou WY, Yeung SY, Ng AY (2011) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: CVPR 2011, IEEE, pp 3361–3368
174.
Hyvärinen A, Hurri J, Hoyer PO (2009) Natural image statistics: a probabilistic approach to early computational vision. Springer, London
175.
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52
176.
Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2011) Sequential deep learning for human action recognition. International workshop on human behavior understanding. Springer, Berlin, pp 29–39
177.
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) OverFeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229
178.
Jia Y et al. (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia, ACM, pp 675–678
179.
Ning F, Delhomme D, LeCun Y, Piano F, Bottou L, Barbano PE (2005) Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 14(9):1360–1371
180.
Singh T, Vishwakarma DK (2021) A deeply coupled ConvNet for human activity recognition using dynamic and RGB images. Neural Comput Appl 33(1):469–485
181.
Yao L, Qian Y (2018) DT-3DResNet-LSTM: an architecture for temporal activity recognition in videos. Pacific Rim conference on multimedia. Springer, Berlin, pp 622–632
182.
Meng B, Liu X, Wang X (2018) Human action recognition based on quaternion spatial-temporal convolutional neural network and LSTM in RGB videos. Multimedia Tools Appl 77(20):26901–26918
183.
Qi M, Qin J, Li A, Wang Y, Luo J, Van Gool L (2018) stagNet: an attentive semantic RNN for group activity recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 101–117
184.
Qi M, Wang Y, Qin J, Li A, Luo J, Van Gool L (2019) stagNet: an attentive semantic RNN for group activity and individual action recognition. IEEE Trans Circuits Syst Video Technol 30(2):549–565
185.
Muhammad K et al (2021) Human action recognition using attention based LSTM network with dilated CNN features. Futur Gener Comput Syst 125:820–830
186.
He J-Y, Wu X, Cheng Z-Q, Yuan Z, Jiang Y-G (2021) DB-LSTM: densely-connected bi-directional LSTM for human action recognition. Neurocomputing 444:319–331
187.
Hu K, Zheng F, Weng L, Ding Y, Jin J (2021) Action recognition algorithm of spatio-temporal differential LSTM based on feature enhancement. Appl Sci 11(17):7876
188.
Vaswani A et al. (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
190.
Plizzari C, Cannici M, Matteucci M (2021) Spatial temporal transformer network for skeleton-based action recognition. International conference on pattern recognition. Springer, Berlin, pp 694–701
191.
Mazzia V, Angarano S, Salvetti F, Angelini F, Chiaberge M (2021) Action transformer: a self-attention model for short-time human action recognition. arXiv preprint arXiv:2107.00606
192.
Ullah A, Muhammad K, Haq IU, Baik SW (2019) Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments. Futur Gener Comput Syst 96:386–397
193.
Chong YS, Tay YH (2017) Abnormal event detection in videos using spatiotemporal autoencoder. International symposium on neural networks. Springer, Berlin, pp 189–196
194.
Cui R, Hua G, Wu J (2020) AP-GAN: predicting skeletal activity to improve early activity recognition. J Vis Commun Image Represent 73:102923
195.
196.
Wang L, Qiao Y, Tang X (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4305–4314
197.
Sánchez J, Perronnin F, Mensink T, Verbeek J (2013) Image classification with the Fisher vector: theory and practice. Int J Comput Vision 105(3):222–245
198.
Gowda SN, Sevilla-Lara L, Keller F, Rohrbach M (2021) CLASTER: clustering with reinforcement learning for zero-shot action recognition. arXiv preprint arXiv:2101.07042
199.
Liu K, Liu W, Ma H, Huang W, Dong X (2019) Generalized zero-shot learning for action recognition with web-scale video data. World Wide Web 22(2):807–824
201.
Taylor GW, Fergus R, LeCun Y, Bregler C (2010) Convolutional learning of spatio-temporal features. European conference on computer vision. Springer, Berlin, pp 140–153
202.
Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning, pp 160–167
203.
Yan Y, Ricci E, Subramanian R, Liu G, Sebe N (2014) Multitask linear discriminant analysis for view invariant action recognition. IEEE Trans Image Process 23(12):5599–5611
204.
Yang Q (2009) Activity recognition: linking low-level sensors to high-level intelligence. In: Twenty-first international joint conference on artificial intelligence
205.
Zheng VW, Hu DH, Yang Q (2009) Cross-domain activity recognition. In: Proceedings of the 11th international conference on ubiquitous computing, pp 61–70
206.
Liu J, Shah M, Kuipers B, Savarese S (2011) Cross-view action recognition via view knowledge transfer. In: CVPR 2011, IEEE, pp 3209–3216
207.
Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1717–1724
208.
Wang H, Kläser A, Schmid C, Liu C-L (2011) Action recognition by dense trajectories. In: CVPR 2011, IEEE, pp 3169–3176
209.
Kliper-Gross O, Gurovich Y, Hassner T, Wolf L (2012) Motion interchange patterns for action recognition in unconstrained videos. European conference on computer vision. Springer, Berlin, pp 256–269
210.
Oneata D, Verbeek J, Schmid C (2013) Action and event recognition with Fisher vectors on a compact feature set. In: Proceedings of the IEEE international conference on computer vision, pp 1817–1824
211.
Jain M, Jégou H, Bouthemy P (2013) Better exploiting motion for better action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2555–2562
212.
Peng X, Zou C, Qiao Y, Peng Q (2014) Action recognition with stacked Fisher vectors. European conference on computer vision. Springer, Berlin, pp 581–595
213.
214.
Sun L, Jia K, Yeung D-Y, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4597–4605
215.
216.
Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4694–4702
217.
Fernando B, Gavves E, Oramas JM, Ghodrati A, Tuytelaars T (2015) Modeling video evolution for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5378–5387
218.
Donahue J et al. (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634
219.
Jiang Y-G, Dai Q, Liu W, Xue X, Ngo C-W (2015) Human action recognition in unconstrained videos by explicit motion modeling. IEEE Trans Image Process 24(11):3781–3795
220.
Lan Z, Lin M, Li X, Hauptmann AG, Raj B (2015) Beyond Gaussian pyramid: multi-skip feature stacking for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 204–212
221.
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4489–4497
222.
Fernando B, Gould S (2016) Learning end-to-end video classification with rank-pooling. In: International conference on machine learning, PMLR, pp 1187–1196
223.
Fernando B, Anderson P, Hutter M, Gould S (2016) Discriminative hierarchical rank pooling for activity recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1924–1932
224.
Li Y, Li W, Mahadevan V, Vasconcelos N (2016) VLAD3: encoding dynamics of deep features for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1951–1960
225.
Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1933–1941
226.
Varol G, Laptev I, Schmid C (2017) Long-term temporal convolutions for action recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1510–1517
227.
Singh D, Mohan CK (2017) Graph formulation of video activities for abnormal activity recognition. Pattern Recogn 65:265–272
228.
Carmona JM, Climent J (2018) Human action recognition by means of subtensor projections and dense trajectories. Pattern Recogn 81:443–455
229.
Mao F, Wu X, Xue H, Zhang R (2018) Hierarchical video frame sequence representation with deep convolutional graph network. In: Proceedings of the European conference on computer vision (ECCV) workshops
230.
Siddiqi MH, Alruwaili M, Ali A (2019) A novel feature selection method for video-based human activity recognition systems. IEEE Access 7:119593–119602
231.
Zhang Y, Po LM, Liu M, Rehman YAU, Ou W, Zhao Y (2020) Data-level information enhancement: motion-patch-based Siamese convolutional neural networks for human activity recognition in videos. Expert Syst Appl 147:113203
232.
Arzani MM, Fathy M, Azirani AA, Adeli E (2020) Switching structured prediction for simple and complex human activity recognition. IEEE Trans Cybern 6:7777
233.
Gowda SN, Rohrbach M, Sevilla-Lara L (2020) SMART frame selection for action recognition. arXiv preprint arXiv:2012.10671
234.
Wharton Z, Behera A, Liu Y, Bessis N (2021) Coarse temporal attention network (CTA-Net) for driver's activity recognition. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1279–1289
235.
Ullah A, Muhammad K, Ding W, Palade V, Haq IU, Baik SW (2021) Efficient activity recognition using lightweight CNN and DS-GRU network for surveillance applications. Appl Soft Comput 103:107102
236.
Khan MA et al (2021) A fused heterogeneous deep neural network and robust feature selection framework for human actions recognition. Arabian J Sci Eng 6:1–16
237.
Ullah A, Muhammad K, Hussain T, Baik SW (2021) Conflux LSTMs network: a novel approach for multi-view action recognition. Neurocomputing 435:321–329
238.
Reinolds F, Neto C, Machado J (2022) Deep learning for activity recognition using audio and video. Electronics 11(5):782
239.
Siddiqi MH, Alsirhani A (2022) An efficient feature selection method for video-based activity recognition systems. Math Problems Eng 2022:66689
240.
Khare M, Jeon M (2022) Multi-resolution approach to human activity recognition in video sequence based on combination of complex wavelet transform, local binary pattern and Zernike moment. Multimedia Tools Appl 2:1–30
241.
Deotale D et al (2022) HARTIV: human activity recognition using temporal information in videos. CMC-Comput Mater Continua 70(2):3919–3938
242.
243.
Ahmed N, Asif HMS, Khalid H (2021) PIQI: perceptual image quality index based on ensemble of Gaussian process regression. Multimedia Tools Appl 80(10):15677–15700
245.
Ahmed N, Asif HS, Bhatti AR, Khan A (2022) Deep ensembling for perceptual image quality assessment. Soft Comput 2:1–22
246.
Ahmed N, Asif HMS (2020) Perceptual quality assessment of digital images using deep features. Comput Inform 39(3):385–409
247.
Alzantot M, Chakraborty S, Srivastava M (2017) SenseGen: a deep learning architecture for synthetic sensor data generation. In: 2017 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), IEEE, pp 188–193
Metadata
Title: Toward human activity recognition: a survey
Authors: Gulshan Saleem, Usama Ijaz Bajwa, Rana Hammad Raza
Publication date: 20-10-2022
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 5/2023
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-022-07937-4
