Published in: Wireless Personal Communications 2/2023

12.03.2023

Multi-view Multi-modal Approach Based on 5S-CNN and BiLSTM Using Skeleton, Depth and RGB Data for Human Activity Recognition

Authors: Rahul Kumar, Shailender Kumar


Abstract

Recognition of human activity is a challenging problem, especially in the presence of multiple actions and multiple viewing scenarios. This paper therefore proposes a multi-view, multi-modal approach to human action recognition (HAR). First, a motion representation of each data stream, namely depth motion maps, motion history images, and skeleton images, is created from the depth, RGB, and skeleton data of an RGB-D sensor. Each motion representation is then trained separately using a 5-stack convolutional neural network (5S-CNN). To improve the recognition rate and accuracy, the skeleton representation is trained with a hybrid 5S-CNN and Bi-LSTM classifier. Decision-level fusion is then applied to combine the score values of the three streams, and the human activity is identified from the fused score. To evaluate the efficiency of the proposed 5S-CNN with Bi-LSTM method, we conduct experiments on the UTD-MHAD dataset. The results show that the proposed HAR method outperforms existing approaches.
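The pipeline described above (a per-modality motion representation, one 5-stack CNN per stream, a CNN + Bi-LSTM hybrid for the skeleton stream, and decision-level score fusion) can be illustrated with a minimal PyTorch sketch. The layer widths, kernel sizes, frame count, and the simple score-averaging fusion rule below are assumptions for illustration only; the abstract does not give the exact 5S-CNN or Bi-LSTM configuration used in the paper.

# Hedged sketch of the multi-stream pipeline; sizes and fusion rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 27  # UTD-MHAD contains 27 action classes

class FiveStackCNN(nn.Module):
    """Five stacked convolution blocks followed by a classification head."""
    def __init__(self, in_channels, num_classes=NUM_CLASSES):
        super().__init__()
        chans = [in_channels, 32, 64, 128, 128, 256]
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)                    # per-class scores (logits)

class SkeletonCNNBiLSTM(nn.Module):
    """Per-frame CNN features, a BiLSTM over time, then a classification head."""
    def __init__(self, in_channels=3, hidden=128, num_classes=NUM_CLASSES):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1))
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                       # x: (batch, time, C, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.bilstm(feats)
        return self.head(out[:, -1])            # score from the last time step

def decision_level_fusion(score_list):
    """Average per-stream softmax scores and pick the top class."""
    probs = torch.stack([F.softmax(s, dim=1) for s in score_list]).mean(0)
    return probs.argmax(dim=1)

# Example forward pass with dummy inputs (shapes are illustrative only).
dmm_net, mhi_net, skel_net = FiveStackCNN(1), FiveStackCNN(1), SkeletonCNNBiLSTM()
dmm = torch.randn(2, 1, 64, 64)          # depth motion maps
mhi = torch.randn(2, 1, 64, 64)          # motion history images
skel = torch.randn(2, 20, 3, 32, 32)     # skeleton images over 20 frames
pred = decision_level_fusion([dmm_net(dmm), mhi_net(mhi), skel_net(skel)])
print(pred)

In this sketch each stream produces its own class scores, and only the scores are combined, which matches the decision-level fusion described in the abstract; equal weighting of the three streams is an assumed choice.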


Metadata
Title
Multi-view Multi-modal Approach Based on 5S-CNN and BiLSTM Using Skeleton, Depth and RGB Data for Human Activity Recognition
Authors
Rahul Kumar
Shailender Kumar
Publication date
12.03.2023
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 2/2023
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-023-10324-4
