Published in: Pattern Analysis and Applications 2/2024

01.06.2024 | Industrial and Commercial Application

Spatial–temporal attention with graph and general neural network-based sign language recognition

Authors: Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Yuichi Okuyama, Yoichi Tomioka, Jungpil Shin



Abstract

Automatic sign language recognition (SLR) is a vital task in human–computer interaction and computer vision, converting the hand signs used by people with severe hearing and speech impairments into equivalent text or speech. Researchers have recently turned to hand skeleton joint information instead of raw image pixels, which suffer from variable illumination and complex backgrounds. However, beyond hand information, body motion and facial gestures also play an essential role in expressing the emotion of sign language. A few researchers have developed SLR systems on multi-gesture datasets, but their accuracy and time complexity remain insufficient. In light of these limitations, we introduce a spatial and temporal attention model combined with a general neural network for SLR. The main idea of our architecture is first to construct a fully connected graph onto which the skeleton information is projected. We then employ self-attention mechanisms to extract insights from node and edge features across the spatial and temporal domains. Our architecture comprises three branches: a graph-based spatial branch, a graph-based temporal branch, and a general neural network branch, which jointly contribute to the final feature integration. Specifically, the spatial branch captures spatial dependencies among joints, while the temporal branch amplifies temporal dependencies embedded in the sequential hand skeleton data. The general neural network branch improves the architecture's generalization capability and robustness. In our evaluation, we used the Mexican Sign Language (MSL) and Pakistani Sign Language (PSL) datasets as well as the American Sign Language Lexicon Video Dataset (ASLLVD), which comprises 3D joint coordinates for the face, body, and hands, and conducted experiments on individual gestures and their combinations. Our model achieved an accuracy of 99.96% on the MSL dataset, 92.00% on PSL, and 26.00% on ASLLVD, which includes more than 2700 classes. These results, coupled with the model's computationally efficient profile, underscore its advantage over contemporaneous methods in the field.
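The three-branch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: all function names are hypothetical, the attention uses identity projections instead of learned weights, and mean pooling stands in for the paper's feature-integration stage. It only shows how spatial attention (across joints within a frame), temporal attention (across frames of a joint trajectory), and a plain general branch could be computed from skeleton input and concatenated.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (N, d) -- N nodes (joints or frames), d features.
    # Single-head scaled dot-product attention with identity
    # query/key/value projections, for brevity only.
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x

def three_branch_features(skeleton):
    # skeleton: (T, J, C) -- T frames, J joints, C coordinates (e.g. 3D).
    T, J, C = skeleton.shape
    # Spatial branch: attend across joints within each frame, then pool.
    spatial = np.stack([self_attention(skeleton[t]) for t in range(T)]).mean(axis=(0, 1))
    # Temporal branch: attend across frames of each joint trajectory, then pool.
    per_joint = skeleton.transpose(1, 0, 2)  # (J, T, C)
    temporal = np.stack([self_attention(per_joint[j]) for j in range(J)]).mean(axis=(0, 1))
    # General branch: a plain nonlinear transform of the pooled input,
    # standing in for the general neural network branch.
    general = np.tanh(skeleton.mean(axis=(0, 1)))
    # Final feature integration by concatenation -> (3*C,) vector,
    # which a classifier head would consume.
    return np.concatenate([spatial, temporal, general])
```

In the paper's full model, each branch would carry learned projection weights and the concatenated features would feed a classification layer over the sign vocabulary; this sketch only fixes the data flow.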

Metadata
Title
Spatial–temporal attention with graph and general neural network-based sign language recognition
Authors
Abu Saleh Musa Miah
Md. Al Mehedi Hasan
Yuichi Okuyama
Yoichi Tomioka
Jungpil Shin
Publication date
01.06.2024
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2024
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-024-01229-4
