Published in: Pattern Analysis and Applications 3/2023

18.02.2023 | Theoretical Advances

Multiview meta-metric learning for sign language recognition using triplet loss embeddings

Authors: Suneetha Mopidevi, M. V. D. Prasad, Polurie Venkata Vijay Kishore



Abstract

Multiview video recognition is a hard problem when the subject is in continuous motion, and it becomes harder still when the subject is a human performing a complex set of actions such as sign language. Although many deep learning models have been applied successfully to sign language recognition (SLR), very few consider multiple views in their training sets. In this work, we apply meta-metric learning to video-based SLR. In contrast to traditional metric learning, where the triplet loss is built on sample-based distances, the meta-metric learns on set-based distances. Accordingly, we construct meta-cells over the entire multiview dataset and adopt a task-based learning approach with respect to support cells and query sets. Additionally, we propose a maximum view-pooled distance on sub-tasks for binding intra-class views. Experiments on the multiview sign language dataset and four human action recognition datasets show that the proposed multiview meta-metric learning model (MVDMML) achieves higher accuracies than the baselines.
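To make the set-based objective concrete, here is a minimal sketch of a triplet loss built on view-pooled set distances rather than per-sample distances. It assumes a PyTorch pipeline that has already embedded each video clip; the function names, the use of a mean pairwise set distance, and the margin value are illustrative assumptions, not the authors' implementation.

```python
import torch

def set_distance(query_emb, support_emb):
    """Set-based distance: mean of all pairwise Euclidean distances
    between a query cell (Q, D) and a support cell (S, D) of embeddings."""
    return torch.cdist(query_emb, support_emb).mean()

def max_view_pooled_distance(query_views, support_views):
    """Pool the per-view set distances with a maximum, so the hardest
    (most dissimilar) view pair drives the loss and intra-class views
    are pulled together across cameras.

    query_views / support_views: lists of (N, D) tensors, one per camera view."""
    dists = [set_distance(q, s) for q in query_views for s in support_views]
    return torch.stack(dists).max()

def set_triplet_loss(anchor_views, positive_views, negative_views, margin=0.5):
    """Triplet loss over set-based, view-pooled distances instead of
    individual sample distances."""
    d_ap = max_view_pooled_distance(anchor_views, positive_views)  # same sign class
    d_an = max_view_pooled_distance(anchor_views, negative_views)  # different sign class
    return torch.clamp(d_ap - d_an + margin, min=0.0)

# Toy usage with hypothetical sizes: 2 camera views per cell,
# 3 clip embeddings per view, 64-dimensional embeddings.
make_cell = lambda: [torch.randn(3, 64) for _ in range(2)]
loss = set_triplet_loss(make_cell(), make_cell(), make_cell())
print(loss.item())
```

Taking the maximum over per-view set distances lets the hardest view pair dominate the loss, which is one plausible reading of how the view-pooled distance binds intra-class views; the exact pooling and meta-cell construction should be taken from the full text.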


Metadata
Title
Multiview meta-metric learning for sign language recognition using triplet loss embeddings
Authors
Suneetha Mopidevi
M. V. D. Prasad
Polurie Venkata Vijay Kishore
Publication date
18.02.2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01134-2
