
On the role of multimodal learning in the recognition of sign language

Published in: Multimedia Tools and Applications

Abstract

Sign Language Recognition (SLR) has become one of the most important research areas in the field of human-computer interaction. SLR systems aim to automatically translate sign language into text or speech, in order to reduce the communication gap between deaf and hearing people. The aim of this paper is to exploit multimodal learning techniques for accurate SLR, making use of data provided by Kinect and Leap Motion. In this regard, single-modality approaches as well as different multimodal methods, mainly based on convolutional neural networks, are proposed. Our main contribution is a novel multimodal end-to-end neural network that explicitly models private feature representations, which are specific to each modality, and shared feature representations, which are similar between modalities. By imposing such regularization in the learning process, the underlying idea is to increase the discriminative ability of the learned features and, hence, improve the generalization capability of the model. Experimental results demonstrate that multimodal learning yields an overall improvement in sign recognition performance. In particular, the novel neural network architecture outperforms the current state-of-the-art methods on the SLR task.
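To make the private/shared factorization concrete, below is a minimal PyTorch sketch of one way such a network can be regularized: one private encoder per modality, a pair of shared encoders whose outputs are pulled together, and an orthogonality penalty that pushes private and shared features apart. This is an illustrative sketch only; all layer sizes, loss weights, and names are assumptions, and simple linear encoders stand in for the convolutional ones the paper describes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrivateSharedNet(nn.Module):
    """Illustrative two-stream network with private and shared encoders."""

    def __init__(self, dim_kinect, dim_leap, feat_dim, n_classes):
        super().__init__()
        # Private encoders capture modality-specific structure.
        self.private_kinect = nn.Sequential(nn.Linear(dim_kinect, feat_dim), nn.ReLU())
        self.private_leap = nn.Sequential(nn.Linear(dim_leap, feat_dim), nn.ReLU())
        # Shared encoders map both modalities into a common feature space.
        self.shared_kinect = nn.Sequential(nn.Linear(dim_kinect, feat_dim), nn.ReLU())
        self.shared_leap = nn.Sequential(nn.Linear(dim_leap, feat_dim), nn.ReLU())
        # The classifier sees the concatenation of all four representations.
        self.classifier = nn.Linear(4 * feat_dim, n_classes)

    def forward(self, x_kinect, x_leap):
        p_k = self.private_kinect(x_kinect)
        p_l = self.private_leap(x_leap)
        s_k = self.shared_kinect(x_kinect)
        s_l = self.shared_leap(x_leap)
        logits = self.classifier(torch.cat([p_k, p_l, s_k, s_l], dim=1))
        return logits, (p_k, p_l, s_k, s_l)

def total_loss(logits, labels, feats, alpha=0.1, beta=0.1):
    p_k, p_l, s_k, s_l = feats
    # Similarity term: shared codes of the two modalities should agree.
    sim = F.mse_loss(s_k, s_l)
    # Difference term: private and shared codes should be near-orthogonal.
    diff = (s_k.t() @ p_k).pow(2).mean() + (s_l.t() @ p_l).pow(2).mean()
    return F.cross_entropy(logits, labels) + alpha * sim + beta * diff
```

A training step would compute `logits, feats = model(x_kinect, x_leap)` and backpropagate `total_loss`; the two auxiliary terms play the role of the regularization described in the abstract, aligning the shared codes across modalities while keeping the private codes complementary rather than redundant.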




Acknowledgements

This work was funded by the Project “NanoSTIMA: Macro-to-Nano Human Sensing: Towards Integrated Multimodal Health Monitoring and Analytics/NORTE-01-0145-FEDER-000016”, financed by the North Portugal Regional Operational Programme (NORTE 2020) under the PORTUGAL 2020 Partnership Agreement and through the European Regional Development Fund (ERDF), and also by Fundação para a Ciência e a Tecnologia (FCT) within the PhD and BPD grants SFRH/BD/102177/2014 and SFRH/BPD/101439/2014.

Author information


Correspondence to Pedro M. Ferreira.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ferreira, P.M., Cardoso, J.S. & Rebelo, A. On the role of multimodal learning in the recognition of sign language. Multimed Tools Appl 78, 10035–10056 (2019). https://doi.org/10.1007/s11042-018-6565-5

