2018 | Original Paper | Book Chapter

A Kinematic Gesture Representation Based on Shape Difference VLAD for Sign Language Recognition

Authors: Jefferson Rodríguez, Fabio Martínez

Published in: Computer Vision and Graphics

Publisher: Springer International Publishing

Abstract

Automatic sign language recognition (SLR) is a fundamental task for supporting the inclusion of the deaf community in society, since it facilitates many of today's conventional multimedia interactions. This work proposes a novel approach that represents gestures in SLR as a shape difference VLAD mid-level coding of kinematic primitives captured along videos. This representation captures local salient motions together with the regional dominant patterns developed by the articulators along utterances. In addition, the shape difference VLAD representation not only quantifies local motion patterns but also captures the shape of the motion descriptors, which yields a proper regional gesture characterization. The proposed approach achieved an average accuracy of 85.45% on a corpus of 64 sign words captured in 3200 videos. On the Boston sign dataset, it achieved competitive results, with an average accuracy of 82%.
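The abstract names the pipeline without spelling it out. As a rough, non-authoritative illustration, the sketch below assumes that the kinematic primitives are the divergence, curl and two shear terms of a dense optical flow field (u, v), computed by finite differences, and that the shape difference VLAD code keeps, for each codebook center, the mean residual of its assigned descriptors plus the difference between their variance and the center's training variance. The helper names `kinematic_primitives` and `sd_vlad`, the hard assignment, and the power/L2 normalization are assumptions for illustration; the chapter's actual descriptors, codebook size and normalization may differ, and a real pipeline would pool the primitives into local spatio-temporal descriptors before encoding.

```python
import numpy as np


def kinematic_primitives(u, v):
    """Per-pixel kinematic primitives of a dense 2-D flow field (u, v).

    Assumed feature set (not taken from the chapter): divergence, curl
    and the two shear terms. Returns an array of shape (H, W, 4).
    """
    # np.gradient on a 2-D array returns [d/d(rows), d/d(cols)] = [d/dy, d/dx]
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy      # local expansion / contraction
    curl = dv_dx - du_dy            # local rotation (vorticity)
    shear_1 = du_dx - dv_dy         # first hyperbolic (shear) term
    shear_2 = du_dy + dv_dx         # second hyperbolic (shear) term
    return np.stack([divergence, curl, shear_1, shear_2], axis=-1)


def sd_vlad(descriptors, centers, center_vars):
    """Shape difference VLAD sketch (hypothetical helper).

    descriptors: (N, D) local descriptors of one video
    centers:     (K, D) codebook, e.g. k-means centroids from training data
    center_vars: (K, D) per-cluster variance, also estimated on training data
    Returns a vector of length 2*K*D: [mean residuals | variance differences].
    """
    K, D = centers.shape
    # Hard-assign each descriptor to its nearest center (squared Euclidean distance).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)

    means = np.zeros((K, D))
    shapes = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members) == 0:
            continue  # an empty cluster contributes zeros
        means[k] = members.mean(axis=0) - centers[k]        # first-order term
        shapes[k] = members.var(axis=0) - center_vars[k]    # shape (variance) term

    code = np.concatenate([means.ravel(), shapes.ravel()])
    # Signed square-root (power) normalization, then L2 normalization.
    code = np.sign(code) * np.sqrt(np.abs(code))
    norm = np.linalg.norm(code)
    return code / norm if norm > 0 else code


if __name__ == "__main__":
    H, W = 32, 32
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # Synthetic rigid rotation about the image center: curl = 2, divergence = 0.
    feats = kinematic_primitives(-(ys - H / 2), xs - W / 2)
    print(np.allclose(feats[..., 1], 2.0))  # True

    # Random stand-in descriptors and a toy codebook of 8 centers.
    rng = np.random.default_rng(0)
    desc = rng.normal(size=(500, 4))
    centers = desc[rng.choice(500, size=8, replace=False)]
    code = sd_vlad(desc, centers, np.ones((8, 4)))
    print(code.shape)  # (64,)
```

Concatenating the first-order (mean residual) and second-order (variance difference) statistics doubles the code length to 2KD compared with plain VLAD; under the assumptions above, the variance term is what carries the "shape" of the descriptor distribution around each center.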
Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-00692-1_38