2016 | Original Paper | Book Chapter

Sign Language Recognition with Multi-modal Features

Authors: Junfu Pu, Wengang Zhou, Houqiang Li

Published in: Advances in Multimedia Information Processing - PCM 2016

Publisher: Springer International Publishing

Abstract

We study the problem of automatically recognizing sign language from RGB videos and skeleton coordinates captured by Kinect, a task of great significance for communication between the deaf and hearing communities. In this paper, we propose a sign language recognition (SLR) system that uses two data channels: gesture videos of the sign words and joint trajectories. In our framework, we extract features of two modalities to represent the hand-shape videos and the hand trajectories. The variation of the gesture is captured by a 3D CNN, and the activations of its fully connected layers are used as the representation of each sign video. For trajectories, we describe each joint with the shape context and combine all the descriptors into a feature matrix. A convolutional neural network is then applied to this matrix to generate a robust trajectory representation. Finally, we fuse these features and train an SVM classifier for recognition. We conduct experiments on a large-vocabulary sign language dataset with up to 500 words, and the results demonstrate the effectiveness of the proposed method.
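
To make the trajectory branch more concrete, the following is a minimal sketch of a shape context descriptor in the log-polar histogram form commonly associated with Belongie et al., computed for the points of a single 2D trajectory and stacked into a feature matrix. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the bin counts, radii, normalization, or the exact layout of the feature matrix, so the function names `shape_context` and `trajectory_feature_matrix`, the parameters `n_r`, `n_theta`, `r_inner`, `r_outer`, and the restriction to 2D points are all hypothetical choices made here.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of all other points as seen from points[index].

    Returns a flat list of n_r * n_theta counts (radial bin major).
    All parameter defaults are illustrative assumptions.
    """
    # Mean pairwise distance normalizes radii so the descriptor is scale-invariant.
    pair_dists = [
        math.hypot(px - qx, py - qy)
        for i, (px, py) in enumerate(points)
        for (qx, qy) in points[i + 1:]
    ]
    hist = [0] * (n_r * n_theta)
    mean_dist = sum(pair_dists) / len(pair_dists) if pair_dists else 0.0
    if mean_dist == 0.0:
        return hist  # fewer than two distinct points: nothing to describe

    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    cx, cy = points[index]
    for j, (x, y) in enumerate(points):
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy) / mean_dist
        if j == index or r == 0.0 or r > r_outer:
            continue  # skip the reference point, duplicates, and far outliers
        # Log-spaced radial bin; points closer than r_inner fall into bin 0.
        r_bin = int((math.log(max(r, r_inner)) - log_lo) / (log_hi - log_lo) * n_r)
        r_bin = min(r_bin, n_r - 1)
        # Uniform angular bin over [0, 2*pi).
        theta = math.atan2(dy, dx) % (2.0 * math.pi)
        t_bin = min(int(theta / (2.0 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1
    return hist


def trajectory_feature_matrix(points, **kwargs):
    """Stack the shape context of every trajectory point, one row per point."""
    return [shape_context(points, i, **kwargs) for i in range(len(points))]
```

Calling `trajectory_feature_matrix(points)` on a list of `(x, y)` tuples yields one histogram row per point. A matrix of this kind is the sort of input a 2D convolutional network could consume, although how the paper arranges per-joint descriptors into its matrix is not stated in the abstract.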

Metadata
Title
Sign Language Recognition with Multi-modal Features
Authors
Junfu Pu
Wengang Zhou
Houqiang Li
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-48896-7_25