Published in: Universal Access in the Information Society 4/2008

1 February 2008 | Long Paper

Recent developments in visual sign language recognition

Authors: Ulrich von Agris, Jörg Zieren, Ulrich Canzler, Britta Bauer, Karl-Friedrich Kraiss

Abstract

Research in sign language recognition has made significant advances in recent years. These achievements provide the basis for future applications that support the integration of deaf people into hearing society. Translation systems, for example, could facilitate communication between deaf and hearing people in public situations. Further applications, such as user interfaces and automatic indexing of signed videos, are becoming feasible. The current state of sign language recognition is roughly 30 years behind that of speech recognition, which corresponds to the gradual transition from isolated to continuous recognition for small-vocabulary tasks. Research efforts have focused mainly on robust feature extraction or statistical modeling of signs. However, current recognition systems are still designed for signer-dependent operation under laboratory conditions. This paper describes a comprehensive concept for robust visual sign language recognition that represents recent developments in this field. The proposed recognition system aims for signer-independent operation and uses a single video camera for data acquisition to ensure user-friendliness. Since sign languages make use of manual and facial means of expression, both channels are employed for recognition. For mobile operation in uncontrolled environments, sophisticated algorithms were developed that robustly extract manual and facial features. The extraction of manual features relies on a multiple-hypotheses tracking approach to resolve ambiguities in hand positions. For facial feature extraction, an active appearance model is applied, which allows identification of areas of interest such as the eye and mouth regions. In the next processing step, a numerical description of the facial expression, head pose, line of sight, and lip outline is computed. The system employs a resolution strategy for dealing with mutual overlapping of the signer's hands and face.
Classification is based on hidden Markov models, which are able to compensate for time and amplitude variances in the articulation of a sign. The classification stage is designed for recognition of isolated signs as well as of continuous sign language. In the latter case, a stochastic language model can be used, which considers unigram and bigram probabilities of single and successive signs. For statistical modeling of reference models, each sign is represented either as a whole or as a composition of smaller subunits, similar to phonemes in spoken languages. While recognition based on word models is limited to rather small vocabularies, subunit models open the door to large vocabularies. Achieving signer independence is a challenging problem, as the articulation of a sign is subject to high interpersonal variance. This problem cannot be solved by simple feature normalization and must be addressed at the classification level. Therefore, dedicated adaptation methods known from speech recognition were implemented and modified to account for the specifics of sign languages. For rapid adaptation to unknown signers, the proposed recognition system employs a combined approach of maximum likelihood linear regression and maximum a posteriori estimation.
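
The idea of classifying an isolated sign with one hidden Markov model per sign can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete observation symbols (a real system would score continuous feature vectors), and the two toy left-to-right models, their sign names and all probabilities are invented for illustration. The forward algorithm sums over every possible state alignment, which is how an HMM tolerates differences in signing speed.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete-observation HMM."""
    n = len(start)
    # Initialisation: start probability times emission of the first symbol.
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)


def classify(obs, models):
    """Pick the sign whose model assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over the symbols {0, 1}.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    # (start, transitions, emissions); "HELLO" expects 0s followed by 1s.
    "HELLO": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    # "THANKS" expects 1s followed by 0s.
    "THANKS": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    # The same sign at two different speeds, then a different sign.
    for sequence in ([0, 0, 0, 1, 1], [0, 0, 1, 1, 1, 1, 1], [1, 1, 0]):
        print(sequence, "->", classify(sequence, MODELS))
```

Working in the log domain avoids numerical underflow on longer sequences. Continuous recognition would additionally need a search over sign sequences weighted by a language model, which this sketch leaves out.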

Footnotes
1
In speech recognition, the corresponding term is acoustic subunits. For sign language recognition, the term is adapted accordingly.
 
Metadata
Title
Recent developments in visual sign language recognition
Authors
Ulrich von Agris
Jörg Zieren
Ulrich Canzler
Britta Bauer
Karl-Friedrich Kraiss
Publication date
1 February 2008
Publisher
Springer-Verlag
Published in
Universal Access in the Information Society / Issue 4/2008
Print ISSN: 1615-5289
Electronic ISSN: 1615-5297
DOI
https://doi.org/10.1007/s10209-007-0104-x
