
Universal Access in the Information Society

1 February 2008 | Long Paper

Recent developments in visual sign language recognition

Authors: Ulrich von Agris, Jörg Zieren, Ulrich Canzler, Britta Bauer, Karl-Friedrich Kraiss

Published in: Universal Access in the Information Society | Issue 4/2008

Abstract

Research in the field of sign language recognition has made significant advances in recent years. These achievements provide the basis for future applications that support the integration of deaf people into hearing society. Translation systems, for example, could facilitate communication between deaf and hearing people in public situations. Further applications, such as user interfaces and automatic indexing of signed videos, are becoming feasible. The current state of sign language recognition is roughly 30 years behind that of speech recognition, which corresponds to the gradual transition from isolated to continuous recognition for small-vocabulary tasks. Research efforts have mainly focused on robust feature extraction or statistical modeling of signs. However, current recognition systems are still designed for signer-dependent operation under laboratory conditions. This paper describes a comprehensive concept for robust visual sign language recognition that reflects recent developments in this field. The proposed recognition system aims for signer-independent operation and uses a single video camera for data acquisition to ensure user-friendliness. Since sign languages make use of manual and facial means of expression, both channels are employed for recognition. For mobile operation in uncontrolled environments, sophisticated algorithms were developed that robustly extract manual and facial features. The extraction of manual features relies on a multiple hypotheses tracking approach to resolve ambiguities of hand positions. For facial feature extraction, an active appearance model is applied, which allows identification of areas of interest such as the eye and mouth regions. In the next processing step, a numerical description of the facial expression, head pose, line of sight, and lip outline is computed. The system employs a dedicated strategy for resolving mutual overlap of the signer's hands and face.
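
To illustrate why keeping several hand-position hypotheses can help, the following is a minimal sketch and not the authors' tracker. It assumes that each video frame yields a few candidate hand positions (for example, skin-colored blobs) and that the most plausible trajectory is the one with the smallest total displacement. The function name `track_best_path`, the input layout, and the displacement cost are all assumptions made for this example; the abstract does not specify them.

```python
import math


def track_best_path(candidates_per_frame):
    """Choose one candidate per frame so that the summed displacement is minimal.

    candidates_per_frame: list of non-empty lists of (x, y) tuples, one list per frame.
    Returns the chosen (x, y) position for every frame.
    Illustrative only: the cost model (Euclidean displacement) is an assumption.
    """
    # cost[i]: smallest accumulated displacement of any path ending at candidate i
    cost = [0.0] * len(candidates_per_frame[0])
    back = []  # back[t][i]: best predecessor (in frame t) of candidate i in frame t + 1
    for prev, cur in zip(candidates_per_frame, candidates_per_frame[1:]):
        new_cost, pointers = [], []
        for c in cur:
            steps = [cost[j] + math.dist(p, c) for j, p in enumerate(prev)]
            best = min(range(len(prev)), key=steps.__getitem__)
            new_cost.append(steps[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last frame to the first.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [frame[i] for frame, i in zip(candidates_per_frame, path)]


# Frame 2 is ambiguous: (3, 0) is closer to the start, but (0, 4) fits frame 3 better.
frames = [[(0, 0)], [(3, 0), (0, 4)], [(0, 8)]]
print(track_best_path(frames))
```

In this toy input, a greedy tracker would commit to `(3, 0)` in the second frame, whereas deferring the decision until the third frame is seen selects `(0, 4)`. The ambiguity is resolved only once later evidence is available.
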
Classification is based on hidden Markov models, which are able to compensate for time and amplitude variations in the articulation of a sign. The classification stage is designed for recognition of isolated signs as well as continuous sign language. In the latter case, a stochastic language model can be used, which considers unigram and bigram probabilities of single and successive signs. For statistical modeling, each sign is represented either as a whole or as a composition of smaller subunits, similar to phonemes in spoken languages. While recognition based on word models is limited to rather small vocabularies, subunit models open the door to large vocabularies. Achieving signer independence is a challenging problem, as the articulation of a sign is subject to high interpersonal variance. This problem cannot be solved by simple feature normalization and must be addressed at the classification level. Therefore, dedicated adaptation methods known from speech recognition were implemented and modified to account for the specifics of sign languages. For rapid adaptation to unknown signers, the proposed recognition system employs a combined approach of maximum likelihood linear regression and maximum a posteriori estimation.
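
As a rough illustration of isolated sign classification with hidden Markov models, the sketch below scores an observation sequence against one model per sign using the scaled forward algorithm and picks the most likely sign. It is a toy under stated assumptions, not the system described above: observations are discrete symbols rather than continuous feature vectors, and the sign names `HELLO` and `THANKS`, the two-state left-to-right topology, and all probabilities are invented for the example.

```python
import math


def log_forward(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm).

    start[s]: initial probability of state s
    trans[r][s]: probability of moving from state r to state s
    emit[s][o]: probability of emitting symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # the model cannot produce this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_p


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood for obs."""
    return max(models, key=lambda name: log_forward(obs, *models[name]))


# Hypothetical two-state left-to-right models; the self-transitions let each
# state last for a variable number of frames.
START = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "HELLO": (START, TRANS, [[0.9, 0.1], [0.1, 0.9]]),   # mostly symbol 0, then 1
    "THANKS": (START, TRANS, [[0.1, 0.9], [0.9, 0.1]]),  # mostly symbol 1, then 0
}

print(classify([0, 0, 1, 1], MODELS))
print(classify([0, 0, 0, 0, 1], MODELS))  # same sign, articulated more slowly
```

Both example sequences are assigned to `HELLO` even though they differ in length, because the self-transitions allow each state to absorb a varying number of frames. This is the sense in which such models tolerate timing variation.
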

In speech recognition, the corresponding term is acoustic subunits; for sign language recognition, the term is adapted accordingly.

Title: Recent developments in visual sign language recognition
Authors: Ulrich von Agris, Jörg Zieren, Ulrich Canzler, Britta Bauer, Karl-Friedrich Kraiss
Publication date: 1 February 2008
Publisher: Springer-Verlag
Published in: Universal Access in the Information Society / Issue 4/2008
Print ISSN: 1615-5289
Electronic ISSN: 1615-5297
DOI: https://doi.org/10.1007/s10209-007-0104-x
