2015 | OriginalPaper | Chapter

Audio-Visual User Identification in HCI Scenarios

Authors: Markus Kächele, Sascha Meudt, Andrej Schwarz, Friedhelm Schwenker

Published in: Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction

Publisher: Springer International Publishing

Abstract

Modern computing systems are usually equipped with input devices such as microphones and cameras, and hence the user of such a system can easily be identified. User identification is important in many human-computer interaction (HCI) scenarios, such as speech recognition, activity recognition, transcription of meeting room data, and affective computing, where personalized models may significantly improve the performance of the overall recognition system. This paper deals with audio-visual user identification. The main processing steps are segmentation of the relevant parts of the video and audio streams, extraction of meaningful features, and construction of the overall classifier and fusion architectures. The proposed system has been evaluated on the MOBIO dataset, a benchmark database consisting of real-world recordings collected from mobile devices such as cell phones. Recognition rates of up to 92 % were achieved with the proposed audio-visual classifier system.
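
The abstract does not specify the fusion architecture, so the following is only a minimal sketch of one common option, a weighted-sum late fusion of per-user scores from an audio classifier and a video classifier. The function names (`normalize`, `fuse_scores`, `identify`), the `audio_weight` parameter, and the example scores are hypothetical and are not taken from the chapter.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative per-user scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        # No evidence for any user: fall back to a uniform distribution.
        return {user: 1.0 / len(scores) for user in scores}
    return {user: score / total for user, score in scores.items()}


def fuse_scores(
    audio: Dict[str, float],
    video: Dict[str, float],
    audio_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted-sum late fusion (an assumed scheme, not the authors' method)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio.keys() != video.keys():
        raise ValueError("both modalities must score the same set of users")
    a, v = normalize(audio), normalize(video)
    return {u: audio_weight * a[u] + (1.0 - audio_weight) * v[u] for u in a}


def identify(
    audio: Dict[str, float],
    video: Dict[str, float],
    audio_weight: float = 0.5,
) -> str:
    """Return the user with the highest fused score."""
    fused = fuse_scores(audio, video, audio_weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up scores for two enrolled users; the modalities disagree.
    audio_scores = {"alice": 0.6, "bob": 0.4}
    video_scores = {"alice": 0.2, "bob": 0.8}
    print(identify(audio_scores, video_scores))                    # equal weights
    print(identify(audio_scores, video_scores, audio_weight=0.9))  # trust audio more
```

In this sketch the weight controls how much each modality is trusted, which could matter when one channel is degraded, for example by background noise or poor lighting in mobile recordings.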

Metadata

Title: Audio-Visual User Identification in HCI Scenarios
Authors: Markus Kächele, Sascha Meudt, Andrej Schwarz, Friedhelm Schwenker
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-14899-1_11