2017 | Original Paper | Book Chapter

7. Two-Stream CNNs for Gesture-Based Verification and Identification: Learning User Style

Authors: Jonathan Wu, Jiawei Chen, Prakash Ishwar, Janusz Konrad

Published in: Deep Learning for Biometrics

Publisher: Springer International Publishing

Abstract

A gesture is a short body motion that contains both static (nonrenewable) anatomical information and dynamic (renewable) behavioral information. Unlike traditional biometrics such as face, fingerprint, and iris, which cannot be easily changed, gestures can be modified if compromised. We consider two types of gestures: full-body gestures, such as a wave of the arms, and hand gestures, such as a subtle curl of the fingers and palm, as captured by a depth sensor (Kinect v1 and v2 in our case). Most prior work in this area evaluates gestures in the context of a “password,” where each user has a single, chosen gesture motion. In contrast, we aim to learn a user’s gesture “style” from a set of training gestures. This affords user convenience, since an exact motion need not be reproduced for recognition. To learn gesture style, we use two-stream convolutional neural networks, a deep learning framework that leverages both the spatial (depth) and temporal (optical flow) information of a video sequence. First, we evaluate how well our approach generalizes at test time to gestures from users not seen during training. Then, we study the importance of dynamics by suppressing the use of dynamic information in training and testing. Finally, we assess the capacity of these techniques to learn representations of gestures that are invariant across users (gesture recognition), or representations of users that are invariant across gestures (user style in verification and identification), by visualizing the two-dimensional t-Distributed Stochastic Neighbor Embedding (t-SNE) of neural network features. We find that our approach outperforms state-of-the-art methods in identification and verification on two biometrics-oriented gesture datasets for full-body and in-air hand gestures.
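The two-stream design pairs a spatial stream, which sees individual depth frames, with a temporal stream, which sees stacks of optical-flow fields, and fuses their outputs. The sketch below is illustrative only, not the chapter's exact architecture: the tiny CNN trunk, layer sizes, flow-stack length, and averaging fusion are all assumptions, written in PyTorch for concreteness.

```python
import torch
import torch.nn as nn

class Stream(nn.Module):
    """A small CNN trunk standing in for the full network of one stream."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),   # global pooling -> 64-dim feature
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class TwoStreamNet(nn.Module):
    """Spatial stream: one depth frame (1 channel). Temporal stream: a stack
    of optical-flow fields (x and y components for each of `flow_frames`)."""
    def __init__(self, num_classes, flow_frames=10):
        super().__init__()
        self.spatial = Stream(in_channels=1, num_classes=num_classes)
        self.temporal = Stream(in_channels=2 * flow_frames, num_classes=num_classes)

    def forward(self, depth, flow):
        # Score-level fusion: average the per-class scores of the two streams.
        return 0.5 * (self.spatial(depth) + self.temporal(flow))

# Toy forward pass: a batch of 4 gesture samples, 40 enrolled users.
net = TwoStreamNet(num_classes=40)
scores = net(torch.randn(4, 1, 224, 224), torch.randn(4, 20, 224, 224))
print(scores.shape)  # torch.Size([4, 40])
```

For the t-SNE visualizations, penultimate-layer features would be embedded in two dimensions; a minimal scikit-learn sketch, where `feats` (one row of network activations per gesture sample) is an assumed input:

```python
from sklearn.manifold import TSNE

# feats: (n_samples, n_features) NumPy array of penultimate-layer activations.
emb2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(feats)
```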

Footnotes
1
Verification is also called authentication. (A minimal sketch contrasting verification with identification follows these footnotes.)
 
2
Of the five gesture classes in BodyLogin, four are shared across users and one is user-defined. Consequently, in leave-persons-out gesture recognition, the fifth gesture class has no training samples of its type and is expected to act as a “reject”/“not gestures 1–4” category.
 
3
Because MSRAction3D is a gesture-centric dataset with few samples per user, we report neither verification results nor leave-gesture-out identification experiments. (A sketch of the evaluation splits follows these footnotes.)
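To make footnote 1's distinction concrete: identification asks which enrolled user produced a gesture (a closed-set argmax over match scores), while verification (authentication) asks whether a gesture matches a claimed identity (one score against a threshold). The minimal sketch below uses cosine similarity between feature vectors purely as an illustrative matcher; `gallery`, `probe`, and the threshold value are assumptions, not the chapter's method.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe, gallery):
    """Closed-set identification: return the enrolled user with the best score."""
    return max(gallery, key=lambda user: cosine(probe, gallery[user]))

def verify(probe, claimed_user, gallery, threshold=0.8):
    """Verification (authentication): accept iff the claimed user's score clears the threshold."""
    return cosine(probe, gallery[claimed_user]) >= threshold

# Toy usage with random 64-dimensional features for three enrolled users.
rng = np.random.default_rng(0)
gallery = {u: rng.standard_normal(64) for u in ("alice", "bob", "carol")}
probe = gallery["bob"] + 0.1 * rng.standard_normal(64)
print(identify(probe, gallery))       # bob
print(verify(probe, "bob", gallery))  # True
```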
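Footnotes 2 and 3 refer to two evaluation protocols that can be written down directly: leave-persons-out holds out whole users, while leave-gesture-out holds out a whole gesture class. A hedged sketch, assuming each sample is a dict with "user" and "gesture" keys (an assumed layout, not the datasets' actual format):

```python
def leave_persons_out(samples, test_users):
    """Hold out whole users: no test user is seen during training,
    probing generalization across people (as in footnote 2)."""
    train = [s for s in samples if s["user"] not in test_users]
    test = [s for s in samples if s["user"] in test_users]
    return train, test

def leave_gesture_out(samples, test_gesture):
    """Hold out a whole gesture class: the test gesture is never seen during
    training, probing user recognition across gestures (as in footnote 3)."""
    train = [s for s in samples if s["gesture"] != test_gesture]
    test = [s for s in samples if s["gesture"] == test_gesture]
    return train, test
```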
 
Metadata
Title
Two-Stream CNNs for Gesture-Based Verification and Identification: Learning User Style
Authors
Jonathan Wu
Jiawei Chen
Prakash Ishwar
Janusz Konrad
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-61657-5_7