2016 | OriginalPaper | Book chapter

Fingertip in the Eye: An Attention-Based Method for Real-Time Hand Tracking and Fingertip Detection in Egocentric Videos

Authors: Xiaorui Liu, Yichao Huang, Xin Zhang, Lianwen Jin

Published in: Pattern Recognition

Publisher: Springer Singapore

Abstract

Hand and fingertip tracking is a crucial part of egocentric vision interaction, and it remains a challenging problem because of factors such as dynamic environments and hand deformation. We propose a convolutional neural network (CNN) based method for real-time, accurate hand tracking and fingertip detection in RGB sequences captured by an egocentric mobile camera. First, we build a large-scale dataset, Ego-Finger, which covers many scenarios and includes human-labeled ground truth. Second, we propose a two-stage CNN pipeline: the Attention-based Hand Tracker (AHT), inspired by human vision, and the Multi-Points Fingertip Detector (MFD), constrained by the physical structure of the hand. Compared with state-of-the-art methods, the proposed method achieves very promising results while running in real time.
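The data flow of a two-stage pipeline like the one described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the interfaces of AHT or MFD, so the bounding-box format, the normalized fingertip output, and every function name here (`track_hand`, `detect_fingertip`, `run_two_stage`, the stubs) are assumptions made for the example. The two CNN stages are replaced by fixed stubs so the snippet runs on its own.

```python
from typing import Callable, List, Tuple

# Assumed conventions (not taken from the paper):
Box = Tuple[int, int, int, int]   # (x, y, width, height) in frame pixels
Point = Tuple[float, float]       # (x, y) coordinates
Frame = List[List[int]]           # rows of pixel values


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the hand region out of the full frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def to_frame_coords(point_norm: Point, box: Box) -> Point:
    """Map a point given in [0, 1] crop coordinates back to frame pixels."""
    x, y, w, h = box
    u, v = point_norm
    return (x + u * w, y + v * h)


def run_two_stage(
    frame: Frame,
    track_hand: Callable[[Frame], Box],
    detect_fingertip: Callable[[Frame], Point],
) -> Tuple[Box, Point]:
    """Stage 1 localizes the hand; stage 2 finds the fingertip inside it."""
    box = track_hand(frame)              # hypothetical stand-in for a hand tracker
    patch = crop(frame, box)
    tip_norm = detect_fingertip(patch)   # hypothetical stand-in for a fingertip detector
    return box, to_frame_coords(tip_norm, box)


# Stub stages that return fixed values instead of running a CNN.
def stub_tracker(frame: Frame) -> Box:
    return (2, 1, 4, 2)


def stub_detector(patch: Frame) -> Point:
    return (0.5, 0.25)


frame = [[0] * 8 for _ in range(6)]      # blank 8x6 placeholder frame
box, tip = run_two_stage(frame, stub_tracker, stub_detector)
print(box, tip)
```

Restricting the second stage to the cropped hand region is one common reason for cascading a tracker and a keypoint detector: the detector sees a smaller, hand-centered input, and its output only needs to be mapped back into frame coordinates.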


Metadata
Title
Fingertip in the Eye: An Attention-Based Method for Real-Time Hand Tracking and Fingertip Detection in Egocentric Videos
Authors
Xiaorui Liu
Yichao Huang
Xin Zhang
Lianwen Jin
Copyright year
2016
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-3002-4_12