Machine Vision and Applications

12 June 2019 | Original Paper

An embedded implementation of CNN-based hand detection and orientation estimation algorithm

Authors: Li Yang, Zhi Qi, Zeheng Liu, Hao Liu, Ming Ling, Longxing Shi, Xinning Liu

Published in: Machine Vision and Applications | Issue 6/2019

Abstract

Hand detection is an essential step in many tasks, including HCI applications. However, robustly detecting varied hands against cluttered backgrounds, under motion blur or in changing light remains a challenging problem. Recently, object detection methods based on CNN models have significantly improved the accuracy of hand detection, but at a high computational cost. In this paper, we propose a lightweight CNN that uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps at various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. To estimate hand orientation accurately with a CNN, we estimate the projections of two orthogonal vectors along the horizontal and vertical axes and then recover the size and orientation of a bounding box that exactly encloses the hand. To deploy the detection algorithm on the Jetson TK1 embedded platform, we optimize the implementations of the building modules in the CNN. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm reaches a processing speed of 16 FPS, which essentially meets the requirement of real-time processing.
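The orientation step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact parameterization, so the code below assumes (hypothetically) that the network outputs a box center plus the horizontal and vertical projections of two orthogonal half-axis vectors; the function name `recover_rotated_box` and its arguments are illustrative and are not taken from the authors' repository.

```python
import math


def recover_rotated_box(cx, cy, ux, uy, vx, vy):
    """Recover a rotated box from two (ideally orthogonal) half-axis vectors.

    Assumption for illustration: (ux, uy) and (vx, vy) are the horizontal and
    vertical projections of the vectors from the box center to the midpoints
    of two adjacent box edges.
    """
    # Full side lengths are twice the half-axis vector norms.
    length = 2.0 * math.hypot(ux, uy)
    width = 2.0 * math.hypot(vx, vy)
    # Orientation of the first axis, in radians, measured from the x-axis.
    angle = math.atan2(uy, ux)
    # The four corners are center +/- u +/- v.
    corners = [
        (cx + sx * ux + sy * vx, cy + sx * uy + sy * vy)
        for sx, sy in ((1, 1), (1, -1), (-1, -1), (-1, 1))
    ]
    return length, width, angle, corners


# Example: a box centered at the origin whose first axis points along (3, 4).
length, width, angle, corners = recover_rotated_box(0.0, 0.0, 3.0, 4.0, -2.0, 1.5)
```

Regressing vector projections rather than an angle directly avoids the discontinuity where an angle wraps around, which is one plausible motivation for this kind of parameterization; whether it matches the authors' reasoning is not stated in the abstract.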

Title: An embedded implementation of CNN-based hand detection and orientation estimation algorithm
Authors: Li Yang, Zhi Qi, Zeheng Liu, Hao Liu, Ming Ling, Longxing Shi, Xinning Liu
Publication date: 12 June 2019
Publisher: Springer Berlin Heidelberg
Published in: Machine Vision and Applications / Issue 6/2019
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI: https://doi.org/10.1007/s00138-019-01038-4
