2017 | OriginalPaper | Chapter

3. CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection

Authors: Chenchen Zhu, Yutong Zheng, Khoa Luu, Marios Savvides

Published in: Deep Learning for Biometrics

Publisher: Springer International Publishing


Abstract

Robust face detection in the wild is an essential component supporting many face-related problems, e.g., unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, and 3D facial model construction. Although face detection has been studied intensely for decades and has many commercial applications, it still struggles in some real-world scenarios because of numerous challenges, e.g., heavy facial occlusion, extremely low resolution, strong illumination, exceptional pose variation, and image or video compression artifacts. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN) to robustly address these problems. Similar to region-based CNNs, our proposed network consists of a region proposal component and a region-of-interest (RoI) detection component. Unlike those networks, however, our network makes two main contributions that play a significant role in achieving state-of-the-art face detection performance. First, multi-scale information is grouped in both region proposal and RoI detection to deal with tiny face regions. Second, the network allows explicit body contextual reasoning, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases: the WIDER FACE Dataset, which contains a high degree of variability, and the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our approach, trained on the WIDER FACE Dataset, outperforms strong baselines on WIDER FACE by a large margin and consistently achieves competitive results on FDDB against recent state-of-the-art face detection methods.
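
The two ideas named in the abstract, grouping multi-scale information and reasoning over body context around a face, can be illustrated with a minimal sketch. Nothing below is taken from the chapter itself: the function names, the expansion ratios (`widen`, `extend_up`, `extend_down`) and the choice of L2 normalization before concatenation are all assumptions made for illustration only.

```python
from math import sqrt


def body_context_box(face, image_size, widen=1.5, extend_up=0.5, extend_down=3.0):
    """Expand a face box (x1, y1, x2, y2) into a larger context box.

    The ratios are hypothetical placeholders, not values from the chapter.
    The result is clipped to the image bounds.
    """
    x1, y1, x2, y2 = face
    img_w, img_h = image_size
    w, h = x2 - x1, y2 - y1
    cx1 = max(0.0, x1 - widen * w)
    cx2 = min(float(img_w), x2 + widen * w)
    cy1 = max(0.0, y1 - extend_up * h)
    cy2 = min(float(img_h), y2 + extend_down * h)
    return (cx1, cy1, cx2, cy2)


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse_multi_scale(feature_vectors):
    """Concatenate per-layer feature vectors after normalizing each one.

    Normalizing first keeps a layer with large activations from dominating
    the fused descriptor; this is an assumed design choice.
    """
    fused = []
    for vec in feature_vectors:
        fused.extend(l2_normalize(vec))
    return fused


if __name__ == "__main__":
    face = (100, 100, 140, 150)
    print(body_context_box(face, image_size=(640, 480)))
    print(fuse_multi_scale([[3.0, 4.0], [0.0, 2.0]]))
```

In a real detector the vectors passed to `fuse_multi_scale` would be pooled features from several convolutional layers for the same region, and the context box would be pooled alongside the face box; here plain Python lists stand in for both.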


Metadata
Title
CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection
Authors
Chenchen Zhu
Yutong Zheng
Khoa Luu
Marios Savvides
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-61657-5_3