Skip to main content
Erschienen in: International Journal of Computer Assisted Radiology and Surgery 6/2019

09.04.2019 | Original Article

Face detection in the operating room: comparison of state-of-the-art methods and a self-supervised approach

verfasst von: Thibaut Issenhuth, Vinkle Srivastav, Afshin Gangi, Nicolas Padoy

Erschienen in: International Journal of Computer Assisted Radiology and Surgery | Ausgabe 6/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Purpose

Face detection is a needed component for the automatic analysis and assistance of human activities during surgical procedures. Efficient face detection algorithms can indeed help to detect and identify the persons present in the room and also be used to automatically anonymize the data. However, current algorithms trained on natural images do not generalize well to the operating room (OR) images. In this work, we provide a comparison of state-of-the-art face detectors on OR data and also present an approach to train a face detector for the OR by exploiting non-annotated OR images.

Methods

We propose a comparison of six state-of-the-art face detectors on clinical data using multi-view OR faces, a dataset of OR images capturing real surgical activities. We then propose to use self-supervision, a domain adaptation method, for the task of face detection in the OR. The approach makes use of non-annotated images to fine-tune a state-of-the-art detector for the OR without using any human supervision.

Results

The results show that the best model, namely the tiny face detector, yields an average precision of 0.556 at intersection over union of 0.5. Our self-supervised model using non-annotated clinical data outperforms this result by 9.2%.

Conclusion

We present the first comparison of state-of-the-art face detectors on OR images and show that results can be significantly improved by using self-supervision on non-annotated data.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Chen K, Gabriel P, Alasfour A, Gong C, Doyle WK, Devinsky O, Friedman D, Dugan P, Melloni L, Thesen T, Gonda D, Sattar S, Wang S, Gilja V (2018) Patient-specific pose estimation in clinical environments. IEEE J Transl Eng Health Med 6:1–11 Chen K, Gabriel P, Alasfour A, Gong C, Doyle WK, Devinsky O, Friedman D, Dugan P, Melloni L, Thesen T, Gonda D, Sattar S, Wang S, Gilja V (2018) Patient-specific pose estimation in clinical environments. IEEE J Transl Eng Health Med 6:1–11
2.
Zurück zum Zitat Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: CVPR, pp I–I Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: CVPR, pp I–I
3.
Zurück zum Zitat Najibi M, Samangouei P, Chellappa R, Davis LS (2017) SSH: single stage headless face detector. In: ICCV, pp 4885–4894 Najibi M, Samangouei P, Chellappa R, Davis LS (2017) SSH: single stage headless face detector. In: ICCV, pp 4885–4894
4.
Zurück zum Zitat Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S\(^3\)FD: single shot scale-invariant face detector. In: International conference on computer vision (ICCV) at Venice, Italy Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S\(^3\)FD: single shot scale-invariant face detector. In: International conference on computer vision (ICCV) at Venice, Italy
5.
Zurück zum Zitat Jiang H, Learned-Miller E (2017) Face detection with the faster R-CNN. In: 12th IEEE international conference on automatic face & gesture recognition (FG 2017) Jiang H, Learned-Miller E (2017) Face detection with the faster R-CNN. In: 12th IEEE international conference on automatic face & gesture recognition (FG 2017)
6.
Zurück zum Zitat Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp 91–99 Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp 91–99
7.
Zurück zum Zitat Yang S, Luo P, Loy CC, Tang X (2016) Wider face: a face detection benchmark. In: CVPR Yang S, Luo P, Loy CC, Tang X (2016) Wider face: a face detection benchmark. In: CVPR
8.
Zurück zum Zitat Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 1302–1310 Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 1302–1310
9.
Zurück zum Zitat Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) Deepercut: a deeper, stronger, and faster multi-person pose estimation model. In: ECCV, pp 34–50 Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) Deepercut: a deeper, stronger, and faster multi-person pose estimation model. In: ECCV, pp 34–50
10.
Zurück zum Zitat Fang H-S, Xie S, Tai Y-W, Lu C (2017) RMPE: regional multi-person pose estimation. In: ICCV Fang H-S, Xie S, Tai Y-W, Lu C (2017) RMPE: regional multi-person pose estimation. In: ICCV
11.
Zurück zum Zitat Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: ECCV Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: ECCV
12.
Zurück zum Zitat Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 7103–7112 Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 7103–7112
13.
Zurück zum Zitat Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: ECCV, pp 740–755 Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: ECCV, pp 740–755
14.
Zurück zum Zitat Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR, June 2014 Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR, June 2014
15.
Zurück zum Zitat Twinanda AP, Shehata S, Mutter D, Marescaux J, de Mathelin M, Padoy N (2016) Multi-stream deep architecture for surgical phase recognition on multi-view RGBD videos. In: MICCAI workshop on modeling and monitoring of computer assisted interventions (M2CAI) Twinanda AP, Shehata S, Mutter D, Marescaux J, de Mathelin M, Padoy N (2016) Multi-stream deep architecture for surgical phase recognition on multi-view RGBD videos. In: MICCAI workshop on modeling and monitoring of computer assisted interventions (M2CAI)
16.
Zurück zum Zitat Maier-Hein L, Vedula SS, Speidel S, Navab N, Kikinis R, Park A, Eisenmann M, Feussner H, Forestier G, Giannarou S, Hashizume M, Katic D, Kenngott H, Kranzfelder M, Malpani A, März K, Neumuth T, Padoy N, Pugh C, Schoch N, Stoyanov D, Taylor R, Wagner M, Hager GD, Jannin P (2017) Surgical data science for next-generation interventions. Nat Biomed Eng 1(9):691CrossRefPubMed Maier-Hein L, Vedula SS, Speidel S, Navab N, Kikinis R, Park A, Eisenmann M, Feussner H, Forestier G, Giannarou S, Hashizume M, Katic D, Kenngott H, Kranzfelder M, Malpani A, März K, Neumuth T, Padoy N, Pugh C, Schoch N, Stoyanov D, Taylor R, Wagner M, Hager GD, Jannin P (2017) Surgical data science for next-generation interventions. Nat Biomed Eng 1(9):691CrossRefPubMed
17.
Zurück zum Zitat Yeung S, Downing NL, Fei-Fei L, Milstein A (2018) Bedside computer vision-moving artificial intelligence from driver assistance to patient safety. NEJM 378(14):1271CrossRefPubMed Yeung S, Downing NL, Fei-Fei L, Milstein A (2018) Bedside computer vision-moving artificial intelligence from driver assistance to patient safety. NEJM 378(14):1271CrossRefPubMed
18.
Zurück zum Zitat Kadkhodamohammadi A, Gangi A, de Mathelin M, Padoy N (2017) Articulated clinician detection using 3D pictorial structures on RGB-D data. Med Image Anal 35:215–224CrossRefPubMed Kadkhodamohammadi A, Gangi A, de Mathelin M, Padoy N (2017) Articulated clinician detection using 3D pictorial structures on RGB-D data. Med Image Anal 35:215–224CrossRefPubMed
19.
Zurück zum Zitat Kadkhodamohammadi A, Gangi A, de Mathelin M, Padoy N (2017) A multi-view RGB-D approach for human pose estimation in operating rooms. In: WACV, pp 363–372 Kadkhodamohammadi A, Gangi A, de Mathelin M, Padoy N (2017) A multi-view RGB-D approach for human pose estimation in operating rooms. In: WACV, pp 363–372
20.
Zurück zum Zitat Belagiannis V, Wang X, Shitrit HB, Hashimoto K, Stauder R, Aoki Y, Kranzfelder M, Schneider A, Fua P, Ilic S, Feussner H, Navab N (2016) Parsing human skeletons in an operating room. Mach Vis Appl 27(7):1035–1046CrossRef Belagiannis V, Wang X, Shitrit HB, Hashimoto K, Stauder R, Aoki Y, Kranzfelder M, Schneider A, Fua P, Ilic S, Feussner H, Navab N (2016) Parsing human skeletons in an operating room. Mach Vis Appl 27(7):1035–1046CrossRef
21.
Zurück zum Zitat Nieto-Rodríguez A, Mucientes M, Brea VM (2015) System for medical mask detection in the operating room through facial attributes. In: Iberian conference on pattern recognition and image analysis. Springer, pp 138–145 Nieto-Rodríguez A, Mucientes M, Brea VM (2015) System for medical mask detection in the operating room through facial attributes. In: Iberian conference on pattern recognition and image analysis. Springer, pp 138–145
22.
Zurück zum Zitat Flouty E, Zisimopoulos O, Stoyanov D (2018) Faceoff: anonymizing videos in the operating rooms. In: OR 2.0 context-aware operating theaters, computer assisted robotic endoscopy, clinical image-based procedures, and skin image analysis. Springer, pp 30–38 Flouty E, Zisimopoulos O, Stoyanov D (2018) Faceoff: anonymizing videos in the operating rooms. In: OR 2.0 context-aware operating theaters, computer assisted robotic endoscopy, clinical image-based procedures, and skin image analysis. Springer, pp 30–38
23.
Zurück zum Zitat Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 28(2):337–407CrossRef Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 28(2):337–407CrossRef
24.
Zurück zum Zitat Srivastav V, Issenhuth T, Kadkhodamohammadi A, de Mathelin M, Gangi A, Padoy N (2018) MVOR: a multi-view rgb-d operating room dataset for 2D and 3D human pose estimation. In: MICCAI-LABELS-2018 Srivastav V, Issenhuth T, Kadkhodamohammadi A, de Mathelin M, Gangi A, Padoy N (2018) MVOR: a multi-view rgb-d operating room dataset for 2D and 3D human pose estimation. In: MICCAI-LABELS-2018
26.
Zurück zum Zitat Radosavovic I, Dollár P, Girshick RB, Gkioxari G, He K (2018) Data distillation: towards omni-supervised learning. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4119–4128 Radosavovic I, Dollár P, Girshick RB, Gkioxari G, He K (2018) Data distillation: towards omni-supervised learning. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4119–4128
27.
Zurück zum Zitat Hu P, Ramanan D (2017) Finding tiny faces. In: CVPR Hu P, Ramanan D (2017) Finding tiny faces. In: CVPR
28.
Zurück zum Zitat Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: ECCV. Springer, pp 21–37 Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: ECCV. Springer, pp 21–37
Metadaten
Titel
Face detection in the operating room: comparison of state-of-the-art methods and a self-supervised approach
verfasst von
Thibaut Issenhuth
Vinkle Srivastav
Afshin Gangi
Nicolas Padoy
Publikationsdatum
09.04.2019
Verlag
Springer International Publishing
Erschienen in
International Journal of Computer Assisted Radiology and Surgery / Ausgabe 6/2019
Print ISSN: 1861-6410
Elektronische ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-019-01944-y

Weitere Artikel der Ausgabe 6/2019

International Journal of Computer Assisted Radiology and Surgery 6/2019 Zur Ausgabe