Published in: International Journal of Computer Assisted Radiology and Surgery 11/2019

06-08-2019 | Original Article

Fusing information from multiple 2D depth cameras for 3D human pose estimation in the operating room

Authors: Lasse Hansen, Marlin Siebert, Jasper Diesel, Mattias P. Heinrich



Abstract

Purpose

For many years, deep convolutional neural networks have achieved state-of-the-art results on a wide variety of computer vision tasks. 3D human pose estimation is no exception, and results on public benchmarks are impressive. However, specialized domains such as operating rooms pose additional challenges: clinical settings involve severe occlusions, clutter and difficult lighting conditions, and the privacy concerns of patients and staff make it necessary to use unidentifiable data. In this work, we aim to bring robust human pose estimation to the clinical domain.

Methods

We propose a 2D–3D information fusion framework that makes use of a network of multiple depth cameras and strong pose priors. In a first step, probabilities of 2D joints are predicted from single depth images. This information is then fused in a shared voxel space, yielding a rough estimate of the 3D pose. Final joint positions are obtained by regressing into the latent pose space of a pre-trained convolutional autoencoder.
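
The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-camera setup, the pinhole parameters, the synthetic Gaussian heatmaps standing in for the 2D network output, the averaging rule, and all names (`project`, `gaussian_heatmap`, `fuse_views`, `CAM_A`, `CAM_B`) are assumptions made for illustration. Each voxel centre of a shared grid is projected into every view, the per-view 2D joint probabilities are averaged, and the best-scoring voxel is taken as the rough 3D joint estimate.

```python
import math
from itertools import product


def project(point, cam):
    """Project a 3D world point into a pinhole camera; None if behind it."""
    R, t, f, c = cam["R"], cam["t"], cam["f"], cam["c"]
    xc, yc, zc = (sum(R[r][k] * point[k] for k in range(3)) + t[r]
                  for r in range(3))
    if zc <= 0:
        return None
    return f * xc / zc + c[0], f * yc / zc + c[1]


def gaussian_heatmap(center, size, sigma=2.0):
    """Synthetic stand-in for a network's 2D joint probability map."""
    cu, cv = center
    return [[math.exp(-((u - cu) ** 2 + (v - cv) ** 2) / (2 * sigma ** 2))
             for u in range(size)] for v in range(size)]


def fuse_views(heatmaps, cams, origin, voxel_size, n):
    """Average per-view probabilities at each voxel centre; return the best."""
    best_score, best_idx = -1.0, None
    for idx in product(range(n), repeat=3):
        centre = [origin[a] + (idx[a] + 0.5) * voxel_size for a in range(3)]
        score = 0.0
        for heat, cam in zip(heatmaps, cams):
            uv = project(centre, cam)
            if uv is None:
                continue  # voxel lies behind this camera
            u, v = int(round(uv[0])), int(round(uv[1]))
            if 0 <= v < len(heat) and 0 <= u < len(heat[0]):
                score += heat[v][u]  # nearest-pixel lookup
        score /= len(cams)
        if score > best_score:
            best_score, best_idx = score, idx
    centre = tuple(origin[a] + (best_idx[a] + 0.5) * voxel_size
                   for a in range(3))
    return best_idx, centre, best_score


# Hypothetical calibration: camera A looks along +z from the world origin,
# camera B sits at (2, 0, 2) and looks along -x (units: metres, pixels).
CAM_A = {"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]], "t": [0.0, 0.0, 0.0],
         "f": 100.0, "c": (100.0, 100.0)}
CAM_B = {"R": [[0, 0, 1], [0, 1, 0], [-1, 0, 0]], "t": [-2.0, 0.0, 2.0],
         "f": 100.0, "c": (100.0, 100.0)}
TRUE_JOINT = (0.05, -0.15, 2.05)  # made-up ground-truth joint position

cams = [CAM_A, CAM_B]
heatmaps = [gaussian_heatmap(project(TRUE_JOINT, cam), size=200)
            for cam in cams]
best_idx, best_centre, best_score = fuse_views(
    heatmaps, cams, origin=(-0.5, -0.5, 1.5), voxel_size=0.1, n=10)
print(best_idx, best_centre, round(best_score, 3))
```

A single view cannot separate voxels that lie along the same camera ray, since they project to nearly the same pixel. The second view resolves that depth ambiguity, which is why the averaged score peaks only where both heatmaps agree.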

Results

We evaluate our approach against several baselines on the challenging MVOR dataset. The best results are obtained when fusing 2D information from multiple views and constraining the predictions with learned pose priors.

Conclusions

We present a robust 3D human pose estimation framework based on a multi-depth camera network in the operating room. Using depth images as the only input modality makes our approach especially interesting for clinical applications, because it preserves the anonymity of patients and staff.


Metadata
Title
Fusing information from multiple 2D depth cameras for 3D human pose estimation in the operating room
Authors
Lasse Hansen
Marlin Siebert
Jasper Diesel
Mattias P. Heinrich
Publication date
06-08-2019
Publisher
Springer International Publishing
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 11/2019
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-019-02044-7
