Published in: International Journal of Computer Vision 3/2021

16.11.2020

AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild

Written by: Zhe Zhang, Chunyu Wang, Weichao Qiu, Wenhu Qin, Wenjun Zeng


Abstract

Occlusion is arguably the biggest challenge for human pose estimation in the wild. Typical solutions rely on intrusive sensors such as IMUs to detect occluded joints. To make the task truly unconstrained, we present AdaFuse, an adaptive multiview fusion method that enhances the features in occluded views by leveraging those in visible views. The core of AdaFuse is determining the point-point correspondence between two views, which we solve efficiently by exploiting the sparsity of the heatmap representation. We also learn an adaptive fusion weight for each camera view to reflect its feature quality, reducing the chance that good features are undesirably corrupted by "bad" views. The fusion model is trained end-to-end with the pose estimation network and can be applied directly to new camera configurations without additional adaptation. We extensively evaluate the approach on three public datasets: Human3.6M, Total Capture, and CMU Panoptic. It outperforms the state-of-the-art methods on all of them. We also create a large-scale synthetic dataset, Occlusion-Person, which provides occlusion labels for every joint in the images and thus allows numerical evaluation on occluded joints. The dataset and code are released at https://github.com/zhezh/adafuse-3d-human-pose.
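The fusion idea in the abstract — combine per-view joint heatmaps using learned per-view quality weights so that a visible view can compensate for an occluded one — can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the cross-view warping (which AdaFuse derives from point-point correspondences and heatmap sparsity) has already been applied, and the function name `adaptive_fuse` and the toy weights are hypothetical.

```python
import numpy as np

def adaptive_fuse(heatmaps, weights):
    """Fuse per-view joint heatmaps with per-view quality weights.

    heatmaps: array of shape (V, H, W), one heatmap per camera view,
              assumed already warped into a common reference view.
    weights:  length-V sequence of learned fusion weights reflecting
              each view's feature quality.
    Returns the fused (H, W) heatmap for the reference view.
    """
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                       # normalize weights to sum to 1
    return np.tensordot(w, heatmaps, axes=1)  # weighted sum over views

# Toy example: view 0 is occluded (flat heatmap), view 1 sees the joint.
v0 = np.zeros((4, 4))
v1 = np.zeros((4, 4))
v1[2, 2] = 1.0
fused = adaptive_fuse(np.stack([v0, v1]), weights=[0.2, 0.8])
peak = np.unravel_index(fused.argmax(), fused.shape)  # joint recovered at (2, 2)
```

Because the occluded view contributes a flat response, the down-weighted fusion still localizes the joint at the peak of the visible view's heatmap.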


Metadata
Title
AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild
Written by
Zhe Zhang
Chunyu Wang
Weichao Qiu
Wenhu Qin
Wenjun Zeng
Publication date
16.11.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01398-9
