Published in: International Journal of Computer Vision 12/2021

15.10.2021

DeMoCap: Low-Cost Marker-Based Motion Capture

Authors: Anargyros Chatzitofis, Dimitrios Zarpalas, Petros Daras, Stefanos Kollias



Abstract

Optical marker-based motion capture (MoCap) remains the predominant way to acquire high-fidelity articulated body motion. We introduce DeMoCap, the first data-driven approach to end-to-end marker-based MoCap that uses only a sparse setup of spatio-temporally aligned, consumer-grade infrared-depth cameras. While trading off some of the features of high-end systems, our approach is the only robust option for marker-based MoCap at a far lower cost. We introduce an end-to-end differentiable markers-to-pose model that addresses a set of challenges such as under-constrained position estimates, noisy input data, and spatial-configuration invariance. The model simultaneously handles depth and marker-detection noise, labels and localizes the markers, and estimates the 3D pose through a novel spatial 3D coordinate regression technique under a multi-view rendering and supervision scheme. DeMoCap is trained on a purpose-built dataset captured with four spatio-temporally aligned low-cost Intel RealSense D415 sensors, used as input, and a professional 24-camera MXT40S MoCap system, used as ground truth.
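The "spatial 3D coordinate regression" the abstract mentions is, in the broader pose-estimation literature, typically realized as a differentiable soft-argmax: a softmax turns a predicted heatmap into a probability distribution, and the regressed coordinate is the expectation of the pixel grid under that distribution, so gradients flow through the coordinate. The sketch below is a minimal 2D illustration of that general technique, not the paper's exact model; the `beta` temperature and the toy heatmap are illustrative assumptions.

```python
import numpy as np

def soft_argmax(heatmap: np.ndarray, beta: float = 100.0) -> tuple[float, float]:
    """Differentiable coordinate regression over a 2D heatmap.

    Applies a temperature-scaled softmax to the heatmap, then returns the
    expected (x, y) coordinate under the resulting distribution.
    """
    h, w = heatmap.shape
    logits = heatmap.flatten() * beta
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    p /= p.sum()                           # softmax probabilities
    ys, xs = np.mgrid[0:h, 0:w]            # per-pixel coordinate grids
    x = float((p * xs.flatten()).sum())    # expectation of x
    y = float((p * ys.flatten()).sum())    # expectation of y
    return x, y

# A single confident detection at column 5, row 3:
hm = np.zeros((8, 8))
hm[3, 5] = 1.0
x, y = soft_argmax(hm)   # ~ (5.0, 3.0)
```

Unlike a hard argmax, this estimate is sub-pixel and differentiable, which is what makes end-to-end supervision of marker positions possible; extending it to 3D just adds a depth axis to the grid and the expectation.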

Metadata
Title: DeMoCap: Low-Cost Marker-Based Motion Capture
Authors: Anargyros Chatzitofis, Dimitrios Zarpalas, Petros Daras, Stefanos Kollias
Publication date: 15.10.2021
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 12/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-021-01526-z
