Abstract
Raw optical motion capture data often contains errors such as occluded markers, mislabeled markers, and high-frequency noise or jitter. Typically these errors must be fixed by hand, an extremely time-consuming and tedious task, so there is great demand for tools and techniques that can alleviate this burden. In this research we present a tool that sidesteps this problem and produces joint transforms directly from raw marker data (a task commonly called "solving") in a way that is extremely robust to errors in the input data, using the machine learning technique of denoising. Starting with a set of marker configurations and a large database of skeletal motion data such as the CMU motion capture database [CMU 2013b], we synthetically reconstruct marker locations using linear blend skinning and apply a unique noise function for corrupting this marker data, randomly removing and shifting markers to dynamically produce billions of examples of poses with errors similar to those found in real motion capture data. We then train a deep denoising feed-forward neural network to learn a mapping from this corrupted marker data to the corresponding joint transforms. Once trained, our neural network can replace the solving stage of the motion capture pipeline and, as it is very robust to errors, it completely removes the need for any manual clean-up of data. Our system is accurate enough to be used in production, generally achieving precision to within a few millimeters, while additionally being extremely fast to compute with low memory requirements.
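The corruption step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact noise model: the function name, the probabilities, the Gaussian shift distribution, and the convention of zeroing out occluded markers are all illustrative assumptions.

```python
import numpy as np

def corrupt_markers(markers, occlude_prob=0.1, shift_prob=0.1,
                    shift_scale=0.1, rng=None):
    """Randomly occlude and shift markers to mimic errors in raw mocap data.

    markers: (n_markers, 3) array of clean marker positions.
    Returns the corrupted markers plus boolean masks of which markers
    were occluded and which were shifted.
    """
    rng = np.random.default_rng(rng)
    corrupted = markers.copy()
    n = len(markers)
    # Occlusion: replace a random subset of markers with zeros,
    # a common placeholder for "missing" markers.
    occluded = rng.random(n) < occlude_prob
    corrupted[occluded] = 0.0
    # Shifting: displace another random subset by Gaussian noise,
    # standing in for mislabeled or swapped markers.
    shifted = (rng.random(n) < shift_prob) & ~occluded
    corrupted[shifted] += rng.normal(0.0, shift_scale, (int(shifted.sum()), 3))
    return corrupted, occluded, shifted
```

Applied on the fly during training, a function of this kind can generate an effectively unlimited stream of (corrupted markers, clean joint transforms) pairs from a fixed motion database.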
References
- Ijaz Akhter, Tomas Simon, Sohaib Khan, Iain Matthews, and Yaser Sheikh. 2012. Bilinear spatiotemporal basis models. ACM Transactions on Graphics 31, 2 (April 2012), 17:1--17:12.
- Andreas Aristidou, Daniel Cohen-Or, Jessica K. Hodgins, and Ariel Shamir. 2018. Self-similarity Analysis for Motion Capture Cleaning. Computer Graphics Forum 37, 2 (2018).
- Andreas Aristidou and Joan Lasenby. 2013. Real-time marker prediction and CoR estimation in optical motion capture. The Visual Computer 29, 1 (Jan. 2013), 7--26.
- Autodesk. 2016. MotionBuilder HumanIK. https://knowledge.autodesk.com/support/motionbuilder/learn-explore/caas/CloudHelp/cloudhelp/2017/ENU/MotionBuilder/files/GUID-44608A05-8D2F-4C2F-BDA6-E3945F36C872-htm.html. (2016).
- Steve Bako, Thijs Vogels, Brian McWilliams, Mark Meyer, Jan Novák, Alex Harvill, Pradeep Sen, Tony DeRose, and Fabrice Rousselle. 2017. Kernel-predicting Convolutional Networks for Denoising Monte Carlo Renderings. ACM Trans. Graph. 36, 4, Article 97 (July 2017), 14 pages.
- Jan Baumann, Björn Krüger, Arno Zinke, and Andreas Weber. 2011. Data-Driven Completion of Motion Capture Data. In Workshop on Virtual Reality Interaction and Physical Simulation (VRIPHYS 2011), Jan Bender, Kenny Erleben, and Eric Galin (Eds.). The Eurographics Association.
- Paul J. Besl and Neil D. McKay. 1992. A Method for Registration of 3-D Shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14, 2 (Feb. 1992), 239--256.
- M. Burke and J. Lasenby. 2016. Estimating missing marker positions using low dimensional Kalman smoothing. Journal of Biomechanics 49, 9 (2016), 1854--1858.
- Samuel Buss. 2004. Introduction to inverse kinematics with Jacobian transpose, pseudoinverse and damped least squares methods. 17 (May 2004).
- Jinxiang Chai and Jessica K. Hodgins. 2005. Performance Animation from Low-dimensional Control Signals. In ACM SIGGRAPH 2005 Papers (SIGGRAPH '05). ACM, New York, NY, USA, 686--696.
- Chakravarty R. Alla Chaitanya, Anton S. Kaplanyan, Christoph Schied, Marco Salvi, Aaron Lefohn, Derek Nowrouzezahrai, and Timo Aila. 2017. Interactive Reconstruction of Monte Carlo Image Sequences Using a Recurrent Denoising Autoencoder. ACM Trans. Graph. 36, 4, Article 98 (July 2017), 12 pages.
- Junyoung Chung, Çaglar Gülçehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR abs/1412.3555 (2014). arXiv:1412.3555 http://arxiv.org/abs/1412.3555
- CMU. 2013a. BVH Conversion of Carnegie-Mellon Mocap Database. https://sites.google.com/a/cgspeed.com/cgspeed/motion-capture/cmu-bvh-conversion/. (2013).
- CMU. 2013b. Carnegie-Mellon Mocap Database. http://mocap.cs.cmu.edu/. (2013).
- Klaus Dorfmüller-Ulhaas. 2005. Robust Optical User Motion Tracking Using a Kalman Filter.
- Yinfu Feng, Ming-Ming Ji, Jun Xiao, Xiaosong Yang, Jian J Zhang, Yueting Zhuang, and Xuelong Li. 2014a. Mining Spatial-Temporal Patterns and Structural Sparsity for Human Motion Data Denoising. 45 (Dec. 2014).
- Yinfu Feng, Jun Xiao, Yueting Zhuang, Xiaosong Yang, Jian J. Zhang, and Rong Song. 2014b. Exploiting temporal stability and low-rank structure for motion capture data refinement. Information Sciences 277, Supplement C (2014), 777--793.
- Tamar Flash and Neville Hogan. 1985. The Coordination of Arm Movements: An Experimentally Confirmed Mathematical Model. Journal of Neuroscience 5 (1985), 1688--1703.
- Katerina Fragkiadaki, Sergey Levine, and Jitendra Malik. 2015. Recurrent Network Models for Kinematic Tracking. CoRR abs/1508.00271 (2015). arXiv:1508.00271 http://arxiv.org/abs/1508.00271
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs/1512.03385 (2015). arXiv:1512.03385 http://arxiv.org/abs/1512.03385
- Daniel Holden, Jun Saito, and Taku Komura. 2016. A Deep Learning Framework for Character Motion Synthesis and Editing. ACM Trans. Graph. 35, 4, Article 138 (July 2016), 11 pages.
- Daniel Holden, Jun Saito, Taku Komura, and Thomas Joyce. 2015. Learning Motion Manifolds with Convolutional Autoencoders. In SIGGRAPH Asia 2015 Technical Briefs (SA '15). ACM, New York, NY, USA, Article 18, 4 pages.
- Hengyuan Hu, Lisheng Gao, and Quanbin Ma. 2016. Deep Restricted Boltzmann Networks. CoRR abs/1611.07917 (2016). arXiv:1611.07917 http://arxiv.org/abs/1611.07917
- Alexander G. Ororbia II, C. Lee Giles, and David Reitter. 2015. Online Semi-Supervised Learning with Deep Hybrid Boltzmann Machines and Denoising Autoencoders. CoRR abs/1511.06964 (2015). arXiv:1511.06964 http://arxiv.org/abs/1511.06964
- Ladislav Kavan, Steven Collins, Jiří Žára, and Carol O'Sullivan. 2008. Geometric Skinning with Approximate Dual Quaternion Blending. ACM Trans. Graph. 27, 4, Article 105 (Nov. 2008), 23 pages.
- Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). arXiv:1412.6980 http://arxiv.org/abs/1412.6980
- Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. Self-Normalizing Neural Networks. CoRR abs/1706.02515 (2017). arXiv:1706.02515 http://arxiv.org/abs/1706.02515
- Yann LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. 1998. Efficient BackProp. In Neural Networks: Tricks of the Trade. Springer-Verlag, London, UK, 9--50. http://dl.acm.org/citation.cfm?id=645754.668382
- Lei Li, James McCann, Nancy Pollard, and Christos Faloutsos. 2010. BoLeRO: A principled technique for including bone length constraints in motion capture occlusion filling. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation.
- Guodong Liu and Leonard McMillan. 2006. Estimation of Missing Markers in Human Motion Capture. Vis. Comput. 22, 9 (Sept. 2006), 721--728.
- Xin Liu, Yiu-ming Cheung, Shu-Juan Peng, Zhen Cui, Bineng Zhong, and Ji-Xiang Du. 2014. Automatic motion capture data denoising via filtered subspace clustering and low rank matrix approximation. 105 (Dec. 2014), 350--362.
- Utkarsh Mall, G. Roshan Lal, Siddhartha Chaudhuri, and Parag Chaudhuri. 2017. A Deep Recurrent Framework for Cleaning Motion Capture Data. CoRR abs/1712.03380 (2017). arXiv:1712.03380 http://arxiv.org/abs/1712.03380
- Sang Il Park and Jessica K. Hodgins. 2006. Capturing and Animating Skin Deformation in Human Motion. In ACM SIGGRAPH 2006 Papers (SIGGRAPH '06). ACM, New York, NY, USA, 881--889.
- Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the Convergence of Adam and Beyond. In International Conference on Learning Representations. https://openreview.net/forum?id=ryQu7f-RZ
- Maurice Ringer and Joan Lasenby. 2002. Multiple Hypothesis Tracking for Automatic Optical Motion Capture. Springer Berlin Heidelberg, Berlin, Heidelberg, 524--536.
- Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Deep Boltzmann Machines. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), David van Dyk and Max Welling (Eds.), Vol. 5. PMLR, Clearwater Beach, Florida, USA, 448--455. http://proceedings.mlr.press/v5/salakhutdinov09a.html
- Abraham Savitzky and Marcel J. E. Golay. 1964. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry 36, 8 (1964), 1627--1639.
- Alexander Sorkine-Hornung, Sandip Sar-Dessai, and Leif Kobbelt. 2005. Self-calibrating optical motion tracking for articulated bodies. In Proceedings of the IEEE Virtual Reality Conference (VR 2005). 75--82.
- Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15 (2014), 1929--1958. http://jmlr.org/papers/v15/srivastava14a.html
- Jochen Tautges, Arno Zinke, Björn Krüger, Jan Baumann, Andreas Weber, Thomas Helten, Meinard Müller, Hans-Peter Seidel, and Bernd Eberhardt. 2011. Motion Reconstruction Using Sparse Accelerometer Data. ACM Trans. Graph. 30, 3, Article 18 (May 2011), 12 pages.
- Graham W. Taylor, Geoffrey E. Hinton, and Sam T. Roweis. 2007. Modeling Human Motion Using Binary Latent Variables. In Advances in Neural Information Processing Systems 19, B. Schölkopf, J. C. Platt, and T. Hoffman (Eds.). MIT Press, 1345--1352. http://papers.nips.cc/paper/3078-modeling-human-motion-using-binary-latent-variables.pdf
- Theano Development Team. 2016. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints abs/1605.02688 (May 2016). http://arxiv.org/abs/1605.02688
- Alex Varghese, Kiran Vaidhya, Subramaniam Thirunavukkarasu, Chandrasekharan Kesavdas, and Ganapathy Krishnamurthi. 2016. Semi-supervised Learning using Denoising Autoencoders for Brain Lesion Detection and Segmentation. CoRR abs/1611.08664 (2016). arXiv:1611.08664 http://arxiv.org/abs/1611.08664
- Vicon. 2018. Vicon Software. https://www.vicon.com/products/software/. (2018).
- Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. 2010. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. J. Mach. Learn. Res. 11 (Dec. 2010), 3371--3408. http://dl.acm.org/citation.cfm?id=1756006.1953039
- Jun Xiao, Yinfu Feng, Mingming Ji, Xiaosong Yang, Jian J. Zhang, and Yueting Zhuang. 2015. Sparse Motion Bases Selection for Human Motion Denoising. Signal Process. 110, C (May 2015), 108--122.
- Zhidong Xiao, Hammadi Nait-Charif, and Jian J. Zhang. 2008. Automatic Estimation of Skeletal Motion from Optical Motion Capture Data. Springer Berlin Heidelberg, Berlin, Heidelberg, 144--153.
- Junyuan Xie, Linli Xu, and Enhong Chen. 2012. Image Denoising and Inpainting with Deep Neural Networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 341--349. http://papers.nips.cc/paper/4686-image-denoising-and-inpainting-with-deep-neural-networks.pdf
- Raymond A. Yeh, Chen Chen, Teck-Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, and Minh N. Do. 2017. Semantic Image Inpainting with Deep Generative Models. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6882--6890.
- Victor Brian Zordan and Nicholas C. Van Der Horst. 2003. Mapping Optical Motion Capture Data to Skeletal Motion Using a Physical Model. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '03). Eurographics Association, Aire-la-Ville, Switzerland, 245--250. http://dl.acm.org/citation.cfm?id=846276.846311