Abstract
We present a semi-automatic approach to exchange the clothes of an actor for arbitrary virtual garments in conventional monocular video footage as a postprocess. We reconstruct the actor's body shape and motion from the input video using a parameterized body model. The reconstructed dynamic 3D geometry of the actor serves as an animated mannequin for simulating the virtual garment. It also aids in scene illumination estimation, necessary to realistically light the virtual garment. An image-based warping technique ensures realistic compositing of the rendered virtual garment and the original video. We present results for eight real-world video sequences featuring complex test cases to evaluate performance for different types of motion, camera settings, and illumination conditions.
Supplemental Material
Supplemental movie and image files are available for "Garment Replacement in Monocular Video Sequences".