Published in: International Journal of Computer Vision 9/2018

30.06.2018

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

Authors: Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

Abstract

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, we devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. Our pipeline synthesizes scene layouts with high diversity, and it is configurable inasmuch as it enables precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated scenes while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normals, object identity, and material information (down to object parts), as well as environment information (e.g., illumination and camera viewpoints). We demonstrate the value of our synthesized dataset by improving performance in certain machine-learning-based scene understanding tasks (depth and surface normal prediction, semantic segmentation, reconstruction, etc.) and by providing benchmarks for and diagnostics of trained models through the controllable modification of object attributes and scene properties.
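The abstract describes scene generation as sampling from a stochastic grammar represented as an attributed Spatial And-Or Graph. The sketch below is a minimal, hypothetical illustration of that idea in Python: And-nodes expand into all of their children, Or-nodes select one child according to branching probabilities, and terminal nodes receive spatial attributes. The node names, probabilities, and attribute priors are invented for illustration and are not taken from the paper.

```python
import random

# Minimal, hypothetical And-Or grammar for a bedroom scene.
# Node names, branching probabilities, and attribute priors are
# illustrative assumptions, not the grammar used in the paper.
GRAMMAR = {
    # And-node: a bedroom consists of all listed components.
    "bedroom": {"type": "and", "children": ["room_layout", "furniture", "objects"]},
    # Or-nodes: exactly one child is selected per sample.
    "furniture": {"type": "or", "children": [("bed_and_desk", 0.6), ("bed_only", 0.4)]},
    "objects": {"type": "or", "children": [("lamp", 0.5), ("books", 0.3), ("none", 0.2)]},
    # Terminal nodes carry sampled spatial attributes.
    "room_layout": {"type": "terminal"},
    "bed_and_desk": {"type": "terminal"},
    "bed_only": {"type": "terminal"},
    "lamp": {"type": "terminal"},
    "books": {"type": "terminal"},
    "none": {"type": "terminal"},
}


def sample(node: str) -> dict:
    """Recursively expand a grammar node into a parse-graph fragment."""
    spec = GRAMMAR[node]
    if spec["type"] == "terminal":
        # Toy spatial attributes: position in metres, orientation in degrees.
        return {
            "label": node,
            "position": [round(random.uniform(0.0, 4.0), 2) for _ in range(2)],
            "orientation": random.choice([0, 90, 180, 270]),
        }
    if spec["type"] == "and":
        # And-node: expand every child.
        return {"label": node, "children": [sample(c) for c in spec["children"]]}
    # Or-node: choose one child according to its branching probability.
    names, probs = zip(*spec["children"])
    choice = random.choices(names, weights=probs, k=1)[0]
    return {"label": node, "children": [sample(choice)]}


if __name__ == "__main__":
    layout = sample("bedroom")  # one sampled scene configuration
    print(layout)
```

Repeated calls to sample() yield diverse scene configurations; in a full pipeline of the kind the abstract describes, such configurations would then be checked for physical plausibility and passed to a physics-based renderer.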


Metadata
Title
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars
Authors
Chenfanfu Jiang
Siyuan Qi
Yixin Zhu
Siyuan Huang
Jenny Lin
Lap-Fai Yu
Demetri Terzopoulos
Song-Chun Zhu
Publication date
30.06.2018
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 9/2018
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-018-1103-5
