Published in: International Journal of Computer Vision 9/2018

30.06.2018

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

Authors: Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

Abstract

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, we devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. Our pipeline synthesizes scene layouts with high diversity, and it is configurable inasmuch as it enables precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated scenes while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normals, object identity, and material information (down to object parts), as well as environment information (e.g., illumination and camera viewpoints). We demonstrate the value of our synthesized dataset by improving performance in certain machine-learning-based scene understanding tasks (depth and surface normal prediction, semantic segmentation, reconstruction, etc.) and by providing benchmarks for and diagnostics of trained models through the controllable modification of object attributes and scene properties.
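The abstract describes scene generation as sampling from a stochastic grammar represented as an attributed Spatial And-Or Graph. The sketch below is a minimal, hypothetical illustration of that idea in Python: And-nodes expand into all of their children, Or-nodes select one child according to branching probabilities, and terminal nodes receive spatial attributes. The node names, probabilities, and attribute priors are invented for illustration and are not taken from the paper.

```python
import random

# Minimal, hypothetical And-Or grammar for a bedroom scene.
# Node names, branching probabilities, and attribute priors are
# illustrative assumptions, not the grammar used in the paper.
GRAMMAR = {
    # And-node: a bedroom consists of all listed components.
    "bedroom": {"type": "and", "children": ["room_layout", "furniture", "objects"]},
    # Or-nodes: exactly one child is selected per sample.
    "furniture": {"type": "or", "children": [("bed_and_desk", 0.6), ("bed_only", 0.4)]},
    "objects": {"type": "or", "children": [("lamp", 0.5), ("books", 0.3), ("none", 0.2)]},
    # Terminal nodes carry sampled spatial attributes.
    "room_layout": {"type": "terminal"},
    "bed_and_desk": {"type": "terminal"},
    "bed_only": {"type": "terminal"},
    "lamp": {"type": "terminal"},
    "books": {"type": "terminal"},
    "none": {"type": "terminal"},
}


def sample(node: str) -> dict:
    """Recursively expand a grammar node into a parse-graph fragment."""
    spec = GRAMMAR[node]
    if spec["type"] == "terminal":
        # Toy spatial attributes: position in metres, orientation in degrees.
        return {
            "label": node,
            "position": [round(random.uniform(0.0, 4.0), 2) for _ in range(2)],
            "orientation": random.choice([0, 90, 180, 270]),
        }
    if spec["type"] == "and":
        # And-node: expand every child.
        return {"label": node, "children": [sample(c) for c in spec["children"]]}
    # Or-node: choose one child according to its branching probability.
    names, probs = zip(*spec["children"])
    choice = random.choices(names, weights=probs, k=1)[0]
    return {"label": node, "children": [sample(choice)]}


if __name__ == "__main__":
    layout = sample("bedroom")  # one sampled scene configuration
    print(layout)
```

Repeated calls to sample() yield diverse scene configurations; in a full pipeline of the kind the abstract describes, such configurations would then be checked for physical plausibility and passed to a physics-based renderer.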


Metadata
Title
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars
Authors
Chenfanfu Jiang
Siyuan Qi
Yixin Zhu
Siyuan Huang
Jenny Lin
Lap-Fai Yu
Demetri Terzopoulos
Song-Chun Zhu
Publication date
30.06.2018
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 9/2018
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-018-1103-5
