Published in: International Journal of Computer Vision 2/2015

1 April 2015

Scene Understanding by Reasoning Stability and Safety

Authors: Bo Zheng, Yibiao Zhao, Joey Yu, Katsushi Ikeuchi, Song-Chun Zhu


Abstract

This paper presents a new perspective on 3D scene understanding: reasoning about object stability and safety using intuitive mechanics. Our approach builds on a simple observation that, by human design, objects in static scenes should be stable in the gravity field and safe with respect to various physical disturbances such as human activities. This assumption applies to all scene categories and imposes useful constraints on the plausible interpretations (parses) in scene understanding. Given a 3D point cloud captured for a static scene by depth cameras, our method consists of three steps: (i) recovering solid 3D volumetric primitives from voxels; (ii) reasoning about stability by grouping unstable primitives into physically stable objects, optimizing the stability and the scene prior; and (iii) reasoning about safety by evaluating the physical risks for objects under physical disturbances such as human activity, wind, or earthquakes. We adopt a novel intuitive physics model and represent the energy landscape of each primitive and object in the scene by a disconnectivity graph (DG). We construct a contact graph whose nodes are 3D volumetric primitives and whose edges represent supporting relations. We then adopt a Swendsen–Wang Cuts algorithm to partition the contact graph into groups, each of which is a stable object. To detect unsafe objects in a static scene, our method further infers hidden and situated causes (disturbances) in the scene, and then applies intuitive physical mechanics to predict possible effects (e.g., falls) as consequences of the disturbances. In experiments, we demonstrate that the algorithm performs substantially better than other state-of-the-art methods on (i) object segmentation, (ii) 3D volumetric recovery, and (iii) scene understanding. We also compare the safety predictions of the intuitive mechanics model with human judgement.
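The two central ideas of the abstract, an energy-based notion of stability and the grouping of primitives over a contact graph, can be illustrated with a small Python sketch. This is not the authors' implementation: the box-shaped primitives, the function names `tipping_barrier` and `group_primitives`, and the deterministic choice of which contact edges are "on" are all assumptions made for illustration. In a Swendsen–Wang Cuts sampler, edges would be switched on and off at random and the resulting grouping accepted or rejected by a probabilistic rule; here only the component-forming step is shown.

```python
import math
from collections import defaultdict

G = 9.81  # gravitational acceleration in m/s^2


def tipping_barrier(mass, width, height):
    """Energy in joules needed to tip a uniform box over one bottom edge.

    Pivoting about the edge lifts the centre of mass from height/2 to the
    half-diagonal of the width-height cross-section. A larger barrier means
    the box is harder to knock over (hypothetical stability measure).
    """
    half_diagonal = math.hypot(width / 2.0, height / 2.0)
    return mass * G * (half_diagonal - height / 2.0)


def group_primitives(nodes, edges_on):
    """Group primitives into connected components of the contact graph.

    `nodes` are primitive identifiers; `edges_on` lists the contact edges
    currently switched on. Uses union-find with path halving.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges_on:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # A tall, thin box has a much lower tipping barrier than a squat one.
    print(tipping_barrier(1.0, 0.2, 1.0), tipping_barrier(1.0, 1.0, 0.2))
    # Two legs and a top merge into one candidate object; the lamp stays alone.
    print(group_primitives(
        ["leg1", "leg2", "top", "lamp"],
        [("leg1", "top"), ("leg2", "top")],
    ))
```

Under these assumptions, a table leg on its own would have a low tipping barrier, whereas the group formed by both legs and the top would be evaluated as a single, more stable object. That is the intuition behind grouping unstable primitives into stable objects.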

Metadata
Title
Scene Understanding by Reasoning Stability and Safety
Authors
Bo Zheng
Yibiao Zhao
Joey Yu
Katsushi Ikeuchi
Song-Chun Zhu
Publication date
1 April 2015
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-014-0795-4
