research-article

A search-classify approach for cluttered indoor scene understanding

Authors:
Liangliang Nan (Shenzhen VisuCA Key Lab/SIAT)
Ke Xie (Shenzhen VisuCA Key Lab/SIAT)
Andrei Sharf (Ben Gurion University)

ACM Transactions on Graphics, Volume 31, Issue 6, Article No. 137, pp. 1–10. https://doi.org/10.1145/2366145.2366156

Published: 01 November 2012

Abstract

We present an algorithm for recognition and reconstruction of scanned 3D indoor scenes. 3D indoor reconstruction is particularly challenging due to object interference, occlusion, and overlap, which yield incomplete yet very complex scene arrangements. Since it is hard to assemble scanned segments into complete models, traditional methods for object recognition and reconstruction would be inefficient. We present a search-classify approach which interleaves segmentation and classification in an iterative manner. Using a robust classifier, we traverse the scene and gradually propagate classification information. We reinforce classification by a template fitting step which yields a scene reconstruction. We deform-to-fit templates to classified objects to resolve classification ambiguities. The resulting reconstruction is an approximation which captures the general scene arrangement. Our results demonstrate successful classification and reconstruction of cluttered indoor scenes, captured in just a few minutes.
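
The interleaving of segmentation and classification described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the patch graph, the patch sizes, the template sizes, and the names `search_classify` and `toy_classifier` are all hypothetical, and a toy size-based score stands in for the robust classifier and the template fitting step. The sketch only shows the control flow, where a region grows by one neighboring patch at a time as long as the classification confidence of the grown region increases.

```python
from typing import Callable, Dict, List, Set, Tuple

# A classifier maps a set of patch ids to (label, confidence in [0, 1]).
Classifier = Callable[[Set[int]], Tuple[str, float]]


def search_classify(
    adjacency: Dict[int, Set[int]], classify: Classifier
) -> List[Tuple[Set[int], str, float]]:
    """Greedy region growing guided by classification confidence."""
    unassigned = set(adjacency)
    objects = []
    while unassigned:
        seed = min(unassigned)  # deterministic seed choice for the sketch
        region = {seed}
        label, conf = classify(region)
        while True:
            # Candidate patches: unassigned neighbors of the current region.
            neighbors = {n for p in region for n in adjacency[p]}
            frontier = (neighbors & unassigned) - region
            best = None
            for n in sorted(frontier):
                cand_label, cand_conf = classify(region | {n})
                # Accept only candidates that strictly raise the confidence.
                if cand_conf > conf and (best is None or cand_conf > best[2]):
                    best = (n, cand_label, cand_conf)
            if best is None:
                break  # no neighbor improves the classification
            region.add(best[0])
            label, conf = best[1], best[2]
        objects.append((region, label, conf))
        unassigned -= region
    return objects


# Hypothetical toy data: five patches, where patch 2 touches patch 3 (clutter).
SIZES = {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.5, 4: 3.5}
ADJACENCY = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
TEMPLATE_SIZES = {"chair": 3.0, "table": 6.0}  # assumed, for illustration only


def toy_classifier(region: Set[int]) -> Tuple[str, float]:
    """Score a region by how closely its total size matches a template size."""
    total = sum(SIZES[p] for p in region)
    scores = {
        name: 1.0 / (1.0 + abs(total - size))
        for name, size in TEMPLATE_SIZES.items()
    }
    label = max(scores, key=scores.get)
    return label, scores[label]


if __name__ == "__main__":
    for region, label, conf in search_classify(ADJACENCY, toy_classifier):
        print(sorted(region), label, round(conf, 3))
```

In this toy scene the first region stops growing at the boundary between patches 2 and 3, because adding patch 3 would not raise the confidence, so the two touching objects are separated by the classifier score rather than by a prior segmentation pass.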


Index Terms

A search-classify approach for cluttered indoor scene understanding
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Scene understanding
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Published in

ACM Transactions on Graphics Volume 31, Issue 6
November 2012
794 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2366145

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
point cloud classification
reconstruction
scene understanding
Qualifiers
- research-article

Article Metrics
- Total Citations: 181
- Total Downloads: 1,591
- Downloads (Last 12 months): 61
- Downloads (Last 6 weeks): 11