
A search-classify approach for cluttered indoor scene understanding

Published: 01 November 2012

Abstract

We present an algorithm for recognition and reconstruction of scanned 3D indoor scenes. 3D indoor reconstruction is particularly challenging due to object interference, occlusion, and overlap, which yield incomplete yet very complex scene arrangements. Since it is hard to assemble scanned segments into complete models, traditional methods for object recognition and reconstruction would be inefficient. We present a search-classify approach which interleaves segmentation and classification in an iterative manner. Using a robust classifier, we traverse the scene and gradually propagate classification information. We reinforce classification with a template fitting step, which yields a scene reconstruction: we deform templates to fit classified objects in order to resolve classification ambiguities. The resulting reconstruction is an approximation which captures the general scene arrangement. Our results demonstrate successful classification and reconstruction of cluttered indoor scenes, captured in just a few minutes.
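
The interleaving of segmentation and classification described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so everything below is an assumption made for illustration. The scene is assumed to be pre-split into small patches with a known adjacency graph, regions are grown greedily, and a merge is kept only when it raises the classifier's confidence. The names `grow_region`, `search_classify`, `toy_classify`, `TEMPLATES`, and `ADJACENCY` are hypothetical, the classifier is a stand-in that scores set overlap against fixed patch sets, and the template fitting step is not modelled.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical classifier interface: maps a set of patch ids to
# (label, confidence in [0, 1]). A real system would use geometric features.
Classifier = Callable[[FrozenSet[int]], Tuple[str, float]]


def grow_region(seed: int,
                adjacency: Dict[int, Set[int]],
                classify: Classifier,
                taken: Set[int]) -> Tuple[FrozenSet[int], str, float]:
    """Greedily grow a region from `seed`; keep a merge only if it
    raises the classifier's confidence (an assumed acceptance rule)."""
    region = frozenset([seed])
    label, conf = classify(region)
    improved = True
    while improved:
        improved = False
        # Candidates: neighbours of the region not yet claimed by any object.
        frontier = set().union(*(adjacency[p] for p in region)) - region - taken
        for patch in sorted(frontier):
            new_label, new_conf = classify(region | {patch})
            if new_conf > conf:
                region, label, conf = region | {patch}, new_label, new_conf
                improved = True
                break  # recompute the frontier for the enlarged region
    return region, label, conf


def search_classify(adjacency: Dict[int, Set[int]],
                    classify: Classifier,
                    min_conf: float = 0.5) -> List[Tuple[List[int], str, float]]:
    """Traverse all patches, growing and classifying one region per free seed."""
    taken: Set[int] = set()
    objects = []
    for seed in sorted(adjacency):
        if seed in taken:
            continue
        region, label, conf = grow_region(seed, adjacency, classify, taken)
        if conf >= min_conf:  # low-confidence regions are left as clutter
            taken |= region
            objects.append((sorted(region), label, conf))
    return objects


# Toy data (assumed): patches 0-2 form a chair, 3-4 a table, 5 is clutter.
TEMPLATES = {"chair": frozenset({0, 1, 2}), "table": frozenset({3, 4})}
ADJACENCY = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}


def toy_classify(region: FrozenSet[int]) -> Tuple[str, float]:
    """Stand-in classifier: Jaccard overlap with the best-matching patch set."""
    def overlap(name: str) -> float:
        t = TEMPLATES[name]
        return len(region & t) / len(region | t)
    best = max(TEMPLATES, key=overlap)
    return best, overlap(best)


if __name__ == "__main__":
    for patches, label, conf in search_classify(ADJACENCY, toy_classify):
        print(patches, label, conf)
```

In this toy setting, growth stops at the chair/table boundary because adding a table patch to the chair region lowers the overlap score, which is the kind of behaviour an interleaved segment-and-classify loop relies on.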



Published in

ACM Transactions on Graphics, Volume 31, Issue 6
November 2012
794 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2366145

Copyright © 2012 ACM


Publisher

Association for Computing Machinery, New York, NY, United States
