Abstract
We propose a real-time approach for indoor scene reconstruction. It is capable of producing a ready-to-use 3D geometric model even while the user is still scanning the environment with a consumer depth camera. Our approach features explicit representations of planar regions and nonplanar objects extracted from the noisy feed of the depth camera, via an online structure analysis on the dynamic, incomplete data. The structural information is incorporated into the volumetric representation of the scene, resulting in a seamless integration with KinectFusion's global data structure and an efficient implementation of the whole reconstruction process. Moreover, heuristics based on rectilinear shapes in typical indoor scenes effectively eliminate camera tracking drift and further improve reconstruction accuracy. The instantaneous feedback enabled by our on-the-fly structure analysis, including repeated object recognition, allows the user to selectively scan the scene and produce high-fidelity large-scale models efficiently. We demonstrate the capability of our system with real-life examples.
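The rectilinear heuristic mentioned in the abstract can be pictured with a small sketch. This is not the authors' implementation. It assumes a fixed Manhattan-world frame and a hypothetical function `snap_normal` with an illustrative 10-degree tolerance, and it only shows the general idea of snapping a noisy plane normal to the nearest dominant axis.

```python
import math

# Assumption: the scene's dominant directions are the coordinate axes
# (a Manhattan-world frame). A real system would estimate these online.
AXES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length normal")
    return tuple(c / n for c in v)


def snap_normal(normal, max_angle_deg=10.0):
    """Snap a plane normal to the closest signed axis if it lies within
    max_angle_deg of that axis; otherwise return the normalized input.
    Hypothetical helper for illustration only."""
    n = normalize(normal)
    best_axis, best_dot = None, -1.0
    for axis in AXES:
        d = sum(a * b for a, b in zip(axis, n))
        if abs(d) > best_dot:
            best_dot = abs(d)
            # Keep the sign of the original normal along this axis.
            best_axis = tuple(math.copysign(a, d) if a else 0.0 for a in axis)
    if best_dot >= math.cos(math.radians(max_angle_deg)):
        return best_axis
    return n


if __name__ == "__main__":
    print(snap_normal((0.05, 0.99, -0.03)))  # nearly vertical normal
    print(snap_normal((1.0, 1.0, 0.0)))      # 45 degrees off: left unsnapped
```

Snapping only within a small angular tolerance keeps genuinely slanted surfaces untouched while regularizing walls and floors whose estimated normals are slightly perturbed by sensor noise or tracking drift.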
Supplemental Material
Supplemental movie, appendix, image, and software files for Online Structure Analysis for Real-Time Indoor Scene Reconstruction