Abstract
Detailed scanning of indoor scenes is tedious for humans. We propose autonomous scene scanning by a robot to relieve humans of this laborious task. In an autonomous setting, detailed scene acquisition is inevitably coupled with scene analysis at the required level of detail. We develop a framework for object-level scene reconstruction coupled with object-centric scene analysis. As a result, autoscanning and reconstruction are object-aware, guided by the object analysis, and the analysis is in turn gradually improved as object-wise data fidelity increases. To realize this framework, we drive the robot to execute an iterative analyze-and-validate algorithm that interleaves object analysis with guided validation.
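The interleaving of analysis and validation can be pictured with the toy loop below. It is a minimal sketch, not the authors' implementation: the `ObjectHypothesis` class, the `uncertainty` score (one minus the weaker of segmentation confidence and scan quality), the `validate` stand-in for a push or rescan, and the stopping `threshold` are all hypothetical placeholders chosen only to show the control flow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectHypothesis:
    """Hypothetical per-object state; both scores lie in [0, 1]."""
    name: str
    seg_confidence: float  # confidence that the object's segmentation is correct
    scan_quality: float    # quality of the object's current reconstruction


def uncertainty(obj: ObjectHypothesis) -> float:
    # Placeholder score (not the paper's joint entropy): an object is as
    # uncertain as its weaker attribute, segmentation or reconstruction.
    return 1.0 - min(obj.seg_confidence, obj.scan_quality)


def validate(obj: ObjectHypothesis, gain: float = 0.3) -> None:
    # Hypothetical stand-in for a physical validation: "push" if segmentation
    # is the weaker attribute, otherwise "rescan"; either raises that score.
    if obj.seg_confidence <= obj.scan_quality:
        obj.seg_confidence = min(1.0, obj.seg_confidence + gain)
    else:
        obj.scan_quality = min(1.0, obj.scan_quality + gain)


def analyze_and_validate(objects: List[ObjectHypothesis],
                         threshold: float = 0.2,
                         max_iters: int = 50) -> List[str]:
    """Interleave analysis and validation; return the order of validated objects."""
    history: List[str] = []
    for _ in range(max_iters):
        # Analyze: re-score every object and pick the most uncertain one.
        target = max(objects, key=uncertainty)
        if uncertainty(target) <= threshold:
            break  # every object is now confident enough
        # Validate: act on that object, then loop back to re-analyze.
        validate(target)
        history.append(target.name)
    return history


if __name__ == "__main__":
    scene = [ObjectHypothesis("A", 0.9, 0.9),
             ObjectHypothesis("B", 0.3, 0.8),
             ObjectHypothesis("C", 0.7, 0.5)]
    print(analyze_and_validate(scene))
```

Because every pass re-scores all objects before acting, a validation that improves one object immediately changes which object is targeted next; an already confident object (here "A") is never acted on.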
The object analysis incorporates online learning into a robust graph-cut-based segmentation framework, globally updating the object-level segmentation with knowledge gained from local validations performed by the robot. Based on the current analysis, the robot proactively validates the scene through physical pushing and scan refinement, aiming to reduce the uncertainty of both the object-level segmentation and the object-wise reconstruction. We propose a joint entropy that measures this uncertainty from segmentation confidence and reconstruction quality, and we formulate the selection of validation actions as a maximum information gain problem. The output of our system is a reconstructed scene with both extracted objects and high-fidelity object-wise geometry.
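One way to make the joint-entropy and information-gain formulation concrete is sketched below, under assumptions that are ours rather than the paper's: segmentation correctness and reconstruction adequacy are modeled as independent Bernoulli variables (so their joint entropy is the sum of two binary entropies), and each hypothetical candidate `Action` carries a hand-specified distribution over predicted post-action confidences. The action with the largest expected entropy reduction is selected.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def joint_entropy(seg_conf: float, recon_quality: float) -> float:
    # Assumption: segmentation correctness S and reconstruction adequacy R are
    # independent, so H(S, R) = H(S) + H(R).
    return binary_entropy(seg_conf) + binary_entropy(recon_quality)


@dataclass
class Action:
    """Hypothetical validation action (e.g. a push or a rescan)."""
    name: str
    # Predicted outcomes: (probability, seg_conf after, recon_quality after).
    # The probabilities are assumed to sum to 1.
    outcomes: List[Tuple[float, float, float]]


def information_gain(state: Tuple[float, float], action: Action) -> float:
    """Expected reduction in joint entropy if `action` is executed."""
    before = joint_entropy(*state)
    expected_after = sum(prob * joint_entropy(s, r) for prob, s, r in action.outcomes)
    return before - expected_after


def select_action(state: Tuple[float, float], actions: Sequence[Action]) -> Action:
    # Maximum-information-gain selection over the candidate actions.
    return max(actions, key=lambda a: information_gain(state, a))


if __name__ == "__main__":
    state = (0.5, 0.9)  # (segmentation confidence, reconstruction quality)
    push = Action("push", [(0.5, 0.95, 0.9), (0.5, 0.05, 0.9)])
    rescan = Action("rescan", [(1.0, 0.5, 0.99)])
    best = select_action(state, [push, rescan])
    print(best.name, round(information_gain(state, best), 3))
```

In the usage example the segmentation is a coin flip (confidence 0.5) while the scan is already good (0.9). The push is predicted to settle the segmentation one way or the other and removes about 0.71 bits, whereas the rescan removes only about 0.39 bits, so the push is chosen. Under this model a confidence of 0.05 is as informative as 0.95: entropy measures uncertainty, not correctness.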
Index Terms
- Autoscanning for coupled scene reconstruction and proactive object analysis