Published in: International Journal of Computer Vision

01.05.2014

Putting the User in the Loop for Image-Based Modeling

Written by: Adarsh Kowdle, Yao-Jen Chang, Andrew Gallagher, Dhruv Batra, Tsuhan Chen

Published in: International Journal of Computer Vision | Issue 1-2/2014

Abstract

We refer to the task of recovering the 3D structure of an object or a scene from 2D images as image-based modeling. In this paper, we formulate this task as a discrete optimization problem solved via energy minimization. Within this standard framework of a Markov random field (MRF) defined over the image, we present algorithms that allow the user to interact intuitively with the algorithm. We introduce an algorithm in which the user guides the process of image-based modeling to find and model the object of interest by manually interacting with the nodes of the graph. Using this algorithm, we develop end-user applications that support 3D modeling of the object of interest on a mobile device and 3D printing of that object. We also propose an alternative active learning algorithm that guides the user input. An initial attempt is made at reconstructing the scene without supervision. Given this reconstruction, the active learning algorithm uses intuitive cues to quantify its uncertainty and suggests regions, querying the user to provide support for the uncertain regions via simple scribbles. These constraints are used to update the unary and pairwise energies that, when minimized, lead to better reconstructions. We show through machine experiments and a user study that the proposed approach intelligently queries users for constraints, and that users achieve better reconstructions of the scene faster, especially for scenes with textureless surfaces that lack the strong textural or structural cues that algorithms typically require.
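To make the energy formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Potts pairwise term, a weight `lam` of 0.5 between the unary and pairwise terms, and a toy chain of four superpixel nodes with two labels; all function names are hypothetical. For simplicity it minimizes the energy with iterated conditional modes (ICM) rather than the graph-cut solvers cited in the references. A user scribble is modeled as a hard constraint that overwrites the unary costs of the scribbled node.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def energy(labels: List[int], unary: List[List[float]],
           edges: List[Edge], lam: float) -> float:
    """Unary costs plus lam times the number of edges whose labels differ (Potts)."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth = sum(1 for i, j in edges if labels[i] != labels[j])
    return data + lam * smooth


def apply_scribbles(unary: List[List[float]], scribbles: Dict[int, int],
                    big: float = 1e6) -> List[List[float]]:
    """Return a copy of unary in which each scribbled node is forced to its label."""
    out = [row[:] for row in unary]
    for node, label in scribbles.items():
        out[node] = [0.0 if k == label else big for k in range(len(out[node]))]
    return out


def icm(unary: List[List[float]], edges: List[Edge], lam: float,
        max_sweeps: int = 20) -> List[int]:
    """Greedy coordinate descent on the energy (an assumed stand-in for graph cuts)."""
    n = len(unary)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from the label with the lowest unary cost at every node.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local_cost(k: int) -> float:
                return unary[i][k] + lam * sum(
                    1 for j in neighbors[i] if labels[j] != k)
            best = min(range(len(unary[i])), key=local_cost)
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy example: node 2 is ambiguous and is initially labeled 0.
unary = [[0.0, 1.0], [0.4, 0.6], [0.45, 0.55], [1.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3)]
unsupervised = icm(unary, edges, lam=0.5)
print(unsupervised)  # [0, 0, 0, 1]

# A scribble on node 2 with label 1 changes the solution.
constrained = icm(apply_scribbles(unary, {2: 1}), edges, lam=0.5)
print(constrained)  # [0, 0, 1, 1]
```

ICM only reaches a local minimum, so it is used here purely because it fits in a few lines; for two-label submodular energies of this form, a min-cut solver would return the global minimum.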

Superpixels are used to help reduce computational complexity.

We use mean-shift segmentation (Comaniciu and Meer 2002) to break an image into about a thousand superpixels.

The parameter \(\lambda\) is set to 0.5.

http://chenlab.ece.cornell.edu/projects/Interactive_3D.

We use graph based segmentation (Felzenszwalb and Huttenlocher 2004) to break each image down to about 400 superpixels.

http://chenlab.ece.cornell.edu/projects/ActiveLearningFor3D.

http://chenlab.ece.cornell.edu/projects/iModel.

The 3D printouts were obtained using the online service http://www.shapeways.com.

Bagon, S. (2006). Matlab wrapper for graph cut. http://www.wisdom.weizmann.ac.il/bagon. Accessed 7 March 2013.

Bartoli, A. (2007). A random sampling strategy for piecewise planar scene segmentation. Computer Vision and Image Understanding, 105(1), 42–59.

Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmenting topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292.

Baumgart, B.G. (1974). Geometric modeling for computer vision. PhD thesis, Stanford University.

Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.

Boykov, Y., Veksler, O., & Zabih, R. (2001). Efficient approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, 20(12), 1222–1239.

Campbell, N., Vogiatzis, G., Hernández, C., & Cipolla, R. (2007). Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In BMVC, Bristol.

Campbell, N.D., Vogiatzis, G., Hernández, C., & Cipolla, R. (2008). Using multiple hypotheses to improve depth-maps for multi-view stereo. In ECCV.

Chen, Z., Chou, H.L., & Chen, W.C. (2008). A performance controllable octree construction method. In ICPR.

Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Towards scalable dataset construction: An active learning approach. In ECCV.

Comaniciu, D., & Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, 24(5), 603–619.

Criminisi, A., Reid, I.D., & Zisserman, A. (1999). Single view metrology. In ICCV.

Debevec, P., Taylor, C., & Malik, J. (1996). Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH.

Fang, Y. H., Chou, H. L., & Chen, Z. (2003). 3D Shape recovery of complex objects from multiple silhouette images. Pattern Recognition Letters, 24(9–10), 1279–1293.

Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.

Forbes, K., Nicolls, F., de Jager, G., & Voigt, A. (2006). Shape-from-silhouette with two mirrors and an uncalibrated camera. In ECCV, (pp. 165–178).

Furukawa, Y., & Ponce, J. (2009). Accurate, dense, and robust multi-view stereopsis. Pattern Analysis and Machine Intelligence, 32, 1362–1376.

Furukawa, Y., Curless, B., Seitz, S., & Szeliski, R. (2009). Reconstructing building interiors from images. In ICCV.

Furukawa, Y., Curless, B., Seitz, S.M., & Szeliski, R. (2010). Towards internet-scale multi-view stereo. In CVPR.

Gallup, D., Frahm, J., & Pollefeys, M. (2010). Piecewise planar and non-planar stereo for urban scene reconstruction. In CVPR.

Goesele, M., Snavely, N., Curless, B., Hoppe, H., & Seitz, S.M. (2007). Multi-view stereo for community photo collections. In ICCV.

Gosselin, P. H., & Cord, M. (2008). Active learning methods for interactive image retrieval. IEEE Transactions on Image Processing, 17(7), 1200–1211.

Hengel, A., Dick, A. R., Thormählen, T., Ward, B., & Torr, P. H. S. (2007). Videotrace: Rapid interactive scene modelling from video. ACM Transactions on Graphics, 26(3), 86.

Hoiem, D., Efros, A., & Hebert, M. (2005). Automatic photo pop-up. In SIGGRAPH.

Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1).

Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In CVPR, (pp. 762–769).

Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with gaussian processes for object categorization. In ICCV.

Kohli, P., & Torr, P. H. S. (2008). Measuring uncertainty in graph cut solutions. Computer Vision and Image Understanding, 112(1), 30–38.

Kohli, P., Nickisch, H., Rother, C., & Rhemann, C. (2012). User-centric learning and evaluation of interactive segmentation systems. In IJCV.

Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, 26(2), 147–159.

Kowdle, A., Batra, D., Chen, W., & Chen, T. (2010). iModel: Interactive co-segmentation for object of interest 3d modeling. In ECCV – RMLE Workshop.

Kowdle, A., Chang, Y., Batra, D., & Chen, T. (2011a). Scribble based interactive 3d reconstruction via scene cosegmentation. In ICIP.

Kowdle, A., Chang, Y., Gallagher, A., & Chen, T. (2011b). Active learning for piecewise planar 3d reconstruction. In CVPR.

Kowdle, A., Liu, H., Hsu, S., Lew, J., Puri, C., Batra, D., & Chen, T. (2012a). iModel: Object of interest 3d modeling via interactive co-segmentation on a mobile device. In Demo session at CVPR.

Kowdle, A., Sinha, S., & Szeliski, R. (2012b). Multiple view object cosegmentation using appearance and stereo cues. In ECCV.

Lafarge, F., Keriven, R., Brédif, M., & Hiep, V. (2010). Hybrid multi-view reconstruction by jump-diffusion. In CVPR.

Lee, W., Woo, W., & Boyer, E. (2007). Identifying foreground from multiple images. In ACCV.

McGuinness, K., & O’Connor, N.E. (2012). Toward automated evaluation of interactive segmentation. Computer Vision and Image Understanding, 115(6), 868–884.

Micusík, B., & Kosecká, J. (2010). Multi-view superpixel stereo in urban environments. International Journal of Computer Vision, 89(1), 106–119.

Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., et al. (2004). Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3), 207–232.

Pollefeys, M., Nistér, D., Frahm, J., Akbarzadeh, A., Mordohai, P., Clipp, B., et al. (2008). Detailed real-time urban 3d reconstruction from video. International Journal of Computer Vision, 78(2–3), 143–167.

Saxena, A., Sun, M., & Ng, A. Y. (2009). Make3d: Learning 3d scene structure from a single still image. Pattern Analysis and Machine Intelligence, 31(5), 824–840.

Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1–3), 7–42.

Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (2006). A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR.

Sinha, S., Steedly, D., Szeliski, R., Agrawala, M., & Pollefeys, M. (2008). Interactive 3d architectural modeling from unordered photo collections. In SIGGRAPH Asia.

Sinha, S., Steedly, D., & Szeliski, R. (2009). Piecewise planar stereo for image-based rendering. In ICCV.

Sketchup. (2000). Google sketchup. http://sketchup.google.com/. Accessed 7 March 2013.

Snavely, N., Seitz, S., & Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3d. In SIGGRAPH.

Srivastava, S., Saxena, A., Theobalt, C., Thrun, S., & Ng, A.Y. (2009). i23 - Rapid interactive 3d reconstruction from a single image. In Vision, Modeling and Visualization.

Sturm, P.F., & Maybank, S.J. (1999). A method for interactive 3d reconstruction of piecewise planar objects from single images. In BMVC.

Szeliski, R. (1993). Rapid octree construction from image sequences. Computer Vision Graphics and Image Processing, 58(1), 23–32.

Tang, K., Kowdle, A., Batra, D., & Chen, T. (2009). iScribble. http://chenlab.ece.cornell.edu/projects/iScribble/iScribble.html. Accessed 7 March 2013.

Vicente, S., Rother, C., & Kolmogorov, V. (2011). Object cosegmentation. In CVPR.

Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In CVPR.

Yan, R., Yang, J., & Hauptmann, A. (2003). Automatically labeling video data using multi-class active learning. In ICCV.

Zhou, X. S., & Huang, T. S. (2003). Relevance feedback in image retrieval: A comprehensive review. Multimedia Systems, 8(6), 536–544.

Title: Putting the User in the Loop for Image-Based Modeling
Written by: Adarsh Kowdle, Yao-Jen Chang, Andrew Gallagher, Dhruv Batra, Tsuhan Chen
Publication date: 01.05.2014
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 1-2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-014-0704-x
