Published in: International Journal of Computer Vision 1-2/2014

1 May 2014

Putting the User in the Loop for Image-Based Modeling

Authors: Adarsh Kowdle, Yao-Jen Chang, Andrew Gallagher, Dhruv Batra, Tsuhan Chen


Abstract

We refer to the task of recovering the 3D structure of an object or a scene from 2D images as image-based modeling. In this paper, we formulate this task as a discrete optimization problem solved via energy minimization. Within this standard framework of a Markov random field (MRF) defined over the image, we present algorithms that allow the user to interact intuitively with the algorithm. We introduce an algorithm in which the user guides the process of image-based modeling to find and model the object of interest by manually interacting with the nodes of the graph. Using this algorithm, we develop end-user applications that allow 3D modeling of the object of interest on a mobile device and 3D printing of the object of interest. We also propose an alternative active learning algorithm that guides the user input. An initial attempt is made at reconstructing the scene without supervision. Given this reconstruction, the active learning algorithm uses intuitive cues to quantify its uncertainty and suggests regions, querying the user to provide support for the uncertain regions via simple scribbles. These constraints are used to update the unary and pairwise energies that, when solved, lead to better reconstructions. We show through machine experiments and a user study that the proposed approach intelligently queries users for constraints, and that users achieve better reconstructions of the scene faster, especially for scenes with textureless surfaces that lack the strong textural or structural cues algorithms typically require.
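
As a rough illustration of the formulation described in the abstract, the sketch below sets up a tiny MRF over four superpixels, minimizes its energy, picks the most uncertain node as a query, and folds a user scribble back into the unary energies. It is a minimal sketch under stated assumptions, not the authors' implementation: iterated conditional modes (ICM) stands in for whatever solver the paper uses, the entropy of a softmax over unary costs stands in for the paper's "intuitive cues", and every function name, the toy costs, and the use of `lam=0.5` as the pairwise weight are hypothetical choices made for illustration.

```python
import math

def energy(labels, unary, edges, lam=0.5):
    """E(x) = sum_i unary[i][x_i] + lam * sum_(i,j) w_ij * [x_i != x_j]."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth = sum(w for i, j, w in edges if labels[i] != labels[j])
    return data + lam * smooth

def icm(unary, edges, lam=0.5, sweeps=10):
    """Iterated conditional modes: a simple local solver (assumed stand-in)."""
    n, k = len(unary), len(unary[0])
    # Start from the label with the lowest unary cost at each node.
    labels = [min(range(k), key=lambda l: unary[i][l]) for i in range(n)]
    nbrs = [[] for _ in range(n)]
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            def cost(l):
                # Unary cost plus a penalty for each disagreeing neighbour.
                return unary[i][l] + lam * sum(w for j, w in nbrs[i] if labels[j] != l)
            best = min(range(k), key=cost)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels

def most_uncertain(unary):
    """Node whose softmax(-cost) distribution has the highest entropy (assumed cue)."""
    def entropy(costs):
        m = min(costs)
        p = [math.exp(-(c - m)) for c in costs]
        z = sum(p)
        return -sum((q / z) * math.log(q / z) for q in p if q > 0)
    return max(range(len(unary)), key=lambda i: entropy(unary[i]))

def apply_scribble(unary, node, label, big=1e6):
    """Return a copy of unary where `node` is effectively forced to `label`."""
    out = [row[:] for row in unary]
    out[node] = [0.0 if l == label else big for l in range(len(out[node]))]
    return out

# Toy chain of four superpixels and two plane labels (hypothetical costs).
# Nodes 1 and 2 mimic textureless regions with nearly uninformative unaries.
unary = [[0.1, 2.0], [1.0, 1.0], [1.0, 1.1], [2.0, 0.1]]
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]

first = icm(unary, edges)                           # unsupervised attempt
query = most_uncertain(unary)                       # node to show the user
second = icm(apply_scribble(unary, query, 1), edges)  # user scribbles label 1
print(first, query, second)
```

On this toy chain the unsupervised solve yields `[0, 0, 0, 1]`, node 1 is flagged for a query, and after the scribble the solve yields `[0, 1, 1, 1]`: the single constraint also flips the neighbouring ambiguous node through the pairwise term.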

Footnotes
1
Superpixels are used to help reduce computational complexity.
 
2
We use mean-shift segmentation (Comaniciu and Meer 2002) to break an image into about a thousand superpixels.
 
3
The parameter \(\lambda \) is set to 0.5.
 
5
We use graph-based segmentation (Felzenszwalb and Huttenlocher 2004) to break each image down into about 400 superpixels.
 
9
The 3D printouts were obtained using the online service http://www.shapeways.com.
 
References
Bartoli, A. (2007). A random sampling strategy for piecewise planar scene segmentation. Computer Vision and Image Understanding, 105(1), 42–59.
Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmenting topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292.
Baumgart, B.G. (1974). Geometric modeling for computer vision. PhD thesis, Stanford University.
Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.
Boykov, Y., Veksler, O., & Zabih, R. (2001). Efficient approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, 20(12), 1222–1239.
Campbell, N., Vogiatzis, G., Hernández, C., & Cipolla, R. (2007). Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In BMVC, Bristol.
Campbell, N.D., Vogiatzis, G., Hernández, C., & Cipolla, R. (2008). Using multiple hypotheses to improve depth-maps for multi-view stereo. In ECCV.
Chen, Z., Chou, H.L., & Chen, W.C. (2008). A performance controllable octree construction method. In ICPR.
Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Towards scalable dataset construction: An active learning approach. In ECCV.
Comaniciu, D., & Meer, P. (2002). Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, 24(5), 603–619.
Criminisi, A., Reid, I.D., & Zisserman, A. (1999). Single view metrology. In ICCV.
Debevec, P., Taylor, C., & Malik, J. (1996). Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH.
Fang, Y. H., Chou, H. L., & Chen, Z. (2003). 3D Shape recovery of complex objects from multiple silhouette images. Pattern Recognition Letters, 24(9–10), 1279–1293.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Forbes, K., Nicolls, F., de Jager, G., & Voigt, A. (2006). Shape-from-silhouette with two mirrors and an uncalibrated camera. In ECCV (pp. 165–178).
Furukawa, Y., & Ponce, J. (2009). Accurate, dense, and robust multi-view stereopsis. Pattern Analysis and Machine Intelligence, 32, 1362–1376.
Furukawa, Y., Curless, B., Seitz, S., & Szeliski, R. (2009). Reconstructing building interiors from images. In ICCV.
Furukawa, Y., Curless, B., Seitz, S.M., & Szeliski, R. (2010). Towards internet-scale multi-view stereo. In CVPR.
Gallup, D., Frahm, J., & Pollefeys, M. (2010). Piecewise planar and non-planar stereo for urban scene reconstruction. In CVPR.
Goesele, M., Snavely, N., Curless, B., Hoppe, H., & Seitz, S.M. (2007). Multi-view stereo for community photo collections. In ICCV.
Gosselin, P. H., & Cord, M. (2008). Active learning methods for interactive image retrieval. IEEE Transactions on Image Processing, 17(7), 1200–1211.
Hengel, A., Dick, A. R., Thormählen, T., Ward, B., & Torr, P. H. S. (2007). Videotrace: Rapid interactive scene modelling from video. ACM Transactions on Graphics, 26(3), 86.
Hoiem, D., Efros, A., & Hebert, M. (2005). Automatic photo pop-up. In SIGGRAPH.
Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1).
Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In CVPR (pp. 762–769).
Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with gaussian processes for object categorization. In ICCV.
Kohli, P., & Torr, P. H. S. (2008). Measuring uncertainty in graph cut solutions. Computer Vision and Image Understanding, 112(1), 30–38.
Kohli, P., Nickisch, H., Rother, C., & Rhemann, C. (2012). User-centric learning and evaluation of interactive segmentation systems. International Journal of Computer Vision.
Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, 26(2), 147–159.
Kowdle, A., Batra, D., Chen, W., & Chen, T. (2010). iModel: Interactive co-segmentation for object of interest 3d modeling. In ECCV – RMLE Workshop.
Kowdle, A., Chang, Y., Batra, D., & Chen, T. (2011a). Scribble based interactive 3d reconstruction via scene cosegmentation. In ICIP.
Kowdle, A., Chang, Y., Gallagher, A., & Chen, T. (2011b). Active learning for piecewise planar 3d reconstruction. In CVPR.
Kowdle, A., Liu, H., Hsu, S., Lew, J., Puri, C., Batra, D., & Chen, T. (2012a). iModel: Object of interest 3d modeling via interactive co-segmentation on a mobile device. In Demo session at CVPR.
Kowdle, A., Sinha, S., & Szeliski, R. (2012b). Multiple view object cosegmentation using appearance and stereo cues. In ECCV.
Lafarge, F., Keriven, R., Brédif, M., & Hiep, V. (2010). Hybrid multi-view reconstruction by jump-diffusion. In CVPR.
Lee, W., Woo, W., & Boyer, E. (2007). Identifying foreground from multiple images. In ACCV.
McGuinness, K., & O’Connor, N.E. (2012). Toward automated evaluation of interactive segmentation. Computer Vision and Image Understanding, 115(6), 868–884.
Micusík, B., & Kosecká, J. (2010). Multi-view superpixel stereo in urban environments. International Journal of Computer Vision, 89(1), 106–119.
Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., et al. (2004). Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3), 207–232.
Pollefeys, M., Nistér, D., Frahm, J., Akbarzadeh, A., Mordohai, P., Clipp, B., et al. (2008). Detailed real-time urban 3d reconstruction from video. International Journal of Computer Vision, 78(2–3), 143–167.
Saxena, A., Sun, M., & Ng, A. Y. (2009). Make3d: Learning 3d scene structure from a single still image. Pattern Analysis and Machine Intelligence, 31(5), 824–840.
Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1–3), 7–42.
Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (2006). A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR.
Sinha, S., Steedly, D., Szeliski, R., Agrawala, M., & Pollefeys, M. (2008). Interactive 3d architectural modeling from unordered photo collections. In SIGGRAPH Asia.
Sinha, S., Steedly, D., & Szeliski, R. (2009). Piecewise planar stereo for image-based rendering. In ICCV.
Snavely, N., Seitz, S., & Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3d. In SIGGRAPH.
Srivastava, S., Saxena, A., Theobalt, C., Thrun, S., & Ng, A.Y. (2009). i23 - Rapid interactive 3d reconstruction from a single image. In Vision, Modeling and Visualization.
Sturm, P.F., & Maybank, S.J. (1999). A method for interactive 3d reconstruction of piecewise planar objects from single images. In BMVC.
Szeliski, R. (1993). Rapid octree construction from image sequences. Computer Vision Graphics and Image Processing, 58(1), 23–32.
Vicente, S., Rother, C., & Kolmogorov, V. (2011). Object cosegmentation. In CVPR.
Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In CVPR.
Yan, R., Yang, J., & Hauptmann, A. (2003). Automatically labeling video data using multi-class active learning. In ICCV.
Zhou, X. S., & Huang, T. S. (2003). Relevance feedback in image retrieval: A comprehensive review. Multimedia Systems, 8(6), 536–544.
Metadata
Title
Putting the User in the Loop for Image-Based Modeling
Authors
Adarsh Kowdle
Yao-Jen Chang
Andrew Gallagher
Dhruv Batra
Tsuhan Chen
Publication date
1 May 2014
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1-2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-014-0704-x
