Published in: International Journal of Computer Vision 2/2012

1 November 2012

Object Detection using Geometrical Context Feedback

Authors: Min Sun, Sid Yingze Bao, Silvio Savarese

Abstract

We propose a new coherent framework for joint object detection, 3D layout estimation, and object supporting region segmentation from a single image. Our approach is based on the mutual interactions among three novel modules: (i) an object detector; (ii) a scene 3D layout estimator; (iii) an object supporting region segmenter. The interactions between these modules capture the contextual geometrical relationship between objects, the physical space containing these objects, and the observer. An important property of our algorithm is that the object detector module can adaptively change its confidence that a given region of interest contains an object as new evidence is gathered about the scene layout. This enables an iterative estimation procedure in which the detector becomes increasingly accurate as additional evidence about a specific scene becomes available. Extensive quantitative and qualitative experiments on the table-top dataset (Sun et al. in ECCV, 2010b) and two publicly available datasets (Hoiem et al. in CVPR, 2006; Sudderth et al. in IJCV, 2008) demonstrate competitive object detection, 3D layout estimation, and segmentation results.

Footnotes
1
Here we omit the superscript o to have a concise notation.
 
2
An object is considered to overlap sufficiently with the foreground region (fg) when the area of the intersection between the foreground region and the object bounding box, divided by the area of the object bounding box, is greater than 0.5.
 
3
The training instances and the testing instances are kept separate.
 
4
As explained in Bao et al. (2010) and in Sect. 2.2.2, at least 3 objects are necessary for estimating the layout.
 
5
\(e_{H}=\frac{1}{N}\sum_{i}\left|\frac{\widehat{H_{i}}-H_{i}}{H_{i}}\right|\), where \(\widehat{H_{i}}\) and \(H_{i}\) are the best estimated and the ground-truth vanishing lines, respectively.
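The overlap criterion of footnote 2 and the error measure of footnote 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the binary-mask representation of the foreground region, the half-open box convention `(x0, y0, x1, y1)`, and the treatment of each \(H_{i}\) as a single scalar (with \(N\) taken to be the number of evaluated items) are all assumptions made here for illustration.

```python
def foreground_overlap_ratio(fg_mask, box):
    """Area of (foreground ∩ box) divided by the area of the box.

    fg_mask: 2D list of 0/1 values (1 = foreground), indexed fg_mask[y][x].
             This mask representation is an assumption of the sketch.
    box:     (x0, y0, x1, y1) in pixels, half-open (x1 and y1 excluded).
    """
    x0, y0, x1, y1 = box
    box_area = (x1 - x0) * (y1 - y0)
    if box_area <= 0:
        raise ValueError("bounding box must have positive area")
    # Count the foreground pixels that fall inside the box.
    inside = sum(fg_mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return inside / box_area


def overlaps_foreground(fg_mask, box, threshold=0.5):
    """Footnote 2: sufficient overlap when the ratio is greater than 0.5."""
    return foreground_overlap_ratio(fg_mask, box) > threshold


def mean_relative_error(estimates, ground_truths):
    """Footnote 5: e_H = (1/N) * sum_i |(H_hat_i - H_i) / H_i|.

    Each H_i is treated as a nonzero scalar here (an assumption).
    """
    if not estimates or len(estimates) != len(ground_truths):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((h_hat - h) / h) for h_hat, h in zip(estimates, ground_truths))
    return total / len(estimates)


if __name__ == "__main__":
    # Toy 4x4 mask whose two left columns are foreground.
    mask = [[1, 1, 0, 0] for _ in range(4)]
    print(overlaps_foreground(mask, (0, 0, 3, 2)))
    print(mean_relative_error([110.0, 90.0], [100.0, 100.0]))
```

The comparison is strict, so a box that lies exactly half on the foreground does not count as sufficiently overlapping, matching the wording "bigger than 0.5" in footnote 2.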
 
References
Bao, S. Y., Sun, M., & Savarese, S. (2010). Toward coherent object detection and scene layout understanding. In CVPR.
Brostow, G. J., Shotton, J., Fauqueur, J., & Cipolla, R. (2008). Segmentation and recognition using structure from motion point clouds. In ECCV.
Cornelis, N., Leibe, B., Cornelis, K., & Van Gool, L. (2006). 3D city modeling using cognitive loops. In 3DPVT.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR.
Dance, C., Willamowski, J., Fan, L., Bray, C., & Csurka, G. (2004). Visual categorization with bags of keypoints. In ECCV workshop on statistical learning in computer vision.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.
Fei-Fei, L., Fergus, R., & Perona, P. (2003). A Bayesian approach to unsupervised one-shot learning of object categories. In ICCV.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. In IJCV.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. In IJCV.
Fergus, R., Perona, P., & Zisserman, A. (2005). A sparse object category model for efficient learning and exhaustive recognition. In CVPR.
Gonfaus, J. M., Boix, X., van de Weijer, J., Bagdanov, A. D., Serrat, J., & Gonzàlez, J. (2010). Harmony potentials for joint classification and segmentation. In CVPR.
Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In ICCV.
Grauman, K., & Darrell, T. (2005). The pyramid match kernel: discriminative classification with sets of image features. In ICCV.
Gupta, A., & Davis, L. S. (2008). Beyond nouns: exploiting prepositions and comparative adjectives for learning visual classifiers. In ECCV.
Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. In ICCV.
Heitz, G., Gould, S., Saxena, A., & Koller, D. (2008). Cascaded classification models: combining models for holistic scene understanding. In NIPS.
Hoiem, D., Efros, A. A., & Hebert, M. (2005). Geometric context from a single image. In ICCV.
Hoiem, D., Efros, A. A., & Hebert, M. (2006). Putting objects in perspective. In CVPR.
Hoiem, D., Efros, A., & Hebert, M. (2007). Recovering surface layout from an image. In IJCV.
Hoiem, D., Efros, A. A., & Hebert, M. (2008). Closing the loop on scene interpretation. In CVPR.
Ladicky, L., Russell, C., Kohli, P., & Torr, P. (2010). Graph cut based inference with co-occurrence statistics. In ECCV.
Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In ECCV workshop on statistical learning in computer vision.
Li, C., Kowdle, A., Saxena, A., & Chen, T. (2010). Towards holistic scene understanding: feedback enabled cascaded classification models. In NIPS.
Li, L. J., & Fei-Fei, L. (2007). What, where and who? Classifying events by scene and object recognition. In ICCV.
Li, L. J., Socher, R., & Fei-Fei, L. (2009). Towards total scene understanding: classification, annotation and segmentation in an automatic framework. In CVPR.
Liebelt, J., & Schmid, C. (2010). Multi-view object class detection with a 3D geometric model. In CVPR.
Payet, N., & Todorovic, S. (2011). Scene shape from textures of objects. In CVPR.
Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., & Belongie, S. (2007). Objects in context. In ICCV.
Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. In IJCV.
Savarese, S., & Fei-Fei, L. (2007). 3D generic object categorization, localization and pose estimation. In CVPR.
Saxena, A., Sun, M., & Ng, A. Y. (2009). Make3D: learning 3D scene structure from a single still image. In PAMI.
Su, H., Sun, M., Fei-Fei, L., & Savarese, S. (2009). Learning a dense multi-view representation for detection, viewpoint classification, and synthesis of object categories. In ICCV.
Sudderth, E. B., Torralba, A., Freeman, W. T., & Willsky, A. S. (2008). Describing visual scenes using transformed objects and parts. In IJCV.
Sun, M., Su, H., Savarese, S., & Fei-Fei, L. (2009). A multi-view probabilistic model for 3D object classes. In CVPR.
Sun, M., Bao, S. Y., & Savarese, S. (2010a). Object detection with geometrical context feedback loop. In BMVC.
Sun, M., Bradski, G., Xu, B. X., & Savarese, S. (2010b). Depth-encoded Hough voting for coherent object detection, pose estimation, and shape recovery. In ECCV.
Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., & Van Gool, L. (2006). Towards multi-view object class detection. In CVPR.
Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A. (2003). Context-based vision system for place and object recognition. In ICCV.
Viola, P., & Jones, M. (2002). Robust real-time object detection. In IJCV.
Metadata
Title
Object Detection using Geometrical Context Feedback
Authors
Min Sun
Sid Yingze Bao
Silvio Savarese
Publication date
1 November 2012
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2012
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-012-0547-2
