Published in: International Journal of Computer Vision 1/2016

01-08-2016

SUN Database: Exploring a Large Collection of Scene Categories

Authors: Jianxiong Xiao, Krista A. Ehinger, James Hays, Antonio Torralba, Aude Oliva

Abstract

Progress in scene understanding requires reasoning about the rich and diverse visual environments that make up our daily experience. To this end, we propose the Scene Understanding database, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse. The database contains 908 distinct scene categories and 131,072 images. Given this data with both scene and object labels available, we perform in-depth analysis of co-occurrence statistics and the contextual relationship. To better understand this large scale taxonomy of scene categories, we perform two human experiments: we quantify human scene recognition accuracy, and we measure how typical each image is of its assigned scene category. Next, we perform computational experiments: scene recognition with global image features, indoor versus outdoor classification, and “scene detection,” in which we relax the assumption that one image depicts only one scene category. Finally, we relate human experiments to machine performance and explore the relationship between human and machine recognition errors and the relationship between image “typicality” and machine recognition accuracy.
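The difference between scene recognition (one label per image) and "scene detection" (possibly several labels per image) can be illustrated with a minimal sketch. It assumes each category has its own classifier that returns a real-valued score for an image; the category names, scores, and threshold below are hypothetical and are not taken from the paper.

```python
# Hypothetical per-category classifier scores for a single image.
# These numbers are invented for illustration only.
scores = {"beach": -0.20, "coast": -0.35, "harbor": -1.40, "office": -2.10}


def recognize(scores):
    """Single-label recognition: pick the category with the highest score."""
    return max(scores, key=scores.get)


def detect(scores, threshold):
    """Multi-label detection: keep every category scoring at or above threshold."""
    return sorted(c for c, s in scores.items() if s >= threshold)


label = recognize(scores)
detected = detect(scores, threshold=-0.5)
print(label, detected)
```

Recognition by highest score still yields an answer when every score is negative, which is the situation footnote 8 describes. Detection, by contrast, depends on where the threshold is set, so a fixed threshold of zero would return nothing for this image.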

Footnotes
1
All the images and scene definitions are available at sundatabase.mit.edu or sun.cs.princeton.edu.
 
2
This difference also explains why our count is much higher than Biederman's (1987) estimate of about 1,000 basic-level objects: we included all object words in our count, not just basic-level terms.
 
3
The number of images keeps growing, as we periodically run our scripts to query more images.
 
4
Category size ranged from 22 images in the smallest categories to 2,360 in the largest. A total of 124,901 images were used in the experiment.
 
5
All workers were located in the United States and had a good performance record with the service (at least 100 HITs completed with an acceptance rate of 95% or better). Workers were paid $0.03 per trial.
 
6
Note that we use color for dense SIFT computation and train the feature codebook on the SUN database, which contains only color images. The 15-scene dataset from Lazebnik et al. (2006) contains several categories of grayscale images, which have no color information. Therefore, the result of our color-based dense SIFT on the 15-scene database (see Fig. 19a) is much worse than that reported in Lazebnik et al. (2006).
 
7
\(F(3,19846) = 278, p < .001\).
 
8
Due to the difficulty of the one-versus-all classification task, confidence was low across all classifications, and even correctly-classified images had average confidence scores below zero.
 
9
A \(4 \times 2\) ANOVA gives significant main effects of image typicality (\(F(3,19842) = 79.8, p < .001\)) and correct vs. incorrect classification (\(F(1,19842) = 6006, p < .001\)) and a significant interaction between these factors (\(F(3,19842) = 43.5, p < .001\)).
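
The degrees of freedom reported in footnotes 7 and 9 can be cross-checked with a short sketch. It assumes a standard fixed-effects, between-items ANOVA, in which the error degrees of freedom equal the number of observations minus the number of cells; the page does not state the sample size, so the value of N derived below is an inference under that assumption, not a reported figure.

```python
def implied_sample_size(error_df, n_cells):
    """Invert error_df = N - n_cells for a fixed-effects, between-items ANOVA."""
    return error_df + n_cells


# Footnote 7: F(3, 19846), read as a one-way ANOVA over 4 typicality levels.
n_one_way = implied_sample_size(error_df=19846, n_cells=4)

# Footnote 9: 4 x 2 ANOVA (typicality x correct/incorrect), error df 19842.
n_two_way = implied_sample_size(error_df=19842, n_cells=4 * 2)

# Effect degrees of freedom for a 4 x 2 design.
df_typicality = 4 - 1               # matches F(3, 19842)
df_correctness = 2 - 1              # matches F(1, 19842)
df_interaction = (4 - 1) * (2 - 1)  # matches F(3, 19842)

print(n_one_way, n_two_way)
```

Under this assumption both footnotes imply the same number of observations, so the two analyses are consistent with having been run on the same set of images.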
 
Literature
Ahonen, T., Matas, J., He, C., & Pietikäinen, M., et al. (2009). Rotation invariant image description with local binary pattern histogram fourier features. In Scandinavian Conference on Image Analysis.
Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(5), 898–916.
Barnard, K., Duygulu, P., Forsyth, D., De Freitas, N., Blei, D. M., & Jordan, M. I. (2003). Matching words and pictures. The Journal of Machine Learning Research, 3, 1107–1135.
Biederman, I. (1987). Recognition-by-components: a theory of human image understanding. Psychological Review, 94(2), 115.
Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern recognition, 37(9), 1757–1771.
Bunge, J., & Fitzpatrick, M. (1993). Estimating the number of species: A review. Journal of the American Statistical Association, 88(421), 364–373.
Dalal, N., & Triggs, B. (2005, June). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on (Vol. 1, pp. 886–893). IEEE.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 248–255). IEEE.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). Decaf: A deep convolutional activation feature for generic visual recognition. Retrieved from arXiv:1310.1531.
Ehinger, K. A., Xiao, J., Torralba, A., & Oliva, A. (2011). Estimating scene typicality from human ratings and image features. Massachusetts: Cognitive science.
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392(6676), 598–601.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2), 303–338.
Fei-Fei, L., Fergus, R., & Perona, P., et al. (2004). Learning generative visual models from few training examples. In Computer Vision and Pattern Recognition Workshop on Generative-Model Based Vision.
Fei-Fei, L., & Perona, P. (2005). A bayesian hierarchical model for learning natural scene categories. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on (Vol. 2, pp. 524–531). IEEE.
Fellbaum, C. (1998). Wordnet: An electronic lexical database. Bradford: Bradford Books.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9), 1627–1645.
Griffin, G., Holub, A., Perona, P., et al. (2007). Caltech-256 object category dataset. Technical Report.
Hays, J., & Efros, A. A. (2008). IM2GPS: estimating geographic information from a single image. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1–8). IEEE.
Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1), 151–172.
Jolicoeur, P., Gluck, M. A., & Kosslyn, S. M. (1984). Pictures and names: Making the connection. Cognitive Psychology, 16(2), 243–275.
Kosecka, J., & Zhang, W. (2002). Video compass. In Computer Vision-ECCV 2002 (pp. 476–490). Berlin: Springer.
Lalonde, J.F., Hoiem, D., Efros, A.A., Rother, C., Winn, J., & Criminisi, A., et al. (2007). Photo clip art. SIGGRAPH.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on (Vol. 2, pp. 2169–2178). IEEE.
Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on (Vol. 2, pp. 416–423). IEEE.
Matas, J., Chum, O., Urban, M., & Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10), 761–767.
Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(7), 971–987.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2008, June). Lost in quantization: Improving particular object retrieval in large scale image databases. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1–8). IEEE.
Sharif Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN Features off-the-shelf: An Astounding Baseline for Recognition. Retrieved from arXiv:1403.6382.
Renninger, L. W., & Malik, J. (2004). When is scene identification just texture recognition? Vision Research, 44(19), 2301–2311.
Rosch, E. H. (1973). Natural categories. Cognitive Psychology, 4(3), 328–350.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382–439.
Russakovsky, O., Deng, J., Huang, Z., Berg, A. C., & Fei-Fei, L. (2013, December). Detecting avocados to zucchinis: what have we done, and where are we going? In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 2064–2071). IEEE.
Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.
Sadeghi, M. A., & Farhadi, A. (2011, June). Recognition using visual phrases. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1745–1752). IEEE.
Sanchez, J., Perronnin, F., Mensink, T., & Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105(3), 222–245.
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. Retrieved from arXiv:1312.6229.
Shechtman, E., & Irani, M. (2007, June). Matching local self-similarities across images and videos. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on (pp. 1–8). IEEE.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In ECCV (pp. 1–15). Berlin: Springer.
Sivic, J., & Zisserman, A. (2004, June). Video data mining using configurations of viewpoint invariant regions. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on (Vol. 1, pp. I–488). IEEE.
Song, S., & Xiao, J. (2014). Sliding Shapes for 3D object detection in RGB-D images. In European Conference on Computer Vision.
Spain, M., & Perona, P. (2008). Some objects are more equal than others: measuring and predicting importance. In European Conference on Computer Vision.
Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: A large data set for nonparametric object and scene recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(11), 1958–1970.
Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A. (2003, October). Context-based vision system for place and object recognition. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on (pp. 273–280). IEEE.
Tversky, B., & Hemenway, K. (1983). Categories of environmental scenes. Cognitive Psychology, 15(1), 121–149.
Vedaldi, A., & Fulkerson, B. (2010, October). An open and portable library of computer vision algorithms: VLFeat. In Proceedings of the international conference on Multimedia (pp. 1469–1472). ACM.
Vogel, J., & Schiele, B. (2004). A semantic typicality measure for natural scene categorization. In Pattern Recognition (pp. 195–203). Berlin: Springer.
Vogel, J., & Schiele, B. (2007). Semantic modeling of natural scenes for content-based image retrieval. International Journal of Computer Vision, 72(2), 133–157.
Xiao, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2012, June). Recognizing scene viewpoint using panoramic place representation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 2695–2702). IEEE.
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010, June). Sun database: Large-scale scene recognition from abbey to zoo. In Computer vision and pattern recognition (CVPR), 2010 IEEE conference on (pp. 3485–3492). IEEE.
Xiao, J., Owens, A., & Torralba, A. (2013, December). SUN3D: A database of big spaces reconstructed using sfm and object labels. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1625–1632). IEEE.
Zhang, Y., Song, S., Tan, P., & Xiao, J., et al. (2014). PanoContext: A whole-room 3D context model for panoramic scene understanding. In European Conference on Computer Vision.
Metadata
Publisher: Springer US
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-014-0748-y