Skip to main content
Erschienen in: International Journal of Computer Vision 3/2015

01.12.2015

ImageNet Large Scale Visual Recognition Challenge

verfasst von: Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei

Erschienen in: International Journal of Computer Vision | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
In this paper, we will be using the term object recognition broadly to encompass both image classification (a task requiring an algorithm to determine what object classes are present in the image) as well as object detection (a task requiring an algorithm to localize all objects present in the image).
 
2
In 2010, the test annotations were later released publicly; since then the test annotation have been kept hidden.
 
3
In addition, ILSVRC in 2012 also included a taster fine-grained classification task, where algorithms would classify dog photographs into one of 120 dog breeds (Khosla et al. 2011). Fine-grained classification has evolved into its own Fine-Grained classification challenge in 2013 (Berg et al. 2013), which is outside the scope of this paper.
 
5
Some datasets such as PASCAL VOC (Everingham et al. 2010) and LabelMe (Russell et al. 2007) are able to provide more detailed annotations: for example, marking individual object instances as being truncated. We chose not to provide this level of detail in favor of annotating more images and more object instances.
 
6
Some of the training objects are actually annotated with more detailed classes: for example, one of the 200 object classes is the category “dog,” and some training instances are annotated with the specific dog breed.
 
7
The validation/test split is consistent with ILSVRC2012: validation images of ILSVRC2012 remained in the validation set of ILSVRC2013, and ILSVRC2012 test images remained in ILSVRC2013 test set.
 
8
In this paper we focus on the mean average precision across all categories as the measure of a team’s performance. This is done for simplicity and is justified since the ordering of teams by mean average precision was always the same as the ordering by object categories won.
 
9
Table 8 omits 4 teams which submitted results but chose not to officially participate in the challenge.
 
10
Personal communication with members of the UvA team.
 
11
For rigid versus deformable objects, the average scale in each bin is 34.1–34.2 % for classification and localization, and 13.5–13.7 % for detection. For texture, the average scale in each of the four bins is 31.1–31.3 % for classification and localization, and 12.7–12.8 % for detection.
 
12
Natural object detection classes are removed from this analysis because there are only 3 and 13 natural untextured and low-textured classes respectively, and none remain after scale normalization. All other bins contain at least 9 object classes after scale normalization.
 
Literatur
Zurück zum Zitat Ahonen, T., Hadid, A., & Pietikinen, M. (2006). Face description with local binary patterns: Application to face recognition. Pattern Analysis and Machine Intelligence, 28(14), 2037–2041.CrossRef Ahonen, T., Hadid, A., & Pietikinen, M. (2006). Face description with local binary patterns: Application to face recognition. Pattern Analysis and Machine Intelligence, 28(14), 2037–2041.CrossRef
Zurück zum Zitat Alexe, B., Deselares, T., & Ferrari, V. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2189–2202.CrossRef Alexe, B., Deselares, T., & Ferrari, V. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2189–2202.CrossRef
Zurück zum Zitat Arandjelovic, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. In CVPR. Arandjelovic, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. In CVPR.
Zurück zum Zitat Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In Computer vision and pattern recognition. Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In Computer vision and pattern recognition.
Zurück zum Zitat Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transaction on Pattern Analysis and Machine Intelligence, 33, 898–916.CrossRef Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transaction on Pattern Analysis and Machine Intelligence, 33, 898–916.CrossRef
Zurück zum Zitat Batra, D., Agrawal, H., Banik, P., Chavali, N., Mathialagan, C. S., & Alfadda, A. (2013). Cloudcv: Large-scale distributed computer vision as a cloud service. Batra, D., Agrawal, H., Banik, P., Chavali, N., Mathialagan, C. S., & Alfadda, A. (2013). Cloudcv: Large-scale distributed computer vision as a cloud service.
Zurück zum Zitat Bell, S., Upchurch, P., Snavely, N., & Bala, K. (2013). OpenSurfaces: A richly annotated catalog of surface appearance. In ACM transactions on graphics (SIGGRAPH). Bell, S., Upchurch, P., Snavely, N., & Bala, K. (2013). OpenSurfaces: A richly annotated catalog of surface appearance. In ACM transactions on graphics (SIGGRAPH).
Zurück zum Zitat Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. CoRR, abs/1405.3531. Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. CoRR, abs/1405.3531.
Zurück zum Zitat Chen, Q., Song, Z., Huang, Z., Hua, Y., & Yan, S. (2014). Contextualizing object detection and classification. In CVPR. Chen, Q., Song, Z., Huang, Z., Hua, Y., & Yan, S. (2014). Contextualizing object detection and classification. In CVPR.
Zurück zum Zitat Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551–585.MATHMathSciNet Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551–585.MATHMathSciNet
Zurück zum Zitat Dean, T., Ruzon, M., Segal, M., Shlens, J., Vijayanarasimhan, S., & Yagnik, J. (2013). Fast, accurate detection of 100,000 object classes on a single machine. In CVPR. Dean, T., Ruzon, M., Segal, M., Shlens, J., Vijayanarasimhan, S., & Yagnik, J. (2013). Fast, accurate detection of 100,000 object classes on a single machine. In CVPR.
Zurück zum Zitat Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.
Zurück zum Zitat Deng, J., Russakovsky, O., Krause, J., Bernstein, M., Berg, A. C., & Fei-Fei, L. (2014). Scalable multi-label annotation. In CHI. Deng, J., Russakovsky, O., Krause, J., Bernstein, M., Berg, A. C., & Fei-Fei, L. (2014). Scalable multi-label annotation. In CHI.
Zurück zum Zitat Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). Decaf: A deep convolutional activation feature for generic visual recognition. CoRR, abs/1310.1531. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). Decaf: A deep convolutional activation feature for generic visual recognition. CoRR, abs/1310.1531.
Zurück zum Zitat Dubout, C., & Fleuret, F. (2012). Exact acceleration of linear object detectors. In Proceedings of the European conference on computer vision (ECCV). Dubout, C., & Fleuret, F. (2012). Exact acceleration of linear object detectors. In Proceedings of the European conference on computer vision (ECCV).
Zurück zum Zitat Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.CrossRef Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.CrossRef
Zurück zum Zitat Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2014). The Pascal Visual Object Classes (VOC) challenge—A retrospective. International Journal of Computer Vision, 111, 98–136.CrossRef Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2014). The Pascal Visual Object Classes (VOC) challenge—A retrospective. International Journal of Computer Vision, 111, 98–136.CrossRef
Zurück zum Zitat Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In CVPR. Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In CVPR.
Zurück zum Zitat Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few examples: An incremental bayesian approach tested on 101 object categories. In CVPR. Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few examples: An incremental bayesian approach tested on 101 object categories. In CVPR.
Zurück zum Zitat Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.CrossRef Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.CrossRef
Zurück zum Zitat Frome, A., Corrado, G., Shlens, J., Bengio, S., Dean, J., Ranzato, M., & Mikolov, T. (2013). Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems, NIPS. Frome, A., Corrado, G., Shlens, J., Bengio, S., Dean, J., Ranzato, M., & Mikolov, T. (2013). Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems, NIPS.
Zurück zum Zitat Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The kitti dataset. International Journal of Robotics Research, 32, 1231–1237.CrossRef Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The kitti dataset. International Journal of Robotics Research, 32, 1231–1237.CrossRef
Zurück zum Zitat Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation (v4). CoRR. Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation (v4). CoRR.
Zurück zum Zitat Girshick, R., Donahue, J., Darrell, T., & Malik., J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR. Girshick, R., Donahue, J., Darrell, T., & Malik., J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR.
Zurück zum Zitat Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In ICCV. Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In ICCV.
Zurück zum Zitat Graham, B. (2013). Sparse arrays of signatures for online character recognition. CoRR. Graham, B. (2013). Sparse arrays of signatures for online character recognition. CoRR.
Zurück zum Zitat Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report 7694, Caltech. Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report 7694, Caltech.
Zurück zum Zitat Harada, T., & Kuniyoshi, Y. (2012). Graphical Gaussian vector for image categorization. In NIPS. Harada, T., & Kuniyoshi, Y. (2012). Graphical Gaussian vector for image categorization. In NIPS.
Zurück zum Zitat Harel, J., Koch, C., & Perona, P. (2007). Graph-based visual saliency. In NIPS. Harel, J., Koch, C., & Perona, P. (2007). Graph-based visual saliency. In NIPS.
Zurück zum Zitat He, K., Zhang, X., Ren, S., & Su, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV. He, K., Zhang, X., Ren, S., & Su, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV.
Zurück zum Zitat Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580.
Zurück zum Zitat Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In ECCV. Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In ECCV.
Zurück zum Zitat Howard, A. (2014). Some improvements on deep convolutional neural network based image classification. In ICLR. Howard, A. (2014). Some improvements on deep convolutional neural network based image classification. In ICLR.
Zurück zum Zitat Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report 07–49, University of Massachusetts, Amherst. Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report 07–49, University of Massachusetts, Amherst.
Zurück zum Zitat Iandola, F. N., Moskewicz, M. W., Karayev, S., Girshick, R. B., Darrell, T., & Keutzer, K. (2014). Densenet: Implementing efficient convnet descriptor pyramids. CoRR. Iandola, F. N., Moskewicz, M. W., Karayev, S., Girshick, R. B., Darrell, T., & Keutzer, K. (2014). Densenet: Implementing efficient convnet descriptor pyramids. CoRR.
Zurück zum Zitat Jojic, N., Frey, B. J., & Kannan, A. (2003). Epitomic analysis of appearance and shape. In ICCV. Jojic, N., Frey, B. J., & Kannan, A. (2003). Epitomic analysis of appearance and shape. In ICCV.
Zurück zum Zitat Kanezaki, A., Inaba, S., Ushiku, Y., Yamashita, Y., Muraoka, H., Kuniyoshi, Y., & Harada, T. (2014). Hard negative classes for multiple object detection. In ICRA. Kanezaki, A., Inaba, S., Ushiku, Y., Yamashita, Y., Muraoka, H., Kuniyoshi, Y., & Harada, T. (2014). Hard negative classes for multiple object detection. In ICRA.
Zurück zum Zitat Khosla, A., Jayadevaprakash, N., Yao, B., & Fei-Fei, L. (2011). Novel dataset for fine-grained image categorization. In First workshop on fine-grained visual categorization, CVPR. Khosla, A., Jayadevaprakash, N., Yao, B., & Fei-Fei, L. (2011). Novel dataset for fine-grained image categorization. In First workshop on fine-grained visual categorization, CVPR.
Zurück zum Zitat Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In NIPS. Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Zurück zum Zitat Kuettel, D., Guillaumin, M., & Ferrari, V. (2012). Segmentation propagation in ImageNet. In ECCV. Kuettel, D., Guillaumin, M., & Ferrari, V. (2012). Segmentation propagation in ImageNet. In ECCV.
Zurück zum Zitat Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR. Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.
Zurück zum Zitat Lin, M., Chen, Q., & Yan, S. (2014a). Network in network. In ICLR. Lin, M., Chen, Q., & Yan, S. (2014a). Network in network. In ICLR.
Zurück zum Zitat Lin, Y., Lv, F., Cao, L., Zhu, S., Yang, M., Cour, T., Yu, K., & Huang, T. (2011). Large-scale image classification: Fast feature extraction and SVM training. In CVPR. Lin, Y., Lv, F., Cao, L., Zhu, S., Yang, M., Cour, T., Yu, K., & Huang, T. (2011). Large-scale image classification: Fast feature extraction and SVM training. In CVPR.
Zurück zum Zitat Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollr, P., & Zitnick, C. L. (2014b). Microsoft COCO: Common objects in context. In ECCV. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollr, P., & Zitnick, C. L. (2014b). Microsoft COCO: Common objects in context. In ECCV.
Zurück zum Zitat Liu, C., Yuen, J., & Torralba, A. (2011). Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 2368–2382.CrossRef Liu, C., Yuen, J., & Torralba, A. (2011). Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 2368–2382.CrossRef
Zurück zum Zitat Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.CrossRef Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.CrossRef
Zurück zum Zitat Maji, S., & Malik, J. (2009). Object detection using a max-margin hough transform. In CVPR. Maji, S., & Malik, J. (2009). Object detection using a max-margin hough transform. In CVPR.
Zurück zum Zitat Manen, S., Guillaumin, M., & Van Gool, L. (2013). Prime object proposals with randomized Prim’s algorithm. In ICCV. Manen, S., Guillaumin, M., & Van Gool, L. (2013). Prime object proposals with randomized Prim’s algorithm. In ICCV.
Zurück zum Zitat Mensink, T., Verbeek, J., Perronnin, F., & Csurka, G. (2012). Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV. Mensink, T., Verbeek, J., Perronnin, F., & Csurka, G. (2012). Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV.
Zurück zum Zitat Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In ICLR. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In ICLR.
Zurück zum Zitat Miller, G. A. (1995). Wordnet: A lexical database for English. Commun. ACM, 38(11), 39–41.CrossRef Miller, G. A. (1995). Wordnet: A lexical database for English. Commun. ACM, 38(11), 39–41.CrossRef
Zurück zum Zitat Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. In IJCV. Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. In IJCV.
Zurück zum Zitat Ordonez, V., Deng, J., Choi, Y., Berg, A. C., & Berg, T. L. (2013). From large scale image categorization to entry-level categories. In IEEE international conference on computer vision (ICCV). Ordonez, V., Deng, J., Choi, Y., Berg, A. C., & Berg, T. L. (2013). From large scale image categorization to entry-level categories. In IEEE international conference on computer vision (ICCV).
Zurück zum Zitat Ouyang, W., & Wang, X. (2013). Joint deep learning for pedestrian detection. In ICCV. Ouyang, W., & Wang, X. (2013). Joint deep learning for pedestrian detection. In ICCV.
Zurück zum Zitat Ouyang, W., Luo, P., Zeng, X., Qiu, S., Tian, Y., Li, H., Yang, S., Wang, Z., Xiong, Y., Qian, C., Zhu, Z., Wang, R., Loy, C. C., Wang, X., & Tang, X. (2014). Deepid-net: multi-stage and deformable deep convolutional neural networks for object detection. CoRR, abs/1409.3505. Ouyang, W., Luo, P., Zeng, X., Qiu, S., Tian, Y., Li, H., Yang, S., Wang, Z., Xiong, Y., Qian, C., Zhu, Z., Wang, R., Loy, C. C., Wang, X., & Tang, X. (2014). Deepid-net: multi-stage and deformable deep convolutional neural networks for object detection. CoRR, abs/1409.3505.
Zurück zum Zitat Papandreou, G. (2014). Deep epitomic convolutional neural networks. CoRR. Papandreou, G. (2014). Deep epitomic convolutional neural networks. CoRR.
Zurück zum Zitat Papandreou, G., Chen, L.-C., & Yuille, A. L. (2014). Modeling image patches with a generic dictionary of mini-epitomes. Papandreou, G., Chen, L.-C., & Yuille, A. L. (2014). Modeling image patches with a generic dictionary of mini-epitomes.
Zurück zum Zitat Perronnin, F., & Dance, C. R. (2007). Fisher kernels on visual vocabularies for image categorization. In CVPR. Perronnin, F., & Dance, C. R. (2007). Fisher kernels on visual vocabularies for image categorization. In CVPR.
Zurück zum Zitat Perronnin, F., Akata, Z., Harchaoui, Z., & Schmid, C. (2012). Towards good practice in large-scale learning for image classification. In CVPR. Perronnin, F., Akata, Z., Harchaoui, Z., & Schmid, C. (2012). Towards good practice in large-scale learning for image classification. In CVPR.
Zurück zum Zitat Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the fisher kernel for large-scale image classification. In ECCV (4). Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the fisher kernel for large-scale image classification. In ECCV (4).
Zurück zum Zitat Russakovsky, O., Deng, J., Huang, Z., Berg, A., & Fei-Fei, L. (2013). Detecting avocados to zucchinis: What have we done, & where are we going? In ICCV. Russakovsky, O., Deng, J., Huang, Z., Berg, A., & Fei-Fei, L. (2013). Detecting avocados to zucchinis: What have we done, & where are we going? In ICCV.
Zurück zum Zitat Russell, B., Torralba, A., Murphy, K., & Freeman, W. T. (2007). LabelMe: A database and web-based tool for image annotation. In IJCV. Russell, B., Torralba, A., Murphy, K., & Freeman, W. T. (2007). LabelMe: A database and web-based tool for image annotation. In IJCV.
Zurück zum Zitat Sanchez, J., & Perronnin, F. (2011). High-dim. signature compression for large-scale image classification. In CVPR. Sanchez, J., & Perronnin, F. (2011). High-dim. signature compression for large-scale image classification. In CVPR.
Zurück zum Zitat Sanchez, J., Perronnin, F., & de Campos, T. (2012). Modeling spatial layout of images beyond spatial pyramids. In PRL. Sanchez, J., Perronnin, F., & de Campos, T. (2012). Modeling spatial layout of images beyond spatial pyramids. In PRL.
Zurück zum Zitat Scheirer, W., Kumar, N., Belhumeur, P. N., & Boult, T. E. (2012). Multi-attribute spaces: Calibration for attribute fusion and similarity search. In CVPR. Scheirer, W., Kumar, N., Belhumeur, P. N., & Boult, T. E. (2012). Multi-attribute spaces: Calibration for attribute fusion and similarity search. In CVPR.
Zurück zum Zitat Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In CVPR. Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In CVPR.
Zurück zum Zitat Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. CoRR, abs/1312.6229. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. CoRR, abs/1312.6229.
Zurück zum Zitat Sheng, V. S., Provost, F., & Ipeirotis, P. G. (2008). Get another label? Improving data quality and data mining using multiple, noisy labelers. In SIGKDD. Sheng, V. S., Provost, F., & Ipeirotis, P. G. (2008). Get another label? Improving data quality and data mining using multiple, noisy labelers. In SIGKDD.
Zurück zum Zitat Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.
Zurück zum Zitat Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep fisher networks for large-scale image classification. In NIPS. Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep fisher networks for large-scale image classification. In NIPS.
Zurück zum Zitat Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In InterNet08. Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In InterNet08.
Zurück zum Zitat Su, H., Deng, J., & Fei-Fei, L. (2012). Crowdsourcing annotations for visual object detection. In AAAI human computation workshop. Su, H., Deng, J., & Fei-Fei, L. (2012). Crowdsourcing annotations for visual object detection. In AAAI human computation workshop.
Zurück zum Zitat Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., & Rabinovich, A. (2014). Going deeper with convolutions. Technical report. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., & Rabinovich, A. (2014). Going deeper with convolutions. Technical report.
Zurück zum Zitat Tang, Y. (2013). Deep learning using support vector machines. CoRR, abs/1306.0239. Tang, Y. (2013). Deep learning using support vector machines. CoRR, abs/1306.0239.
Zurück zum Zitat Thorpe, S., Fize, D., Marlot, C., et al. (1996). Speed of processing in the human visual system. Nature, 381(6582), 520–522.CrossRef Thorpe, S., Fize, D., Marlot, C., et al. (1996). Speed of processing in the human visual system. Nature, 381(6582), 520–522.CrossRef
Zurück zum Zitat Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In CVPR’11. Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In CVPR’11.
Zurück zum Zitat Torralba, A., Fergus, R., & Freeman, W. (2008). 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1958–1970.CrossRef Torralba, A., Fergus, R., & Freeman, W. (2008). 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1958–1970.CrossRef
Zurück zum Zitat Uijlings, J., van de Sande, K., Gevers, T., & Smeulders, A. (2013). Selective search for object recognition. International Journal of Computer Vision, 104, 154–171.CrossRef Uijlings, J., van de Sande, K., Gevers, T., & Smeulders, A. (2013). Selective search for object recognition. International Journal of Computer Vision, 104, 154–171.CrossRef
Zurück zum Zitat van de Sande, K. E. A., Snoek, C. G. M., & Smeulders, A. W. M. (2014). Fisher and vlad with flair. In Proceedings of the IEEE conference on computer vision and pattern recognition. van de Sande, K. E. A., Snoek, C. G. M., & Smeulders, A. W. M. (2014). Fisher and vlad with flair. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat van de Sande, K. E. A., Uijlings, J. R. R., Gevers, T., & Smeulders, A. W. M. (2011b). Segmentation as selective search for object recognition. In ICCV. van de Sande, K. E. A., Uijlings, J. R. R., Gevers, T., & Smeulders, A. W. M. (2011b). Segmentation as selective search for object recognition. In ICCV.
Zurück zum Zitat van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596.CrossRef van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596.CrossRef
Zurück zum Zitat van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2011a). Empowering visual categorization with the GPU. IEEE Transactions on Multimedia, 13(1), 60–70.CrossRef van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2011a). Empowering visual categorization with the GPU. IEEE Transactions on Multimedia, 13(1), 60–70.CrossRef
Zurück zum Zitat Vittayakorn, S., & Hays, J. (2011). Quality assessment for crowdsourced object annotations. In BMVC. Vittayakorn, S., & Hays, J. (2011). Quality assessment for crowdsourced object annotations. In BMVC.
Zurück zum Zitat von Ahn, L., & Dabbish, L. (2005). Esp: Labeling images with a computer game. In AAAI spring symposium: Knowledge collection from volunteer contributors. von Ahn, L., & Dabbish, L. (2005). Esp: Labeling images with a computer game. In AAAI spring symposium: Knowledge collection from volunteer contributors.
Zurück zum Zitat Vondrick, C., Patterson, D., & Ramanan, D. (2012). Efficiently scaling up crowdsourced video annotation. International Journal of Computer Vision, 1010, 184–204. Vondrick, C., Patterson, D., & Ramanan, D. (2012). Efficiently scaling up crowdsourced video annotation. International Journal of Computer Vision, 1010, 184–204.
Zurück zum Zitat Wan, L., Zeiler, M., Zhang, S., LeCun, Y., & Fergus, R. (2013). Regularization of neural networks using dropconnect. In Proceedings of the international conference on machine learning (ICML’13). Wan, L., Zeiler, M., Zhang, S., LeCun, Y., & Fergus, R. (2013). Regularization of neural networks using dropconnect. In Proceedings of the international conference on machine learning (ICML’13).
Zurück zum Zitat Wang, M., Xiao, T., Li, J., Hong, C., Zhang, J., & Zhang, Z. (2014). Minerva: A scalable and highly efficient training platform for deep learning. In APSys. Wang, M., Xiao, T., Li, J., Hong, C., Zhang, J., & Zhang, Z. (2014). Minerva: A scalable and highly efficient training platform for deep learning. In APSys.
Zurück zum Zitat Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR.
Zurück zum Zitat Wang, X., Yang, M., Zhu, S., & Lin, Y. (2013). Regionlets for generic object detection. In ICCV. Wang, X., Yang, M., Zhu, S., & Lin, Y. (2013). Regionlets for generic object detection. In ICCV.
Zurück zum Zitat Welinder, P., Branson, S., Belongie, S., & Perona, P. (2010). The multidimensional wisdom of crowds. In NIPS. Welinder, P., Branson, S., Belongie, S., & Perona, P. (2010). The multidimensional wisdom of crowds. In NIPS.
Zurück zum Zitat Xiao, J., Hays, J., Ehinger, K., Oliva, A., & Torralba., A. (2010). SUN database: Large-scale scene recognition from Abbey to Zoo. In CVPR. Xiao, J., Hays, J., Ehinger, K., Oliva, A., & Torralba., A. (2010). SUN database: Large-scale scene recognition from Abbey to Zoo. In CVPR.
Zurück zum Zitat Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In CVPR. Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In CVPR.
Zurück zum Zitat Yao, B., Yang, X., & Zhu, S.-C. (2007). Introduction to a large scale general purpose ground truth dataset: methodology, annotation tool, and benchmarks. Berlin: Springer. Yao, B., Yang, X., & Zhu, S.-C. (2007). Introduction to a large scale general purpose ground truth dataset: methodology, annotation tool, and benchmarks. Berlin: Springer.
Zurück zum Zitat Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR, abs/1311.2901. Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR, abs/1311.2901.
Zurück zum Zitat Zeiler, M. D., Taylor, G. W., & Fergus, R. (2011). Adaptive deconvolutional networks for mid and high level feature learning. In ICCV. Zeiler, M. D., Taylor, G. W., & Fergus, R. (2011). Adaptive deconvolutional networks for mid and high level feature learning. In ICCV.
Zurück zum Zitat Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In NIPS. Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In NIPS.
Zurück zum Zitat Zhou, X., Yu, K., Zhang, T., & Huang, T. (2010). Image classification using super-vector coding of local image descriptors. In ECCV. Zhou, X., Yu, K., Zhang, T., & Huang, T. (2010). Image classification using super-vector coding of local image descriptors. In ECCV.
Metadaten
Titel
ImageNet Large Scale Visual Recognition Challenge
verfasst von
Olga Russakovsky
Jia Deng
Hao Su
Jonathan Krause
Sanjeev Satheesh
Sean Ma
Zhiheng Huang
Andrej Karpathy
Aditya Khosla
Michael Bernstein
Alexander C. Berg
Li Fei-Fei
Publikationsdatum
01.12.2015
Verlag
Springer US
Erschienen in
International Journal of Computer Vision / Ausgabe 3/2015
Print ISSN: 0920-5691
Elektronische ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-015-0816-y

Weitere Artikel der Ausgabe 3/2015

International Journal of Computer Vision 3/2015 Zur Ausgabe