Published in: International Journal of Computer Vision 1/2015

01.01.2015

The Pascal Visual Object Classes Challenge: A Retrospective

Written by: Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman

Abstract

The Pascal Visual Object Classes (VOC) challenge consists of two components: (i) a publicly available dataset of images together with ground truth annotation and standardised evaluation software; and (ii) an annual competition and workshop. There are five challenges: classification, detection, segmentation, action classification, and person layout. In this paper we provide a review of the challenge from 2008 to 2012. The paper is intended for two audiences: algorithm designers, that is, researchers who want to see what the state of the art is, as measured by performance on the VOC datasets, along with the limitations and weak points of the current generation of algorithms; and challenge designers, who want to see what we as organisers have learnt from the process and our recommendations for the organisation of future challenges. To analyse the performance of submitted algorithms on the VOC datasets we introduce a number of novel evaluation methods: a bootstrapping method for determining whether differences in the performance of two algorithms are significant or not; a normalised average precision so that performance can be compared across classes with different proportions of positive instances; a clustering method for visualising the performance across multiple algorithms so that the hard and easy images can be identified; and the use of a joint classifier over the submitted algorithms in order to measure their complementarity and combined performance. We also analyse the community’s progress through time using the methods of Hoiem et al. (Proceedings of European Conference on Computer Vision, 2012) to identify the types of errors that occur. We conclude the paper with an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.
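
The bootstrapping idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's exact procedure: it assumes that test images are resampled with replacement, that both algorithms are scored on the same resample, and that a percentile interval on the difference in average precision (AP) is used to judge significance. It also uses a plain non-interpolated AP rather than the VOC evaluation code. The function names `average_precision` and `bootstrap_ap_difference` are hypothetical.

```python
import random


def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive.

    Assumption: this is a simplified stand-in for the VOC AP measure.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def bootstrap_ap_difference(scores_a, scores_b, labels,
                            n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AP(A) - AP(B) on paired resamples.

    Returns (low, high, significant), where `significant` is True when the
    interval excludes zero. The resampling scheme is an assumption.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_resamples):
        # Draw n test images with replacement; reuse them for both methods.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        ap_a = average_precision([scores_a[i] for i in idx], y)
        ap_b = average_precision([scores_b[i] for i in idx], y)
        diffs.append(ap_a - ap_b)
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high, not (low <= 0.0 <= high)
```

Under these assumptions, calling `bootstrap_ap_difference(scores_a, scores_b, labels)` with two lists of per-image confidences and one list of 0/1 labels gives an interval on the AP gap; an interval that contains zero suggests the two methods cannot be separated on that test set.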

Footnotes
1. Pascal stands for pattern analysis, statistical modelling and computational learning. It was an EU Network of Excellence funded project under the IST Programme of the European Union.
2. Matlab® is a registered trademark of MathWorks, Inc.

References
Alexe, B., Deselaers, T., & Ferrari, V. (2010). What is an object? In Proceedings of Conference on Computer Vision and Pattern Recognition (pp. 73–80).
Alexiou, I., & Bharath, A. (2012). Efficient Kernels couple visual words through categorical opponency. In Proceedings of British Machine Vision Conference.
Bertail, P., Clémençon, S. J., & Vayatis, N. (2009). On bootstrapping the ROC curve. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (Vol. 21, pp. 137–144). Red Hook, NY: Curran Associates, Inc.
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Proceedings of European Conference on Computer Vision.
Chen, Q., Song, Z., Hua, Y., Huang, Z., & Yan, S. (2012). Generalized hierarchical matching for image classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Proceedings of ECCV2004 Workshop on Statistical Learning in Computer Vision (pp. 59–74).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). Decaf: A deep convolutional activation feature for generic visual recognition. CoRR abs/1310.1531.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88, 303–338.
Farhadi, A., Endres, I., Hoiem, D., & Forsyth, D. (2009). Describing objects by their attributes. In Proceedings of Conference on Computer Vision and Pattern Recognition, IEEE (pp. 1778–1785).
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Hall, P., Hyndman, R., & Fan, Y. (2004). Nonparametric confidence intervals for receiver operating characteristic curves. Biometrika, 91, 743–750.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In Proceedings of European Conference on Computer Vision.
Ion, A., Carreira, J., & Sminchisescu, C. (2011a). Image segmentation by figure-ground composition into maximal cliques. In Proceedings of International Conference on Computer Vision.
Ion, A., Carreira, J., & Sminchisescu, C. (2011b). Probabilistic joint image segmentation and labeling. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 24, pp. 1827–1835). Red Hook, NY: Curran Associates, Inc.
Karaoglu, S., Van Gemert, J., & Gevers, T. (2012). Object reading: Text recognition for object recognition. In Proceedings of ECCV 2012 Workshops and Demonstrations.
Khan, F., Anwer, R., Van de Weijer, J., Bagdanov, A., Vanrell, M., & Lopez, A. M. (2012a). Color attributes for object detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Khan, F., Van de Weijer, J., & Vanrell, M. (2012b). Modulating shape features by color attention for object recognition. International Journal of Computer Vision, 98(1), 49–64.
Khosla, A., Yao, B., & Fei-Fei, L. (2011). Combining randomization and discrimination for fine-grained image categorization. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25, pp. 1106–1114). Red Hook, NY: Curran Associates, Inc.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of Conference on Computer Vision and Pattern Recognition (pp. 2169–2178).
Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Proceedings of ECCV Workshop on Statistical Learning in Computer Vision.
Li, F., Carreira, J., Lebanon, G., & Sminchisescu, C. (2013). Composite statistical inference for semantic segmentation. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Nanni, L., & Lumini, A. (2013). Heterogeneous bag-of-features for object/scene recognition. Applied Soft Computing, 13(4), 2171–2178.
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Russakovsky, O., Lin, Y., Yu, K., & Fei-Fei, L. (2012). Object-centric spatial pooling for image classification. In Proceedings of European Conference on Computer Vision.
Russell, B., Torralba, A., Murphy, K., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173. http://labelme.csail.mit.edu/
Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval. New York, NY: McGraw-Hill Inc.
Sener, F., Bas, C., & Ikizler-Cinbis, N. (2012). On recognizing actions in still images via multiple features. In Proceedings of ECCV Workshop on Action Recognition and Pose Estimation in Still Images.
Song, Z., Chen, Q., Huang, Z., Hua, Y., & Yan, S. (2011). Contextualizing object detection and classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In Proceedings of Conference on Computer Vision and Pattern Recognition, IEEE (pp. 1521–1528).
Uijlings, J., Van de Sande, K., Gevers, T., & Smeulders, A. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Van de Sande, K., Uijlings, J., Gevers, T., & Smeulders, A. (2011). Segmentation as selective search for object recognition. In Proceedings of International Conference on Computer Vision.
Van Gemert, J. (2011). Exploiting photographic style for category-level image classification by generalizing the spatial pyramid. In Proceedings of International Conference on Multimedia Retrieval.
Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In International Conference on Computer Vision.
Viola, P., & Jones, M. (2004). Robust real-time object detection. International Journal of Computer Vision, 57(2), 137–154.
Wang, X., Lin, L., Huang, L., & Yan, S. (2013). Incorporating structural alternatives and sharing into hierarchy for multiclass object recognition and detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Xia, W., Song, Z., Feng, J., Cheong, L. F., & Yan, S. (2012). Segmentation over detection by coupled global and local sparse representations. In Proceedings of European Conference on Computer Vision.
Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR abs/1311.2901.
Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Zisserman, A., Winn, J., Fitzgibbon, A., Van Gool, L., Sivic, J., Williams, C., et al. (2012). In memoriam: Mark Everingham. Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2081–2082.
Metadata
Title
The Pascal Visual Object Classes Challenge: A Retrospective
Authors
Mark Everingham
S. M. Ali Eslami
Luc Van Gool
Christopher K. I. Williams
John Winn
Andrew Zisserman
Publication date
01.01.2015
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-014-0733-5
