
Do We Need More Training Data?

Authors: Xiangxin Zhu, Carl Vondrick, Charless C. Fowlkes, Deva Ramanan

Published in: International Journal of Computer Vision | Issue 1/2016

Published: 1 August 2016


Abstract

Datasets for training object recognition systems are steadily increasing in size. This paper investigates whether existing detectors will continue to improve as data grows, or whether their performance will saturate due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defined on oriented gradient features. We investigate the performance of mixtures of templates as the number of mixture components and the amount of training data grow. Surprisingly, even with proper treatment of regularization and “outliers”, the performance of classic mixture models appears to saturate quickly (\(\sim 10\) templates and \(\sim 100\) positive training examples per template). This is not a limitation of the feature space, since compositional mixtures that share template parameters via parts, and that can synthesize new templates not encountered during training, yield significantly better performance. Based on our analysis, we conjecture that the greatest gains in detection performance will continue to derive from improved representations and learning algorithms that can make efficient use of large datasets.
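
To make the term "mixture of templates" concrete, the following is a minimal sketch of how such a model is commonly scored at test time: each mixture component is a linear template applied to a feature vector, and the detector reports the maximum score over components. This is an illustration under stated assumptions, not the authors' implementation. The function names, the toy 3-dimensional features, and the weights are all hypothetical; real templates operate on much higher-dimensional oriented gradient descriptors.

```python
from typing import List, Sequence, Tuple

# A template is a (weights, bias) pair; both are assumed already learned.
Template = Tuple[Sequence[float], float]


def template_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Score of one linear template: dot product with the features plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mixture_score(templates: List[Template], x: Sequence[float]) -> Tuple[float, int]:
    """Return the best score and the index of the winning mixture component."""
    scores = [template_score(w, b, x) for w, b in templates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


# Toy example: two hypothetical templates over 3-dimensional features.
templates = [
    ([1.0, 0.0, -1.0], 0.0),   # component 0
    ([0.0, 2.0, 0.0], -0.5),   # component 1
]
x = [0.5, 1.0, 0.25]
score, k = mixture_score(templates, x)
print(f"best component: {k}, score: {score:.2f}")
```

Adding mixture components in this formulation means adding more `(weights, bias)` pairs, each of which needs its own positive training examples. That is why the abstract reports saturation in terms of both the number of templates and the number of positives per template.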


The dataset can be downloaded from http://vision.ics.uci.edu/datasets/.


Title: Do We Need More Training Data?
Authors: Xiangxin Zhu, Carl Vondrick, Charless C. Fowlkes, Deva Ramanan
Publication date: 1 August 2016
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 1/2016
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-015-0812-2

