International Journal of Computer Vision

Published: 12 July 2016

Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach

Authors: Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid

Published in: International Journal of Computer Vision | Issue 1/2017

Abstract

Convolutional neural networks (CNNs) are able to model local stationary structures in natural images in a multi-scale fashion, when learning all model parameters with supervision. While excellent performance was achieved for image classification when large amounts of labeled visual data are available, their success for unsupervised tasks such as image retrieval has been moderate so far. Our paper focuses on this latter setting and explores several methods for learning patch descriptors without supervision, with application to matching and instance-level retrieval. To that effect, we propose a new family of patch representations, based on the recently introduced convolutional kernel networks. We show that our descriptor, named Patch-CKN, performs better than SIFT as well as other convolutional networks learned by artificially introducing supervision, and is significantly faster to train. To demonstrate its effectiveness, we perform an extensive evaluation on standard benchmarks for patch and image retrieval, where we obtain state-of-the-art results. We also introduce a new dataset called RomePatches, which makes it possible to study descriptor performance for patch and image retrieval simultaneously.

https://github.com/szagoruyko/cvpr15deepcompare.

Note that in the kernel literature, “feature map” denotes the mapping between data points and their representation in a reproducing kernel Hilbert space (RKHS). Here, feature maps refer to spatial maps representing local image characteristics at every location, as usual in the neural network literature (LeCun et al. 1998).
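In generic notation (the symbols below are illustrative and are not taken from the paper), the two meanings can be contrasted as follows. In the kernel sense, a feature map \(\phi\) sends a data point into an RKHS \(\mathcal{H}\) and reproduces the kernel \(K\) through inner products. In the neural network sense, a feature map \(f\) assigns a \(d\)-dimensional vector to every location \(z\) of a spatial grid \(\varOmega\):

\[
\phi : \mathcal{X} \to \mathcal{H}, \qquad K(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}},
\]
\[
f : \varOmega \to \mathbb{R}^{d}, \qquad z \mapsto f(z).
\]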

Strictly speaking, the maps \(M_l\) need to be slightly larger in spatial size than \(\varphi _M^l\); otherwise, a patch \(P_{l,z}\) at location z from \(\varOmega _l\) may take pixel values outside of \(\varOmega _l\). We omit this detail for simplicity.
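The boundary issue in this note can be illustrated with a minimal NumPy sketch. It assumes a square patch of odd size and uses zero padding to enlarge the map; both choices are assumptions made here for illustration, since the note only states that the map must be slightly larger. The function name `extract_patches` and the toy array are hypothetical.

```python
import numpy as np


def extract_patches(feature_map, patch_size):
    """Return one patch_size x patch_size patch centered at every location.

    feature_map has shape (height, width, channels). patch_size is assumed odd.
    """
    h, w, c = feature_map.shape
    r = patch_size // 2  # number of pixels a patch extends beyond its center
    # Assumption: enlarge the map with zeros so that patches centered on the
    # border never read outside the (padded) array.
    padded = np.pad(feature_map, ((r, r), (r, r), (0, 0)), mode="constant")
    patches = np.empty((h, w, patch_size, patch_size, c), dtype=feature_map.dtype)
    for i in range(h):
        for j in range(w):
            # padded[i + r, j + r] is feature_map[i, j], the patch center.
            patches[i, j] = padded[i:i + patch_size, j:j + patch_size]
    return patches


# Toy map with 4 x 5 locations and 2 channels.
toy_map = np.arange(4 * 5 * 2, dtype=float).reshape(4, 5, 2)
toy_patches = extract_patches(toy_map, 3)
```

Without the padding step, the patch centered at location (0, 0) would need values at index -1, which is exactly the situation the note describes.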

www.cs.cornell.edu/projects/p2f.

http://sites.skoltech.ru/compvision/projects/neuralcodes/

Agrawal, P., Carreira, J., Malik, J. (2015) Learning to see by moving. In IEEE conference on computer vision and pattern recognition

Arandjelovic, R., Zisserman, A. (2013) All about VLAD. In IEEE conference on computer vision and pattern recognition

Babenko, A., Lempitsky, V. (2015) Aggregating deep convolutional features for image retrieval. In International conference on computer vision

Babenko, A., Slesarev, A., Chigorin, A., Lempitsky, V. (2014) Neural codes for image retrieval. In European conference on computer vision

Bach, F. R., & Jordan, M. I. (2002). Kernel independent component analysis. Journal of Machine Learning Research, 3, 1–48.

Bay, H., Tuytelaars, T., Van Gool, L. (2006) SURF: Speeded up robust features. In European conference on computer vision

Bo, L., Ren, X., Fox, D. (2010) Kernel descriptors for visual recognition. Advances in neural information processing systems

Bottou, L. (2012). Stochastic gradient descent tricks. Neural networks: Tricks of the trade. Berlin: Springer.

Brown, M., Hua, G., & Winder, S. (2011). Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 43–57.

Calonder, M., Lepetit, V., Strecha, C., Fua, P. (2010) BRIEF: Binary robust independent elementary features. In European conference on computer vision

Chopra, S., Hadsell, R., LeCun, Y. (2005) Learning a similarity metric discriminatively, with application to face verification. In IEEE conference on computer vision and pattern recognition

Coates, A., & Ng, A. Y. (2012). Learning feature representations with k-means. Neural networks: Tricks of the trade. Heidelberg: Springer.

Cucker, F., & Zhou, D. X. (2007). Learning theory: An approximation theory viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge: Cambridge University Press.

Deng, J., Dong, W., Socher, R., Li, LJ., Li, K., Fei-Fei, L. (2009) ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition

Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T. (2014) DeCAF: A deep convolutional activation feature for generic visual recognition. In International conference on machine learning

Dong, J., Soatto, S. (2015) Domain-size pooling in local descriptors: DSP-SIFT. In IEEE conference on computer vision and pattern recognition

Dosovitskiy, A., Springenberg, JT., Riedmiller, M., Brox, T. (2014) Discriminative unsupervised feature learning with convolutional neural networks. Advances in Neural Information Processing Systems

Erhan, D., Manzagol, PA., Bengio, Y., Bengio, S., Vincent, P. (2009) The difficulty of training deep architectures and the effect of unsupervised pre-training. In Twelfth international conference on artificial intelligence and statistics

Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research, 11, 625–660.

Fischer, P., Dosovitskiy, A., Brox, T. (2014) Descriptor matching with Convolutional Neural Networks: a comparison to SIFT. arXiv Preprint

Gong, Y., Wang, L., Guo, R., Lazebnik, S. (2014) Multi-scale orderless pooling of deep convolutional activation features. In European conference on computer vision

Goroshin, R., Bruna, J., Tompson, J., Eigen, D., LeCun, Y. (2014) Unsupervised feature learning from temporal data. In Advances in Neural Information Processing Systems

Goroshin, R., Mathieu, M., LeCun, Y. (2015) Learning to linearize under uncertainty. In Advances in Neural Information Processing Systems

Jayaraman, D., Grauman, K. (2015) Learning image representations equivariant to ego-motion. In IEEE conference on computer vision and pattern recognition

Jégou, H., Chum, O. (2012) Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening. In European conference on computer vision

Jégou, H., Douze, M., Schmid, C. (2008) Hamming embedding and weak geometric consistency for large scale image search. In European conference on computer vision

Jégou, H., Douze, M., Schmid, C., Pérez, P. (2010) Aggregating local descriptors into a compact image representation. In IEEE conference on computer vision and pattern recognition

Jégou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128.

Jégou, H., Perronnin, F., Douze, M., Sánchez, J., Pérez, P., & Schmid, C. (2012). Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1704–1716.

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T. (2014) Caffe: Convolutional architecture for fast feature embedding. In ACM multimedia conference

Jiang, W., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y. (2014) Learning fine-grained image similarity with deep ranking. In IEEE conference on computer vision and pattern recognition

Krizhevsky, A., Sutskever, I., Hinton, G. (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems

LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L. (1989) Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

Li, Y., Snavely, N., Huttenlocher, DP. (2010) Location recognition using prioritized feature matching. In European conference on computer vision

Long, J., Zhang, N., Darrell, T. (2014) Do Convnets learn correspondence? Advances in Neural Information Processing Systems

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

Mairal, J., Bach, F., Ponce, J. (2014a) Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision

Mairal, J., Koniusz, P., Harchaoui, Z., Schmid, C. (2014b) Convolutional kernel networks. Advances in Neural Information Processing Systems

Mikolajczyk, K., & Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1), 63–86.

Mikolajczyk, K., & Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1615–1630.

Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., et al. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65, 43–72.

Ng, JYH., Yang, F., Davis, LS. (2015) Exploiting Local Features from Deep Networks for Image Retrieval. In DeepVision workshop

Nister, D., Stewenius, H. (2006) Scalable recognition with a vocabulary tree. In IEEE conference on computer vision and pattern recognition

Paulin, M., Douze, M., Harchaoui, Z., Mairal, J., Perronnin, F., Schmid, C. (2015) Local convolutional features with unsupervised training for image retrieval. In International conference on computer vision

Perd’och, M., Chum, O., Matas, J. (2009) Efficient representation of local geometry for large scale object retrieval. In IEEE conference on computer vision and pattern recognition

Perronnin, F., Dance, C. (2007) Fisher kernels on visual vocabularies for image categorization. In IEEE conference on computer vision and pattern recognition

Perronnin, F., Sánchez, J., Liu, Y. (2010) Large-scale image categorization with explicit data embedding. In IEEE conference on computer vision and pattern recognition

Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A. (2007) Object retrieval with large vocabularies and fast spatial matching. In IEEE conference on computer vision and pattern recognition

Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A. (2008) Lost in quantization: Improving particular object retrieval in large scale image databases. In IEEE conference on computer vision and pattern recognition

Philbin, J., Isard, M., Sivic, J., Zisserman, A. (2010) Descriptor learning for efficient retrieval. In European conference on computer vision

Rahimi, A., Recht, B. (2008) Random features for large-scale kernel machines. Advances in Neural Information Processing Systems

Razavian, AS., Azizpour, H., Sullivan, J., Carlsson, S. (2014) CNN features off-the-shelf: an astounding baseline for recognition. preprint arXiv:1403.6382

Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge: MIT press.

Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., Moreno-Noguer, F. (2015) Discriminative learning of deep convolutional feature point descriptors. In International conference on computer vision

Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Learning local feature descriptors using convex optimisation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 1573–1585.

Tola, E., Lepetit, V., & Fua, P. (2010). Daisy: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5), 815–830.

Tolias, G., Sicre, R., Jégou, H. (2015) Particular Object Retrieval with Integral Max-Pooling of CNN Activations. preprint arXiv:1511.05879

Tuytelaars, T., & Mikolajczyk, K. (2008). Local invariant feature detectors: A survey. Foundations and Trends in Computer Graphics and Vision, 3(3), 177–280.

Vedaldi, A., & Zisserman, A. (2012). Efficient additive kernels via explicit feature maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 480–492.

Wang, Z., Fan, B., Wu, F. (2011) Local intensity order pattern for feature description. In International conference on computer vision

Williams, C., Seeger, M. (2001) Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems

Winder, S., Hua, G., Brown, M. (2009) Picking the best Daisy. In IEEE conference on computer vision and pattern recognition

Yosinski, J., Clune, J., Bengio, Y., Lipson, H. (2014) How transferable are features in deep neural networks? Advances in Neural Information Processing Systems

Zagoruyko, S., Komodakis, N. (2015) Learning to compare image patches via convolutional neural networks. In IEEE conference on computer vision and pattern recognition

Zbontar, J., LeCun, Y. (2015) Computing the stereo matching cost with a convolutional neural network. In IEEE conference on computer vision and pattern recognition

Title: Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach
Authors: Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid
Publication date: 12 July 2016
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 1/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-016-0924-3
