
23.04.2020

Product Quantization Network for Fast Visual Search

By: Tan Yu, Jingjing Meng, Chen Fang, Hailin Jin, Junsong Yuan

Published in: International Journal of Computer Vision | Issue 8-9/2020


Abstract

Product quantization has been widely used in fast image retrieval due to its effectiveness in coding high-dimensional visual features. By constructing an approximation function, we extend hard-assignment quantization to soft-assignment quantization. Because soft-assignment quantization is differentiable, the product quantization operation can be integrated as a layer in a convolutional neural network, forming the proposed product quantization network (PQN). Meanwhile, by extending the triplet loss to an asymmetric triplet loss, we directly optimize the retrieval accuracy of the learned representation under asymmetric similarity measurement. Using PQN, we can learn a discriminative and compact image representation in an end-to-end manner, which in turn enables fast and accurate image retrieval. By revisiting residual quantization, we further extend PQN to the residual product quantization network (RPQN). Benefiting from the residual learning triggered by residual quantization, RPQN achieves higher accuracy than PQN at the same computation cost. Moreover, we extend PQN to the temporal product quantization network (TPQN), which exploits temporal consistency in videos to speed up video retrieval. It integrates frame-wise feature learning, frame-wise feature aggregation and video-level feature quantization in a single neural network. Comprehensive experiments on multiple public benchmark datasets demonstrate the state-of-the-art performance of the proposed PQN, RPQN and TPQN in fast image and video retrieval.
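To make the idea concrete, below is a minimal NumPy sketch of soft-assignment product quantization and the asymmetric similarity measurement described in the abstract. The codebook shapes, the temperature parameter `alpha`, and the function names are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def soft_assignment_pq(x, codebooks, alpha=10.0):
    """Soft-assignment product quantization of a single feature vector.

    x         : (D,) feature vector, split into M subvectors of length D // M.
    codebooks : (M, K, D // M) array, one K-word codebook per subspace.
    alpha     : softmax temperature; as alpha grows, the soft assignment
                approaches the hard assignment used at indexing time.
    Returns the soft-quantized vector, differentiable w.r.t. x and codebooks.
    """
    M, K, d = codebooks.shape
    subvectors = np.asarray(x, dtype=float).reshape(M, d)
    out = np.empty_like(subvectors)
    for m in range(M):
        # negative squared distance from subvector m to each of its K codewords
        neg_dists = -np.sum((codebooks[m] - subvectors[m]) ** 2, axis=1)
        weights = np.exp(alpha * (neg_dists - neg_dists.max()))
        weights /= weights.sum()               # softmax over the K codewords
        out[m] = weights @ codebooks[m]        # convex combination of codewords
    return out.reshape(-1)

def asymmetric_similarity(query, database_vector, codebooks, alpha=10.0):
    """Asymmetric measurement: raw query feature vs. quantized database feature."""
    return float(np.dot(query, soft_assignment_pq(database_vector, codebooks, alpha)))

# Toy usage: D = 8 dimensions, M = 2 subspaces, K = 4 codewords per subspace.
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(2, 4, 4))
q, db = rng.normal(size=8), rng.normal(size=8)
print(asymmetric_similarity(q, db, codebooks))
```

Because the soft assignment is a softmax-weighted average of codewords, gradients flow through both the features and the codebooks, which is what allows the quantization step to be trained end-to-end as a network layer; the asymmetric comparison above mirrors the retrieval setting where only database features are quantized.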


Metadata
Title
Product Quantization Network for Fast Visual Search
Authors
Tan Yu
Jingjing Meng
Chen Fang
Hailin Jin
Junsong Yuan
Publication date
23.04.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 8-9/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01326-x
