
Zero-Shot Object Detection: Joint Recognition and Localization of Novel Concepts

Authors: Shafin Rahman, Salman H. Khan, Fatih Porikli

Published in: International Journal of Computer Vision | Issue 12/2020

Published: 24 July 2020


Abstract

Zero shot learning (ZSL) identifies unseen objects for which no training images are available. Conventional ZSL approaches are restricted to a recognition setting where each test image is categorized into one of several unseen object classes. We posit that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complete scene, warranting both ‘recognition’ and ‘localization’ of the unseen category. To address this limitation, we introduce a new ‘Zero-Shot Detection’ (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories, without any training samples. We introduce an integrated solution to the ZSD problem that jointly models the complex interplay between visual and semantic domain information. Ours is an end-to-end trainable deep network for ZSD that effectively overcomes the noise in the unsupervised semantic descriptions. To this end, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic domain clustering. In order to set a benchmark for ZSD, we propose an experimental protocol for the large-scale ILSVRC dataset that adheres to practical challenges, e.g., rare classes are more likely to be the unseen ones. Furthermore, we present a baseline approach extended from conventional recognition to the ZSD setting. Our extensive experiments show a significant boost in performance (in terms of mAP and Recall) on the imperative yet difficult ZSD problem on ImageNet detection, MSCOCO and FashionZSD datasets.


Meta-classes are obtained by clustering semantically similar classes.
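As a rough illustration of the clustering idea in this note, the sketch below groups class names into meta-classes by running a small spherical k-means over their word vectors. It is only a sketch under stated assumptions: the page does not say which clustering algorithm or which embeddings the authors used, so the function name `cluster_meta_classes`, the k-means procedure, and the three-dimensional toy vectors are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine(u, v):
    """Cosine similarity; assumes both inputs are already unit length."""
    return sum(a * b for a, b in zip(u, v))

def cluster_meta_classes(embeddings, k, iters=20):
    """Assign each class name to one of k meta-classes (hypothetical helper).

    embeddings: dict mapping class name -> semantic vector (list of floats).
    Returns a dict mapping class name -> meta-class index in range(k).
    """
    names = sorted(embeddings)
    vecs = {n: normalize(embeddings[n]) for n in names}
    # Deterministic farthest-point initialization so results are repeatable.
    centers = [vecs[names[0]]]
    while len(centers) < k:
        far = min(names, key=lambda n: max(cosine(vecs[n], c) for c in centers))
        centers.append(vecs[far])
    assign = {}
    for _ in range(iters):
        # Assignment step: nearest center by cosine similarity.
        new_assign = {
            n: max(range(k), key=lambda j: cosine(vecs[n], centers[j]))
            for n in names
        }
        if new_assign == assign:
            break
        assign = new_assign
        # Update step: each center becomes the normalized mean of its members.
        for j in range(k):
            members = [vecs[n] for n in names if assign[n] == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[j] = normalize(mean)
    return assign

# Toy vectors invented for illustration; real word vectors have hundreds of dimensions.
toy = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.0],
    "bus": [0.0, 0.8, 0.2],
    "truck": [0.1, 0.9, 0.1],
}
meta = cluster_meta_classes(toy, k=2)
```

On this toy input the animal names share one meta-class and the vehicle names share the other; the index values themselves carry no meaning.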

We acknowledge, however, that Recall@100 remains an appropriate measure for large-scale datasets that are not fully labeled (such as Visual Genome; see Sect. 5.5).
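A recall-at-K measure suits partially labeled data because it only asks whether each annotated object is found among the top K detections; a correct detection of an unannotated object is never counted as a false positive. The sketch below shows one plausible way to compute it for a single image. The function names, the `(score, label, box)` tuple layout, and the IoU threshold of 0.5 are assumptions for illustration, not the evaluation protocol of the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_at_k(detections, ground_truth, k=100, iou_thresh=0.5):
    """Fraction of ground-truth boxes matched by one of the top-k detections.

    detections:   list of (score, label, box) tuples.
    ground_truth: list of (label, box) tuples.
    iou_thresh:   minimum overlap for a match (0.5 is an assumed default).
    """
    if not ground_truth:
        return 0.0
    # Keep only the k highest-scoring detections.
    top = sorted(detections, key=lambda d: d[0], reverse=True)[:k]
    hits = 0
    for gt_label, gt_box in ground_truth:
        # A ground-truth box is recalled if any kept detection has the same
        # label and overlaps it sufficiently.
        if any(label == gt_label and iou(box, gt_box) >= iou_thresh
               for _, label, box in top):
            hits += 1
    return hits / len(ground_truth)

# Invented example: two annotated objects, three detections.
gt = [("cat", (0, 0, 10, 10)), ("dog", (20, 20, 30, 30))]
dets = [
    (0.9, "cat", (1, 1, 10, 10)),     # overlaps the cat box well
    (0.8, "dog", (50, 50, 60, 60)),   # wrong location
    (0.3, "dog", (20, 20, 30, 31)),   # overlaps the dog box, but low score
]
```

With these inputs, `recall_at_k(dets, gt, k=2)` finds only the cat, while `recall_at_k(dets, gt, k=3)` also admits the low-scoring dog detection and recalls both objects.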


Title: Zero-Shot Object Detection: Joint Recognition and Localization of Novel Concepts
Authors: Shafin Rahman
Salman H. Khan
Fatih Porikli
Publication date: 24 July 2020
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 12/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-020-01355-6
