04.04.2024

FSODv2: A Deep Calibrated Few-Shot Object Detection Network

Authors: Qi Fan, Wei Zhuo, Chi-Keung Tang, Yu-Wing Tai

Published in: International Journal of Computer Vision


Abstract

Traditional object detection methods typically require substantial amounts of training data, and preparing high-quality training data is time-consuming. In this paper we propose a novel Few-Shot Object Detection network (FSODv2) that aims to detect objects of previously unseen categories using only a few annotated examples. Central to our method are the Attention RPN, Multi-Relation Detector, and contrastive training strategy (Fan et al., in: CVPR, 2020), which exploit the similarity between the few-shot support set and the query set to detect novel objects while suppressing false detections in the background. We also contribute a new dataset, FSOD-1k, which contains 1000 categories of various objects with high-quality annotations for training our network. To the best of our knowledge, this is one of the first datasets designed specifically for few-shot object detection. This paper improves our FSOD model through well-designed model calibration in three areas: (1) we propose an improved FPN with multi-scale support inputs, which calibrates multi-scale support-query feature matching by exploiting multi-scale features extracted from the same support image at different input scales; (2) we introduce a support classification supervision branch that calibrates the support feature supervision, aligning it with the query feature training supervision; (3) we propose backbone calibration to preserve prior knowledge while alleviating the backbone's bias toward base classes, employing a classification dataset in the calibration procedure, whereas such datasets have previously been used only for pre-training in related works. In addition, we propose a Fast Attention RPN to improve inference speed and reduce memory consumption. Once trained, our few-shot network can detect objects of previously unseen categories without further training or fine-tuning, achieving new state-of-the-art performance on multiple datasets in the few-shot setting. Our method is general in scope and has numerous potential applications. The dataset is available at https://github.com/fanq15/Few-Shot-Object-Detection-Dataset.
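To make the support-query matching described above concrete, the following is a minimal PyTorch sketch of the attention-RPN idea: the support feature is pooled into a per-channel kernel and correlated against the query feature map, so region proposals are biased toward areas resembling the support category. The function name, tensor shapes, and the choice of global average pooling with a sigmoid gate are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Sketch of attention-RPN-style support-query feature matching.
# Assumptions: 1x1 depthwise correlation with a globally pooled support
# kernel and a sigmoid gate; not the paper's exact implementation.
import torch
import torch.nn.functional as F

def attention_rpn_features(query_feat: torch.Tensor,
                           support_feat: torch.Tensor) -> torch.Tensor:
    """Modulate query features with a support-derived attention map.

    query_feat:   (B, C, H, W) backbone features of the query image.
    support_feat: (B, C, h, w) backbone features of a support crop.
    Returns attention-weighted query features of shape (B, C, H, W).
    """
    B, C, H, W = query_feat.shape
    # Pool the support feature into a (B*C, 1, 1, 1) depthwise kernel.
    kernel = support_feat.mean(dim=(2, 3), keepdim=True).reshape(B * C, 1, 1, 1)
    # Depthwise-correlate the kernel against the query feature map.
    x = query_feat.reshape(1, B * C, H, W)
    attn = F.conv2d(x, kernel, groups=B * C).reshape(B, C, H, W)
    # Gate the query features; the RPN then sees class-aware features
    # and proposes regions that resemble the support category.
    return query_feat * torch.sigmoid(attn)

# Example: two query images with one (e.g., shot-averaged) support feature each.
q = torch.randn(2, 256, 64, 64)
s = torch.randn(2, 256, 20, 20)
out = attention_rpn_features(q, s)  # (2, 256, 64, 64)
```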


Footnotes
1
The fine-tuning stage benefits from more ways during multi-way training, so we use as many ways as the GPU memory allows.
 
2
Since Feature Reweighting and Meta R-CNN are evaluated on MS COCO, in this subsection we discard pre-training on Lin et al. (2014) to follow the same experimental setting and ensure a fair comparison.
 
3
We also discard MS COCO pre-training in this experiment.
 
References
Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., & Ring, R. (2022). Flamingo: A visual language model for few-shot learning. In: NeurIPS.
Arteta, C., Lempitsky, V., & Zisserman, A. (2016). Counting in the wild. In: ECCV.
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H. (2016). Fully-convolutional siamese networks for object tracking. In: ECCV.
Bodla, N., Singh, B., Chellappa, R., & Davis, L. S. (2017). Soft-NMS—improving object detection with one line of code. In: ICCV.
Buda, M., Maki, A., & Mazurowski, M. A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks.
Bulat, A., Guerrero, R., Martinez, B., & Tzimiropoulos, G. (2023). FS-DETR: Few-shot detection transformer with prompting and without re-training. In: ICCV.
Cai, Q., Pan, Y., Yao, T., Yan, C., & Mei, T. (2018). Memory matching networks for one-shot image recognition. In: CVPR.
Cao, Y., Wang, J., Jin, Y., Wu, T., Chen, K., Liu, Z., & Lin, D. (2021). Few-shot object detection via association and discrimination. In: NeurIPS.
Cao, Y., Wang, J., Lin, Y., & Lin, D. (2022). MINI: Mining implicit novel instances for few-shot object detection. arXiv:2205.03381.
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. In: ECCV.
Chen, Q., Chen, X., Wang, J., Zhang, S., Yao, K., Feng, H., Han, J., Ding, E., Zeng, G., & Wang, J. (2023). Group DETR: Fast DETR training with group-wise one-to-many assignment. In: ICCV.
Chen, Y., Li, W., Sakaridis, C., Dai, D., & Van Gool, L. (2018). Domain adaptive Faster R-CNN for object detection in the wild. In: CVPR.
Chen, H., Wang, Y., Wang, G., & Qiao, Y. (2018). LSTD: A low-shot transfer detector for object detection. In: AAAI.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In: CVPR.
Dai, X., Chen, Y., Yang, J., Zhang, P., Yuan, L., & Zhang, L. (2021). Dynamic DETR: End-to-end object detection with dynamic attention. In: ICCV.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In: CVPR.
Demirel, B., Baran, O. B., & Cinbis, R. G. (2023). Meta-tuning loss functions and data augmentation for few-shot object detection. In: CVPR.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In: CVPR.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
Dong, N., & Xing, E. P. (2018). Few-shot semantic segmentation with prototype learning. In: BMVC.
Dong, X., Zheng, L., Ma, F., Yang, Y., & Meng, D. (2018). Few-example object detection with model communication. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7), 1641–1654.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., & Uszkoreit, J. (2020). An image is worth \(16\times 16\) words: Transformers for image recognition at scale. arXiv:2010.11929.
Du, J., Zhang, S., Chen, Q., Le, H., Sun, Y., Ni, Y., Wang, J., He, B., & Wang, J. (2023). \(\sigma \)-adaptive decoupled prototype for few-shot object detection. In: ICCV.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Fan, Z., Ma, Y., Li, Z., & Sun, J. (2021). Generalized few-shot object detection without forgetting. In: CVPR.
Fan, Q., Pei, W., Tai, Y.-W., & Tang, C.-K. (2022). Self-support few-shot semantic segmentation. In: ECCV.
Fan, Q., Segu, M., Tai, Y.-W., Yu, F., Tang, C.-K., Schiele, B., & Dai, D. (2023). Towards robust object detection invariant to real-world domain shifts. In: ICLR.
Fan, Q., Tang, C.-K., & Tai, Y.-W. (2022). Few-shot object detection with model calibration. In: ECCV.
Fan, Q., Zhuo, W., Tang, C.-K., & Tai, Y.-W. (2020). Few-shot object detection with attention-RPN and multi-relation detector. In: CVPR.
Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4), 594–611.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML.
Gao, P., Geng, S., Zhang, R., Ma, T., Fang, R., Zhang, Y., Li, H., & Qiao, Y. (2023). CLIP-Adapter: Better vision-language models with feature adapters. International Journal of Computer Vision.
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR.
Gidaris, S., & Komodakis, N. (2019). Generating classification weights with GNN denoising autoencoders for few-shot learning. In: CVPR.
Girshick, R. (2015). Fast R-CNN. In: ICCV.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR.
Gu, X., Lin, T.-Y., Kuo, W., & Cui, Y. (2021). Open-vocabulary object detection via vision and language knowledge distillation. arXiv:2104.13921.
Gui, L.-Y., Wang, Y.-X., Ramanan, D., & Moura, J. M. F. (2018). Few-shot human motion prediction via meta-learning. In: ECCV.
Guirguis, K., Meier, J., Eskandar, G., Kayser, M., Yang, B., & Beyerer, J. (2023). NIFF: Alleviating forgetting in generalized few-shot object detection via neural instance feature forging. In: CVPR.
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In: ICML.
Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A dataset for large vocabulary instance segmentation. In: CVPR.
Han, G., He, Y., Huang, S., Ma, J., & Chang, S.-F. (2021). Query adaptive few-shot object detection with heterogeneous graph convolutional networks. In: ICCV.
Han, J., Ren, Y., Ding, J., Yan, K., & Xia, G.-S. (2023). Few-shot object detection via variational feature aggregation. In: AAAI.
Hariharan, B., & Girshick, R. (2017). Low-shot visual recognition by shrinking and hallucinating features. In: ICCV.
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In: CVPR.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In: ICCV.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: CVPR.
Hénaff, O. J., Koppula, S., Alayrac, J.-B., Oord, A., Vinyals, O., & Carreira, J. (2021). Efficient visual pretraining with contrastive detection. In: ICCV.
Hu, H., Bai, S., Li, A., Cui, J., & Wang, L. (2021). Dense relation distillation with context-aware aggregation for few-shot object detection. In: CVPR.
Hu, X., Jiang, Y., Tang, K., Chen, J., Miao, C., & Zhang, H. (2020). Learning to segment the tail. In: CVPR.
Hu, T., Yang, P., Zhang, C., Yu, G., Mu, Y., & Snoek, C. G. M. (2019). Attention-based multi-context guiding for few-shot semantic segmentation. In: AAAI.
Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., & Duerig, T. (2021). Scaling up visual and vision-language representation learning with noisy text supervision. In: ICML.
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., & Hu, H. (2023). DETRs with hybrid matching. In: CVPR.
Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., & Darrell, T. (2019). Few-shot object detection via feature reweighting. In: ICCV.
Kang, B., Xie, S., Rohrbach, M., Yan, Z., Gordo, A., Feng, J., & Kalantidis, Y. (2019). Decoupling representation and classifier for long-tailed recognition. arXiv:1910.09217.
Karlinsky, L., Shtok, J., Harary, S., Schwartz, E., Aides, A., Feris, R., Giryes, R., & Bronstein, A. M. (2019). RepMet: Representative-based metric learning for classification and few-shot object detection. In: CVPR.
Kaul, P., Xie, W., & Zisserman, A. (2022). Label, verify, correct: A simple few-shot object detection method. In: CVPR.
Kim, D., Angelova, A., & Kuo, W. (2023). Contrastive feature masking open-vocabulary vision transformer. In: ICCV.
Kim, J., Kim, T., Kim, S., & Yoo, C. D. (2019). Edge-labeling graph neural network for few-shot learning. In: CVPR.
Kim, B., & Kim, J. (2020). Adjusting decision boundary for class imbalanced learning. IEEE Access, 8, 81674–81685.
Koch, G., Zemel, R., & Salakhutdinov, R. (2015). Siamese neural networks for one-shot image recognition. In: ICML Workshop.
Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., & Bernstein, M. S. (2017). Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1), 32–73.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In: NeurIPS.
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2018). The Open Images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982.
Lake, B. M., Salakhutdinov, R. R., & Tenenbaum, J. (2013). One-shot learning by inverting a compositional causal process. In: NeurIPS.
Lake, B., Salakhutdinov, R., Gross, J., & Tenenbaum, J. (2011). One shot learning of simple visual concepts. In: Proceedings of the annual meeting of the cognitive science society, 33.
Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338.
Li, A., & Li, Z. (2021). Transformation invariant few-shot object detection. In: CVPR.
Li, H., Eigen, D., Dodge, S., Zeiler, M., & Wang, X. (2019). Finding task-relevant features for few-shot learning by category traversal. In: CVPR.
Li, Z., Hoogs, A., & Xu, C. (2022). Discover and mitigate unknown biases with debiasing alternate networks. In: ECCV.
Li, J., Li, D., Xiong, C., & Hoi, S. (2022). BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: ICML.
Li, J., Selvaraju, R., Gotmare, A., Joty, S., Xiong, C., & Hoi, S. C. H. (2021). Align before fuse: Vision and language representation learning with momentum distillation. In: NeurIPS.
Li, Y., Wang, T., Kang, B., Tang, S., Wang, C., Li, J., & Feng, J. (2020). Overcoming classifier imbalance for long-tail object detection with balanced group softmax. In: CVPR.
Li, W., Wang, L., Xu, J., Huo, J., Yang, G., & Luo, J. (2019). Revisiting local descriptor based image-to-class measure for few-shot learning. In: CVPR.
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). SiamRPN++: Evolution of siamese visual tracking with very deep networks. In: CVPR.
Li, Y., Xie, S., Chen, X., Dollar, P., He, K., & Girshick, R. (2021). Benchmarking detection transfer learning with vision transformers. arXiv:2111.11429.
Li, F., Zhang, H., Liu, S., Guo, J., Ni, L. M., & Zhang, L. (2022). DN-DETR: Accelerate DETR training by introducing query denoising. In: CVPR.
Li, J., Zhang, Y., Qiang, W., Si, L., Jiao, C., Hu, X., Zheng, C., & Sun, F. (2023). Disentangle and remerge: Interventional knowledge distillation for few-shot object detection from a conditional causal perspective. In: AAAI.
Li, Y., Zhu, H., Cheng, Y., Wang, W., Teo, C. S., Xiang, C., Vadakkepat, P., & Lee, T. H. (2021). Few-shot object detection via classification refinement and distractor retreatment. In: CVPR.
Lifchitz, Y., Avrithis, Y., Picard, S., & Bursuc, A. (2019). Dense classification and implanting for few-shot learning. In: CVPR.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In: ICCV.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In: ECCV.
Liu, S., & Huang, D. (2018). Receptive field block net for accurate and fast object detection. In: ECCV.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In: ECCV.
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., & Zhang, L. (2022). DAB-DETR: Dynamic anchor boxes are better queries for DETR. arXiv:2201.12329.
Lu, X., Diao, W., Mao, Y., Li, J., Wang, P., Sun, X., & Fu, K. (2023). Breaking immutable: Information-coupled prototype elaboration for few-shot object detection. In: AAAI.
Lu, E., Xie, W., & Zisserman, A. (2018). Class-agnostic counting. In: ACCV.
Ma, C., Jiang, Y., Wen, X., Yuan, Z., & Qi, X. (2023). CoDet: Co-occurrence guided region-word alignment for open-vocabulary object detection. arXiv:2310.16667.
Ma, J., Niu, Y., Xu, J., Huang, S., Han, G., & Chang, S.-F. (2023). DiGeo: Discriminative geometry-aware learning for generalized few-shot object detection. In: CVPR.
Michaelis, C., Bethge, M., & Ecker, A. S. (2018). One-shot segmentation in clutter. In: ICML.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Munkhdalai, T., & Yu, H. (2017). Meta networks. In: ICML.
Munkhdalai, T., Yuan, X., Mehri, S., & Trischler, A. (2018). Rapid adaptation with conditionally shifted neurons. In: ICML.
Oreshkin, B., López, P. R., & Lacoste, A. (2018). TADAM: Task dependent adaptive metric for improved few-shot learning. In: NeurIPS.
Pei, W., Wu, S., Mei, D., Chen, F., Tian, J., & Lu, G. (2022). Few-shot object detection by knowledge distillation using bag-of-visual-words representations. In: ECCV.
Qiao, L., Zhao, Y., Li, Z., Qiu, X., Wu, J., & Zhang, C. (2021). DeFRCN: Decoupled Faster R-CNN for few-shot object detection. In: ICCV.
Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O. R., & Jagersand, M. (2020). U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., & Krueger, G. (2021). Learning transferable visual models from natural language supervision. In: ICML.
Ravi, S., & Larochelle, H. (2017). Optimization as a model for few-shot learning. In: ICLR.
Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In: CVPR.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In: CVPR.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In: NeurIPS.
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016). Meta-learning with memory-augmented neural networks. In: ICML.
Schwartz, E., Karlinsky, L., Shtok, J., Harary, S., Marder, M., Pankanti, S., Feris, R., Kumar, A., Giries, R., & Bronstein, A. M. (2019). RepMet: Representative-based metric learning for classification and one-shot object detection. In: CVPR.
Shi, C., & Yang, S. (2023). EdaDet: Open-vocabulary object detection using early dense alignment. In: ICCV.
Shu, M., Nie, W., Huang, D.-A., Yu, Z., Goldstein, T., Anandkumar, A., & Xiao, C. (2022). Test-time prompt tuning for zero-shot generalization in vision-language models. In: NeurIPS.
Singh, A., Hu, R., Goswami, V., Couairon, G., Galuba, W., Rohrbach, M., & Kiela, D. (2022). FLAVA: A foundational language and vision alignment model. In: CVPR.
Singh, K. K., Mahajan, D., Grauman, K., Lee, Y. J., Feiszli, M., & Ghadiyaram, D. (2020). Don't judge an object by its context: Learning to overcome contextual bias. In: CVPR.
Singh, B., Najibi, M., & Davis, L. S. (2018). SNIPER: Efficient multi-scale training. In: NeurIPS.
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In: NeurIPS.
Sun, B., Li, B., Cai, S., Yuan, Y., & Zhang, C. (2021). FSCE: Few-shot object detection via contrastive proposal encoding. In: CVPR.
Tan, J., Wang, C., Li, B., Li, Q., Ouyang, W., Yin, C., & Yan, J. (2020). Equalization loss for long-tailed object recognition. In: CVPR.
Tao, Y., Sun, J., Yang, H., Chen, L., Wang, X., Yang, W., Du, D., & Zheng, M. (2023). Local and global logit adjustments for long-tailed learning. In: ICCV.
Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In: NeurIPS.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., & Rodriguez, A. (2023). LLaMA: Open and efficient foundation language models. arXiv:2302.13971.
Triantafillou, E., Zemel, R., & Urtasun, R. (2017). Few-shot learning through an information retrieval lens. In: NeurIPS.
Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. In: NeurIPS.
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In: CVPR.
Wang, T. (2023). Learning to detect and segment for open vocabulary object detection. In: CVPR.
Wang, W., Bao, H., Dong, L., Bjorck, J., Peng, Z., Liu, Q., Aggarwal, K., Mohammed, O. K., Singhal, S., Som, S., & Wei, F. (2023). Image as a foreign language: BEiT pretraining for vision and vision-language tasks. In: CVPR.
Wang, Y., Fei, J., Wang, H., Li, W., Bao, T., Wu, L., Zhao, R., & Shen, Y. (2023). Balancing logit variation for long-tailed semantic segmentation. In: CVPR.
Wang, Y.-X., Girshick, R., Hebert, M., & Hariharan, B. (2018). Low-shot learning from imaginary data. In: CVPR.
Wang, X., Huang, T. E., Darrell, T., Gonzalez, J. E., & Yu, F. (2020). Frustratingly simple few-shot object detection. In: ICML.
Wang, T., Li, Y., Kang, B., Li, J., Liew, J., Tang, S., Hoi, S., & Feng, J. (2020). The devil is in classification: A simple framework for long-tail instance segmentation. In: ECCV.
Wang, Z., Yu, J., Yu, A. W., Dai, Z., Tsvetkov, Y., & Cao, Y. (2021). SimVLM: Simple visual language model pretraining with weak supervision. arXiv:2108.10904.
Wang, J., Zhang, W., Zang, Y., Cao, Y., Pang, J., Gong, T., Chen, K., Liu, Z., Loy, C. C., & Lin, D. (2021). Seesaw loss for long-tailed instance segmentation. In: CVPR.
Wong, A., & Yuille, A. L. (2015). One shot learning via compositions of meaningful patches. In: ICCV.
Wu, A., Han, Y., Zhu, L., & Yang, Y. (2021). Universal-prototype enhancing for few-shot object detection. In: ICCV.
Wu, J., Liu, S., Huang, D., & Wang, Y. (2020). Multi-scale positive sample refinement for few-shot object detection. In: ECCV.
Wu, S., Zhang, W., Jin, S., Liu, W., & Loy, C. C. (2023). Aligning bag of regions for open-vocabulary object detection. In: CVPR.
Xiao, Y., & Marlet, R. (2020). Few-shot object detection and viewpoint estimation for objects in the wild. In: ECCV.
Xu, J., Le, H., & Samaras, D. (2023). Generating features with increased crop-related diversity for few-shot object detection. In: CVPR.
Yan, X., Chen, Z., Xu, A., Wang, X., Liang, X., & Lin, L. (2019). Meta R-CNN: Towards general solver for instance-level low-shot learning. In: ICCV.
Yang, J., Li, C., Zhang, P., Xiao, B., Liu, C., Yuan, L., & Gao, J. (2022). Unified contrastive learning in image-text-label space. In: CVPR.
Yang, Y., Wei, F., Shi, M., & Li, G. (2020). Restoring negative information in few-shot object detection. In: NeurIPS.
Yang, F. S. Y., Zhang, L., Xiang, T., Torr, P. H., & Hospedales, T. M. (2018). Learning to compare: Relation network for few-shot learning. In: CVPR.
Yao, L., Han, J., Liang, X., Xu, D., Zhang, W., Li, Z., & Xu, H. (2023). DetCLIPv2: Scalable open-vocabulary object detection pre-training via word-region alignment. In: CVPR.
Yao, L., Han, J., Wen, Y., Liang, X., Xu, D., Zhang, W., Li, Z., Xu, C., & Xu, H. (2022). DetCLIP: Dictionary-enriched visual-concept paralleled pre-training for open-world detection. In: NeurIPS.
Yuan, L., Chen, D., Chen, Y.-L., Codella, N., Dai, X., Gao, J., Hu, H., Huang, X., Li, B., Li, C., & Liu, C. (2021). Florence: A new foundation model for computer vision. arXiv:2111.11432.
Zang, Y., Li, W., Zhou, K., Huang, C., & Loy, C. C. (2022). Open-vocabulary DETR with conditional matching. In: ECCV.
Zhang, W., & Wang, Y.-X. (2021). Hallucination improves few-shot object detection. In: CVPR.
Zhang, G., Cui, K., Wu, R., Lu, S., & Tian, Y. (2021). PNPDet: Efficient few-shot detection without forgetting via plug-and-play sub-networks. In: WACV.
Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018). Context encoding for semantic segmentation. In: CVPR.
Zhang, R., Hu, X., Li, B., Huang, S., Deng, H., Qiao, Y., Gao, P., & Li, H. (2023). Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners. In: CVPR.
Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L. M., & Shum, H.-Y. (2022). DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv:2203.03605.
Zhang, R., Zhang, W., Fang, R., Gao, P., Li, K., Dai, J., Qiao, Y., & Li, H. (2022). Tip-Adapter: Training-free adaption of CLIP for few-shot classification. In: ECCV.
Zhao, Y., Chen, W., Tan, X., Huang, K., & Zhu, J. (2022). Adaptive logit adjustment loss for long-tailed visual recognition. In: AAAI.
Zhao, L., Teng, Y., & Wang, L. (2024). Logit normalization for long-tail object detection. International Journal of Computer Vision.
Zhong, Y., Yang, J., Zhang, P., Li, C., Codella, N., Li, L. H., Zhou, L., Dai, X., Yuan, L., Li, Y., & Gao, J. (2022). RegionCLIP: Region-based language-image pretraining. In: CVPR.
Zhou, X., Girdhar, R., Joulin, A., Krähenbühl, P., & Misra, I. (2022). Detecting twenty-thousand classes using image-level supervision. In: ECCV.
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Conditional prompt learning for vision-language models. In: CVPR.
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models. International Journal of Computer Vision, 130, 2337–2348.
Zhu, C., Chen, F., Ahmed, U., & Savvides, M. (2021). Semantic relation reasoning for shot-stable few-shot object detection. In: CVPR.
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., & Dai, J. (2020). Deformable DETR: Deformable transformers for end-to-end object detection. arXiv:2010.04159.
Zong, Z., Song, G., & Liu, Y. (2023). DETRs with collaborative hybrid assignments training. In: ICCV.
Metadata
Title
FSODv2: A Deep Calibrated Few-Shot Object Detection Network
Authors
Qi Fan
Wei Zhuo
Chi-Keung Tang
Yu-Wing Tai
Publication date
04.04.2024
Publisher
Springer US
Published in
International Journal of Computer Vision
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-024-02049-z
