04.04.2024

FSODv2: A Deep Calibrated Few-Shot Object Detection Network

Authors: Qi Fan, Wei Zhuo, Chi-Keung Tang, Yu-Wing Tai

Published in: International Journal of Computer Vision


Abstract

Traditional object detection methods typically require substantial amounts of training data, and preparing high-quality training data is time-consuming. In this paper we propose a novel Few-Shot Object Detection network (FSODv2) that aims to detect objects of previously unseen categories using only a few annotated examples. Central to our method are the Attention RPN, Multi-Relation Detector, and contrastive training strategy (Fan et al., in: CVPR, 2020), which exploit the similarity between the few-shot support set and the query set to detect novel objects while suppressing false detections in the background. We also contribute a new dataset, FSOD-1k, which contains 1000 categories of various objects with high-quality annotations for training our network. To the best of our knowledge, this is one of the first datasets designed specifically for few-shot object detection. This paper improves our FSOD model through well-designed model calibration in three areas: (1) we propose an improved FPN with multi-scale support inputs, which calibrates multi-scale support-query feature matching by exploiting multi-scale features extracted from the same support image at different input scales; (2) we introduce a support classification supervision branch that calibrates the support feature supervision, aligning it with the query feature training supervision; (3) we propose backbone calibration to preserve prior knowledge while alleviating the backbone's bias toward base classes, employing a classification dataset in the calibration procedure, whereas such datasets have previously been used only for pre-training in related works. In addition, we propose a Fast Attention RPN to improve inference speed and reduce memory consumption. Once trained, our few-shot network can detect objects of previously unseen categories without further training or fine-tuning, achieving new state-of-the-art performance on multiple datasets in the few-shot setting. Our method is general in scope and has numerous potential applications. The dataset is available at https://github.com/fanq15/Few-Shot-Object-Detection-Dataset.
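To make the support-query matching described above concrete, the following is a minimal PyTorch sketch of the attention-RPN idea: the support feature is pooled into a per-channel kernel and correlated against the query feature map, so region proposals are biased toward areas resembling the support category. The function name, tensor shapes, and the choice of global average pooling with a sigmoid gate are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Sketch of attention-RPN-style support-query feature matching.
# Assumptions: 1x1 depthwise correlation with a globally pooled support
# kernel and a sigmoid gate; not the paper's exact implementation.
import torch
import torch.nn.functional as F

def attention_rpn_features(query_feat: torch.Tensor,
                           support_feat: torch.Tensor) -> torch.Tensor:
    """Modulate query features with a support-derived attention map.

    query_feat:   (B, C, H, W) backbone features of the query image.
    support_feat: (B, C, h, w) backbone features of a support crop.
    Returns attention-weighted query features of shape (B, C, H, W).
    """
    B, C, H, W = query_feat.shape
    # Pool the support feature into a (B*C, 1, 1, 1) depthwise kernel.
    kernel = support_feat.mean(dim=(2, 3), keepdim=True).reshape(B * C, 1, 1, 1)
    # Depthwise-correlate the kernel against the query feature map.
    x = query_feat.reshape(1, B * C, H, W)
    attn = F.conv2d(x, kernel, groups=B * C).reshape(B, C, H, W)
    # Gate the query features; the RPN then sees class-aware features
    # and proposes regions that resemble the support category.
    return query_feat * torch.sigmoid(attn)

# Example: two query images with one (e.g., shot-averaged) support feature each.
q = torch.randn(2, 256, 64, 64)
s = torch.randn(2, 256, 20, 20)
out = attention_rpn_features(q, s)  # (2, 256, 64, 64)
```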


Footnotes
1
The fine-tuning stage benefits from more ways during multi-way training, so we use as many ways as the GPU memory allows.
 
2
Since Feature Reweighting and Meta R-CNN are evaluated on MS COCO, in this subsection we discard pre-training on Lin et al. (2014) to follow the same experimental setting and ensure a fair comparison.
 
3
We also discard MS COCO pre-training in this experiment.
 
References
Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., & Ring, R. (2022). Flamingo: A visual language model for few-shot learning. In: NeurIPS.
Arteta, C., Lempitsky, V., & Zisserman, A. (2016). Counting in the wild. In: ECCV.
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H. (2016). Fully-convolutional siamese networks for object tracking. In: ECCV.
Bodla, N., Singh, B., Chellappa, R., & Davis, L. S. (2017). Soft-NMS—improving object detection with one line of code. In: ICCV.
Buda, M., Maki, A., & Mazurowski, M. A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks.
Bulat, A., Guerrero, R., Martinez, B., & Tzimiropoulos, G. (2023). FS-DETR: Few-shot detection transformer with prompting and without re-training. In: ICCV.
Cai, Q., Pan, Y., Yao, T., Yan, C., & Mei, T. (2018). Memory matching networks for one-shot image recognition. In: CVPR.
Cao, Y., Wang, J., Jin, Y., Wu, T., Chen, K., Liu, Z., & Lin, D. (2021). Few-shot object detection via association and discrimination. In: NeurIPS.
Cao, Y., Wang, J., Lin, Y., & Lin, D. (2022). MINI: Mining implicit novel instances for few-shot object detection. arXiv:2205.03381.
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. In: ECCV.
Chen, Q., Chen, X., Wang, J., Zhang, S., Yao, K., Feng, H., Han, J., Ding, E., Zeng, G., & Wang, J. (2023). Group DETR: Fast DETR training with group-wise one-to-many assignment. In: ICCV.
Chen, Y., Li, W., Sakaridis, C., Dai, D., & Van Gool, L. (2018). Domain adaptive Faster R-CNN for object detection in the wild. In: CVPR.
Chen, H., Wang, Y., Wang, G., & Qiao, Y. (2018). LSTD: A low-shot transfer detector for object detection. In: AAAI.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In: CVPR.
Dai, X., Chen, Y., Yang, J., Zhang, P., Yuan, L., & Zhang, L. (2021). Dynamic DETR: End-to-end object detection with dynamic attention. In: ICCV.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In: CVPR.
Demirel, B., Baran, O. B., & Cinbis, R. G. (2023). Meta-tuning loss functions and data augmentation for few-shot object detection. In: CVPR.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In: CVPR.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
Dong, N., & Xing, E. P. (2018). Few-shot semantic segmentation with prototype learning. In: BMVC.
Dong, X., Zheng, L., Ma, F., Yang, Y., & Meng, D. (2018). Few-example object detection with model communication. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7), 1641–1654.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., & Uszkoreit, J. (2020). An image is worth \(16\times 16\) words: Transformers for image recognition at scale. arXiv:2010.11929.
Du, J., Zhang, S., Chen, Q., Le, H., Sun, Y., Ni, Y., Wang, J., He, B., & Wang, J. (2023). \(\sigma \)-adaptive decoupled prototype for few-shot object detection. In: ICCV.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Fan, Z., Ma, Y., Li, Z., & Sun, J. (2021). Generalized few-shot object detection without forgetting. In: CVPR.
Fan, Q., Pei, W., Tai, Y.-W., & Tang, C.-K. (2022). Self-support few-shot semantic segmentation. In: ECCV.
Fan, Q., Segu, M., Tai, Y.-W., Yu, F., Tang, C.-K., Schiele, B., & Dai, D. (2023). Towards robust object detection invariant to real-world domain shifts. In: ICLR.
Fan, Q., Tang, C.-K., & Tai, Y.-W. (2022). Few-shot object detection with model calibration. In: ECCV.
Fan, Q., Zhuo, W., Tang, C.-K., & Tai, Y.-W. (2020). Few-shot object detection with attention-RPN and multi-relation detector. In: CVPR.
Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4), 594–611.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML.
Gao, P., Geng, S., Zhang, R., Ma, T., Fang, R., Zhang, Y., Li, H., & Qiao, Y. (2023). CLIP-Adapter: Better vision-language models with feature adapters. International Journal of Computer Vision.
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR.
Gidaris, S., & Komodakis, N. (2019). Generating classification weights with GNN denoising autoencoders for few-shot learning. In: CVPR.
Girshick, R. (2015). Fast R-CNN. In: ICCV.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR.
Gu, X., Lin, T.-Y., Kuo, W., & Cui, Y. (2021). Open-vocabulary object detection via vision and language knowledge distillation. arXiv:2104.13921.
Gui, L.-Y., Wang, Y.-X., Ramanan, D., & Moura, J. M. F. (2018). Few-shot human motion prediction via meta-learning. In: ECCV.
Guirguis, K., Meier, J., Eskandar, G., Kayser, M., Yang, B., & Beyerer, J. (2023). NIFF: Alleviating forgetting in generalized few-shot object detection via neural instance feature forging. In: CVPR.
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In: ICML.
Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A dataset for large vocabulary instance segmentation. In: CVPR.
Han, G., He, Y., Huang, S., Ma, J., & Chang, S.-F. (2021). Query adaptive few-shot object detection with heterogeneous graph convolutional networks. In: ICCV.
Han, J., Ren, Y., Ding, J., Yan, K., & Xia, G.-S. (2023). Few-shot object detection via variational feature aggregation. In: AAAI.
Hariharan, B., & Girshick, R. (2017). Low-shot visual recognition by shrinking and hallucinating features. In: ICCV.
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In: CVPR.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In: ICCV.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: CVPR.
Hénaff, O. J., Koppula, S., Alayrac, J.-B., Oord, A., Vinyals, O., & Carreira, J. (2021). Efficient visual pretraining with contrastive detection. In: ICCV.
Hu, H., Bai, S., Li, A., Cui, J., & Wang, L. (2021). Dense relation distillation with context-aware aggregation for few-shot object detection. In: CVPR.
Hu, X., Jiang, Y., Tang, K., Chen, J., Miao, C., & Zhang, H. (2020). Learning to segment the tail. In: CVPR.
Hu, T., Yang, P., Zhang, C., Yu, G., Mu, Y., & Snoek, C. G. M. (2019). Attention-based multi-context guiding for few-shot semantic segmentation. In: AAAI.
Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., & Duerig, T. (2021). Scaling up visual and vision-language representation learning with noisy text supervision. In: ICML.
Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., & Hu, H. (2023). DETRs with hybrid matching. In: CVPR.
Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., & Darrell, T. (2019). Few-shot object detection via feature reweighting. In: ICCV.
Kang, B., Xie, S., Rohrbach, M., Yan, Z., Gordo, A., Feng, J., & Kalantidis, Y. (2019). Decoupling representation and classifier for long-tailed recognition. arXiv:1910.09217.
Karlinsky, L., Shtok, J., Harary, S., Schwartz, E., Aides, A., Feris, R., Giryes, R., & Bronstein, A. M. (2019). RepMet: Representative-based metric learning for classification and few-shot object detection. In: CVPR.
Kaul, P., Xie, W., & Zisserman, A. (2022). Label, verify, correct: A simple few-shot object detection method. In: CVPR.
Kim, D., Angelova, A., & Kuo, W. (2023). Contrastive feature masking open-vocabulary vision transformer. In: ICCV.
Kim, J., Kim, T., Kim, S., & Yoo, C. D. (2019). Edge-labeling graph neural network for few-shot learning. In: CVPR.
Kim, B., & Kim, J. (2020). Adjusting decision boundary for class imbalanced learning. IEEE Access, 8, 81674–81685.
Koch, G., Zemel, R., & Salakhutdinov, R. (2015). Siamese neural networks for one-shot image recognition. In: ICML Workshop.
Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., & Bernstein, M. S. (2017). Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1), 32–73.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In: NeurIPS.
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2018). The Open Images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982.
Lake, B. M., Salakhutdinov, R. R., & Tenenbaum, J. (2013). One-shot learning by inverting a compositional causal process. In: NeurIPS.
Lake, B., Salakhutdinov, R., Gross, J., & Tenenbaum, J. (2011). One shot learning of simple visual concepts. In: Proceedings of the annual meeting of the cognitive science society, 33.
Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338.
Li, A., & Li, Z. (2021). Transformation invariant few-shot object detection. In: CVPR.
Li, H., Eigen, D., Dodge, S., Zeiler, M., & Wang, X. (2019). Finding task-relevant features for few-shot learning by category traversal. In: CVPR.
Li, Z., Hoogs, A., & Xu, C. (2022). Discover and mitigate unknown biases with debiasing alternate networks. In: ECCV.
Li, J., Li, D., Xiong, C., & Hoi, S. (2022). BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: ICML.
Li, J., Selvaraju, R., Gotmare, A., Joty, S., Xiong, C., & Hoi, S. C. H. (2021). Align before fuse: Vision and language representation learning with momentum distillation. In: NeurIPS.
Li, Y., Wang, T., Kang, B., Tang, S., Wang, C., Li, J., & Feng, J. (2020). Overcoming classifier imbalance for long-tail object detection with balanced group softmax. In: CVPR.
Li, W., Wang, L., Xu, J., Huo, J., Yang, G., & Luo, J. (2019). Revisiting local descriptor based image-to-class measure for few-shot learning. In: CVPR.
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). SiamRPN++: Evolution of siamese visual tracking with very deep networks. In: CVPR.
Li, Y., Xie, S., Chen, X., Dollar, P., He, K., & Girshick, R. (2021). Benchmarking detection transfer learning with vision transformers. arXiv:2111.11429.
Li, F., Zhang, H., Liu, S., Guo, J., Ni, L. M., & Zhang, L. (2022). DN-DETR: Accelerate DETR training by introducing query denoising. In: CVPR.
Li, J., Zhang, Y., Qiang, W., Si, L., Jiao, C., Hu, X., Zheng, C., & Sun, F. (2023). Disentangle and remerge: Interventional knowledge distillation for few-shot object detection from a conditional causal perspective. In: AAAI.
Li, Y., Zhu, H., Cheng, Y., Wang, W., Teo, C. S., Xiang, C., Vadakkepat, P., & Lee, T. H. (2021). Few-shot object detection via classification refinement and distractor retreatment. In: CVPR.
Lifchitz, Y., Avrithis, Y., Picard, S., & Bursuc, A. (2019). Dense classification and implanting for few-shot learning. In: CVPR.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In: ICCV.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In: ECCV.
Liu, S., & Huang, D. (2018). Receptive field block net for accurate and fast object detection. In: ECCV.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In: ECCV.
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., & Zhang, L. (2022). DAB-DETR: Dynamic anchor boxes are better queries for DETR. arXiv:2201.12329.
Lu, X., Diao, W., Mao, Y., Li, J., Wang, P., Sun, X., & Fu, K. (2023). Breaking immutable: Information-coupled prototype elaboration for few-shot object detection. In: AAAI.
Lu, E., Xie, W., & Zisserman, A. (2018). Class-agnostic counting. In: ACCV.
Ma, C., Jiang, Y., Wen, X., Yuan, Z., & Qi, X. (2023). CoDet: Co-occurrence guided region-word alignment for open-vocabulary object detection. arXiv:2310.16667.
Ma, J., Niu, Y., Xu, J., Huang, S., Han, G., & Chang, S.-F. (2023). DiGeo: Discriminative geometry-aware learning for generalized few-shot object detection. In: CVPR.
Michaelis, C., Bethge, M., & Ecker, A. S. (2018). One-shot segmentation in clutter. In: ICML.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Munkhdalai, T., & Yu, H. (2017). Meta networks. In: ICML.
Munkhdalai, T., Yuan, X., Mehri, S., & Trischler, A. (2018). Rapid adaptation with conditionally shifted neurons. In: ICML.
Oreshkin, B., López, P. R., & Lacoste, A. (2018). TADAM: Task dependent adaptive metric for improved few-shot learning. In: NeurIPS.
Pei, W., Wu, S., Mei, D., Chen, F., Tian, J., & Lu, G. (2022). Few-shot object detection by knowledge distillation using bag-of-visual-words representations. In: ECCV.
Qiao, L., Zhao, Y., Li, Z., Qiu, X., Wu, J., & Zhang, C. (2021). DeFRCN: Decoupled Faster R-CNN for few-shot object detection. In: ICCV.
Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O. R., & Jagersand, M. (2020). U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., & Krueger, G. (2021). Learning transferable visual models from natural language supervision. In: ICML.
Ravi, S., & Larochelle, H. (2017). Optimization as a model for few-shot learning. In: ICLR.
Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In: CVPR.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In: CVPR.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In: NeurIPS.
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016). Meta-learning with memory-augmented neural networks. In: ICML.
Schwartz, E., Karlinsky, L., Shtok, J., Harary, S., Marder, M., Pankanti, S., Feris, R., Kumar, A., Giries, R., & Bronstein, A. M. (2019). RepMet: Representative-based metric learning for classification and one-shot object detection. In: CVPR.
Shi, C., & Yang, S. (2023). EdaDet: Open-vocabulary object detection using early dense alignment. In: ICCV.
Shu, M., Nie, W., Huang, D.-A., Yu, Z., Goldstein, T., Anandkumar, A., & Xiao, C. (2022). Test-time prompt tuning for zero-shot generalization in vision-language models. In: NeurIPS.
Singh, A., Hu, R., Goswami, V., Couairon, G., Galuba, W., Rohrbach, M., & Kiela, D. (2022). FLAVA: A foundational language and vision alignment model. In: CVPR.
Singh, K. K., Mahajan, D., Grauman, K., Lee, Y. J., Feiszli, M., & Ghadiyaram, D. (2020). Don't judge an object by its context: Learning to overcome contextual bias. In: CVPR.
Singh, B., Najibi, M., & Davis, L. S. (2018). SNIPER: Efficient multi-scale training. In: NeurIPS.
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In: NeurIPS.
Sun, B., Li, B., Cai, S., Yuan, Y., & Zhang, C. (2021). FSCE: Few-shot object detection via contrastive proposal encoding. In: CVPR.
Tan, J., Wang, C., Li, B., Li, Q., Ouyang, W., Yin, C., & Yan, J. (2020). Equalization loss for long-tailed object recognition. In: CVPR.
Tao, Y., Sun, J., Yang, H., Chen, L., Wang, X., Yang, W., Du, D., & Zheng, M. (2023). Local and global logit adjustments for long-tailed learning. In: ICCV.
Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In: NeurIPS.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., & Rodriguez, A. (2023). LLaMA: Open and efficient foundation language models. arXiv:2302.13971.
Triantafillou, E., Zemel, R., & Urtasun, R. (2017). Few-shot learning through an information retrieval lens. In: NeurIPS.
Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. In: NeurIPS.
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In: CVPR.
Wang, T. (2023). Learning to detect and segment for open vocabulary object detection. In: CVPR.
Wang, W., Bao, H., Dong, L., Bjorck, J., Peng, Z., Liu, Q., Aggarwal, K., Mohammed, O. K., Singhal, S., Som, S., & Wei, F. (2023). Image as a foreign language: BEiT pretraining for vision and vision-language tasks. In: CVPR.
Wang, Y., Fei, J., Wang, H., Li, W., Bao, T., Wu, L., Zhao, R., & Shen, Y. (2023). Balancing logit variation for long-tailed semantic segmentation. In: CVPR.
Wang, Y.-X., Girshick, R., Hebert, M., & Hariharan, B. (2018). Low-shot learning from imaginary data. In: CVPR.
Wang, X., Huang, T. E., Darrell, T., Gonzalez, J. E., & Yu, F. (2020). Frustratingly simple few-shot object detection. In: ICML.
Wang, T., Li, Y., Kang, B., Li, J., Liew, J., Tang, S., Hoi, S., & Feng, J. (2020). The devil is in classification: A simple framework for long-tail instance segmentation. In: ECCV.
Wang, Z., Yu, J., Yu, A. W., Dai, Z., Tsvetkov, Y., & Cao, Y. (2021). SimVLM: Simple visual language model pretraining with weak supervision. arXiv:2108.10904.
Wang, J., Zhang, W., Zang, Y., Cao, Y., Pang, J., Gong, T., Chen, K., Liu, Z., Loy, C. C., & Lin, D. (2021). Seesaw loss for long-tailed instance segmentation. In: CVPR.
Wong, A., & Yuille, A. L. (2015). One shot learning via compositions of meaningful patches. In: ICCV.
Wu, A., Han, Y., Zhu, L., & Yang, Y. (2021). Universal-prototype enhancing for few-shot object detection. In: ICCV.
Wu, J., Liu, S., Huang, D., & Wang, Y. (2020). Multi-scale positive sample refinement for few-shot object detection. In: ECCV.
Wu, S., Zhang, W., Jin, S., Liu, W., & Loy, C. C. (2023). Aligning bag of regions for open-vocabulary object detection. In: CVPR.
Xiao, Y., & Marlet, R. (2020). Few-shot object detection and viewpoint estimation for objects in the wild. In: ECCV.
Xu, J., Le, H., & Samaras, D. (2023). Generating features with increased crop-related diversity for few-shot object detection. In: CVPR.
Yan, X., Chen, Z., Xu, A., Wang, X., Liang, X., & Lin, L. (2019). Meta R-CNN: Towards general solver for instance-level low-shot learning. In: ICCV.
Yang, J., Li, C., Zhang, P., Xiao, B., Liu, C., Yuan, L., & Gao, J. (2022). Unified contrastive learning in image-text-label space. In: CVPR.
Yang, Y., Wei, F., Shi, M., & Li, G. (2020). Restoring negative information in few-shot object detection. In: NeurIPS.
Yang, F. S. Y., Zhang, L., Xiang, T., Torr, P. H., & Hospedales, T. M. (2018). Learning to compare: Relation network for few-shot learning. In: CVPR.
Yao, L., Han, J., Liang, X., Xu, D., Zhang, W., Li, Z., & Xu, H. (2023). DetCLIPv2: Scalable open-vocabulary object detection pre-training via word-region alignment. In: CVPR.
Yao, L., Han, J., Wen, Y., Liang, X., Xu, D., Zhang, W., Li, Z., Xu, C., & Xu, H. (2022). DetCLIP: Dictionary-enriched visual-concept paralleled pre-training for open-world detection. In: NeurIPS.
Yuan, L., Chen, D., Chen, Y.-L., Codella, N., Dai, X., Gao, J., Hu, H., Huang, X., Li, B., Li, C., & Liu, C. (2021). Florence: A new foundation model for computer vision. arXiv:2111.11432.
Zang, Y., Li, W., Zhou, K., Huang, C., & Loy, C. C. (2022). Open-vocabulary DETR with conditional matching. In: ECCV.
Zhang, W., & Wang, Y.-X. (2021). Hallucination improves few-shot object detection. In: CVPR.
Zhang, G., Cui, K., Wu, R., Lu, S., & Tian, Y. (2021). PNPDet: Efficient few-shot detection without forgetting via plug-and-play sub-networks. In: WACV.
Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018). Context encoding for semantic segmentation. In: CVPR.
Zhang, R., Hu, X., Li, B., Huang, S., Deng, H., Qiao, Y., Gao, P., & Li, H. (2023). Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners. In: CVPR.
Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L. M., & Shum, H.-Y. (2022). DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv:2203.03605.
Zhang, R., Zhang, W., Fang, R., Gao, P., Li, K., Dai, J., Qiao, Y., & Li, H. (2022). Tip-Adapter: Training-free adaption of CLIP for few-shot classification. In: ECCV.
Zhao, Y., Chen, W., Tan, X., Huang, K., & Zhu, J. (2022). Adaptive logit adjustment loss for long-tailed visual recognition. In: AAAI.
Zhao, L., Teng, Y., & Wang, L. (2024). Logit normalization for long-tail object detection. International Journal of Computer Vision.
Zhong, Y., Yang, J., Zhang, P., Li, C., Codella, N., Li, L. H., Zhou, L., Dai, X., Yuan, L., Li, Y., & Gao, J. (2022). RegionCLIP: Region-based language-image pretraining. In: CVPR.
Zhou, X., Girdhar, R., Joulin, A., Krähenbühl, P., & Misra, I. (2022). Detecting twenty-thousand classes using image-level supervision. In: ECCV.
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Conditional prompt learning for vision-language models. In: CVPR.
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models. International Journal of Computer Vision, 130, 2337–2348.
Zhu, C., Chen, F., Ahmed, U., & Savvides, M. (2021). Semantic relation reasoning for shot-stable few-shot object detection. In: CVPR.
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., & Dai, J. (2020). Deformable DETR: Deformable transformers for end-to-end object detection. arXiv:2010.04159.
Zong, Z., Song, G., & Liu, Y. (2023). DETRs with collaborative hybrid assignments training. In: ICCV.
Metadata
Title
FSODv2: A Deep Calibrated Few-Shot Object Detection Network
Authors
Qi Fan
Wei Zhuo
Chi-Keung Tang
Yu-Wing Tai
Publication date
04.04.2024
Publisher
Springer US
Published in
International Journal of Computer Vision
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-024-02049-z
