
Multi-scale Kronecker-product relation networks for few-shot learning


Abstract

Few-shot learning aims to train classifiers that can recognize new visual object categories from only a few training examples. Recently, metric-learning-based methods have made promising progress. Relation Network is a metric-based method that uses a simple convolutional neural network to learn deep relationships between image features in order to recognize new objects. However, during the feature-comparison phase, Relation Network is sensitive to the spatial positions of the compared objects. Moreover, it learns from single-scale features only, which can lead to poor generalization under scale variation of the compared objects. To address these problems, we extend Relation Network to be position-aware and to integrate multi-scale features for more robust metric learning and better generalization. In this paper, we propose a novel few-shot learning method, the Multi-scale Kronecker-Product Relation Network (MsKPRN). Our method combines feature maps with spatial correlation maps generated by a Kronecker-product module to capture position-wise correlations between the compared features, and then feeds the combined maps to a relation network module that measures similarities in a multi-scale manner. Extensive experiments demonstrate that the proposed method outperforms related state-of-the-art methods on popular few-shot learning datasets. In particular, MsKPRN improves the accuracy of Relation Network from 50.44% to 57.02% and from 65.63% to 72.06% in the 5-way 1-shot and 5-shot settings, respectively. Our code will be available at: https://github.com/mouniraziz/MsKPRN.
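
The abstract describes two components: a Kronecker-product module that produces position-wise spatial correlation maps between a support and a query feature map, and a relation network module that scores the combined features. The PyTorch sketch below is only an illustration of that single-scale core under our own assumptions; the names (kronecker_correlation, RelationHead) and all tensor shapes are hypothetical and do not come from the authors' released code. The multi-scale variant would repeat the same step on feature maps taken from several backbone stages and fuse the resulting relation scores.

```python
# Illustrative sketch (not the authors' code): Kronecker-product correlation
# between a support and a query feature map, concatenated with the features
# and scored by a small convolutional relation head, as outlined in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def kronecker_correlation(support, query):
    """Position-wise correlation between two feature maps of shape (B, C, H, W).

    Returns a tensor of shape (B, H*W, H, W): for every support position there is
    one map holding its cosine similarity to every query position.
    """
    b, c, h, w = support.shape
    s = F.normalize(support.flatten(2), dim=1)   # (B, C, H*W), unit-norm channels
    q = F.normalize(query.flatten(2), dim=1)     # (B, C, H*W)
    corr = torch.bmm(s.transpose(1, 2), q)       # (B, H*W, H*W) cosine similarities
    return corr.view(b, h * w, h, w)


class RelationHead(nn.Module):
    """Small convolutional relation module mapping the combined
    feature/correlation tensor to a relation score in (0, 1)."""

    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1),
            nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Sequential(nn.Linear(hidden, 8), nn.ReLU(inplace=True),
                                nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))  # (B, 1)


if __name__ == "__main__":
    # Toy single-scale example with made-up shapes: 64-channel 19x19 feature
    # maps for a batch of 5 support/query pairs.
    B, C, H, W = 5, 64, 19, 19
    support_feat = torch.randn(B, C, H, W)
    query_feat = torch.randn(B, C, H, W)

    corr = kronecker_correlation(support_feat, query_feat)         # (B, H*W, H, W)
    combined = torch.cat([support_feat, query_feat, corr], dim=1)  # stack along channels

    head = RelationHead(in_channels=2 * C + H * W)
    scores = head(combined)   # relation scores, shape (5, 1)
    print(scores.shape)
```

Normalizing the flattened features before the batched matrix product makes each correlation entry a cosine similarity, which keeps the correlation maps on a fixed scale regardless of feature magnitude; in a multi-scale setting the same head, or one head per scale, could score feature maps pooled from different backbone layers.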



Acknowledgements

We would like to thank the anonymous referees for their helpful comments and suggestions.

Funding

This study was funded by the National Natural Science Foundation of China (Grant Nos. 61379109 and M1321007) and the Science and Technology Plan of Hunan Province (Grant Nos. 2014GK2018 and 2016JC2011).

Author information

Corresponding author

Correspondence to Zuping Zhang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Rights and permissions


About this article


Cite this article

Abdelaziz, M., Zhang, Z. Multi-scale Kronecker-product relation networks for few-shot learning. Multimed Tools Appl 81, 6703–6722 (2022). https://doi.org/10.1007/s11042-021-11735-w

