
27.08.2022

Learning Transferable Feature Representation with Swin Transformer for Object Recognition

Authors: Jian-Xin Ren, Yu-Jie Xiong, Xi-Jiong Xie, Yu-Fan Dai

Published in: Neural Processing Letters | Issue 3/2023


Abstract

Recent substantial advances in deep learning have driven the flourishing of computer vision. However, the heavy dependence on large-scale training data limits deep learning applications, because such large amounts of data are hard to obtain in many practical scenarios. Moreover, when sufficient training data are lacking, deep learning offers no significant advantage over traditional machine learning methods. The approach proposed in this paper overcomes the problem of insufficient training data by taking Swin Transformer as the backbone for feature extraction and applying fine-tuning strategies on the target dataset to learn transferable feature representations. Our experimental results demonstrate that the proposed method performs well for object recognition on small-scale datasets.
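To make the described recipe concrete, the following is a minimal, illustrative sketch (not the authors' released code) of fine-tuning an ImageNet-pretrained Swin Transformer on a small target dataset using PyTorch and the timm library; the model variant, class count, and learning rates are assumptions chosen for illustration.

```python
# Illustrative transfer-learning sketch; assumes PyTorch and timm are installed
# and that the target dataset provides 224x224 image batches with integer labels.
import torch
import timm

num_classes = 100  # placeholder: number of classes in the small target dataset

# Load a Swin Transformer pretrained on ImageNet as the feature-extraction
# backbone; timm replaces the classification head to match num_classes.
model = timm.create_model(
    "swin_tiny_patch4_window7_224",  # assumed Swin-T variant
    pretrained=True,
    num_classes=num_classes,
)

# Discriminative learning rates: small for the pretrained backbone,
# larger for the freshly initialized head (values are illustrative).
head_params = [p for n, p in model.named_parameters() if n.startswith("head")]
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("head")]
optimizer = torch.optim.AdamW(
    [
        {"params": backbone_params, "lr": 1e-5},
        {"params": head_params, "lr": 1e-3},
    ],
    weight_decay=0.05,
)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch from the target dataset."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Giving the pretrained backbone a much smaller learning rate than the new head is a common fine-tuning strategy: it preserves the transferable features learned on ImageNet while still adapting the classifier to the small target dataset.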


Metadata
Title
Learning Transferable Feature Representation with Swin Transformer for Object Recognition
Authors
Jian-Xin Ren
Yu-Jie Xiong
Xi-Jiong Xie
Yu-Fan Dai
Publication date
27.08.2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 3/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-11004-3
