
01-12-2023

Low Rank Adaptation for Stable Domain Adaptation of Vision Transformers

Authors: N. Filatov, M. Kindulov

Published in: Optical Memory and Neural Networks | Special Issue 2/2023

Abstract

Unsupervised domain adaptation plays a crucial role in semantic segmentation tasks due to the high cost of annotating data. Existing approaches often rely on large transformer models and momentum networks to stabilize and improve the self-training process. In this study, we investigate the applicability of low-rank adaptation (LoRA) to domain adaptation in computer vision. Our focus is on the unsupervised domain adaptation task of semantic segmentation, which requires adapting models from a synthetic dataset (GTA5) to a real-world dataset (Cityscapes). We employ the Swin Transformer as the feature extractor and the TransDA domain adaptation framework. Through experiments, we demonstrate that LoRA effectively stabilizes the self-training process, achieving training dynamics similar to those of the exponential moving average (EMA) mechanism. Moreover, LoRA provides metrics comparable to EMA under the same limited computation budget. In the GTA5 → Cityscapes experiments, the adaptation pipeline with LoRA achieves an mIoU of 0.515, slightly surpassing the EMA baseline's mIoU of 0.513, while also offering an 11% speedup in training time and savings in video memory. These results highlight LoRA as a promising approach for domain adaptation in computer vision, offering a viable alternative to momentum networks that also saves computational resources.
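
The mechanism behind this result can be illustrated with a short, self-contained PyTorch sketch: the pretrained projection weights of the backbone are frozen and only a small set of low-rank residual parameters is trained, which constrains how far the adapted model can drift during self-training. This is a minimal illustration rather than the authors' implementation; the class name LoRALinear, the rank r, the scaling alpha, and the attribute names in the usage comment are assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear projection with a trainable low-rank residual:
    y = W0 x + (alpha / r) * B A x, where only A and B are trained."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # A: r x d_in
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))        # B: d_out x r, zero-initialized
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus low-rank update; because B starts at zero, the wrapped
        # layer behaves exactly like the pretrained layer at the start of adaptation.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# Hypothetical usage on a transformer block's attention projection (the attribute
# path block.attn.qkv is illustrative and depends on the Swin implementation used):
# block.attn.qkv = LoRALinear(block.attn.qkv, r=4, alpha=4.0)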


Literature
1.
Cheng, B., Schwing, A., and Kirillov, A., Per-pixel classification is not all you need for semantic segmentation, Adv. Neural Inf. Process. Syst., 2021, vol. 34, pp. 17864–17875.
2.
Lialin, V., Deshpande, V., and Rumshisky, A., Scaling down to scale up: A guide to parameter-efficient fine-tuning, arXiv preprint arXiv:2303.15647, 2023.
3.
Hu, E.J. et al., LoRA: Low-rank adaptation of large language models, arXiv preprint arXiv:2106.09685, 2021.
4.
Chen, M., Zheng, Z., Yang, Y., and Chua, T.-S., PiPa: Pixel- and patch-wise self-supervised learning for domain adaptative semantic segmentation, arXiv preprint arXiv:2211.07609, 2022.
5.
Xie, B., Li, S., Li, M., Liu, C.H., Huang, G., and Wang, G., SePiCo: Semantic-guided pixel contrast for domain adaptive semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 2023.
6.
Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., and Shah, M., Transformers in vision: A survey, ACM Comput. Surv. (CSUR), 2022, vol. 54, no. 10s, pp. 1–41.
7.
Vaswani, A. et al., Attention is all you need, Adv. Neural Inf. Process. Syst., 2017, vol. 30.
8.
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S., End-to-end object detection with transformers, in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I, Springer, 2020, pp. 213–229.
9.
Dosovitskiy, A. et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929, 2020.
10.
Liu, Z. et al., Swin transformer: Hierarchical vision transformer using shifted windows, in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
11.
Wang, H., Shen, T., Zhang, W., Duan, L.-Y., and Mei, T., Classes matter: A fine-grained adversarial approach to cross-domain semantic segmentation, in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV, Springer, 2020, pp. 642–659.
12.
Zhang, P., Zhang, B., Zhang, T., Chen, D., Wang, Y., and Wen, F., Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12414–12424.
13.
Guan, L. and Yuan, X., Iterative loop learning combining self-training and active learning for domain adaptive semantic segmentation, arXiv preprint arXiv:2301.13361, 2023.
14.
Hoyer, L., Dai, D., Wang, H., and van Gool, L., MIC: Masked image consistency for context-enhanced domain adaptation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11721–11732.
15.
Liu, H. et al., Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning, Adv. Neural Inf. Process. Syst., 2022, vol. 35, pp. 1950–1965.
16.
Lester, B., Al-Rfou, R., and Constant, N., The power of scale for parameter-efficient prompt tuning, arXiv preprint arXiv:2104.08691, 2021.
17.
Chen, R. et al., Smoothing matters: Momentum transformer for domain adaptive semantic segmentation, arXiv preprint arXiv:2203.07988, 2022.
18.
Richter, S.R., Vineet, V., Roth, S., and Koltun, V., Playing for data: Ground truth from computer games, in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II, Springer, 2016, pp. 102–118.
19.
Cordts, M. et al., The cityscapes dataset for semantic urban scene understanding, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213–3223.
20.
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J., Unified perceptual parsing for scene understanding, in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 418–434. https://openaccess.thecvf.com/content_ECCV_2018/html/Tete_Xiao_Unified_Perceptual_Parsing_ECCV_2018_paper.html
Metadata
Title
Low Rank Adaptation for Stable Domain Adaptation of Vision Transformers
Authors
N. Filatov
M. Kindulov
Publication date
01-12-2023
Publisher
Pleiades Publishing
Published in
Optical Memory and Neural Networks / Special Issue 2/2023
Print ISSN: 1060-992X
Electronic ISSN: 1934-7898
DOI
https://doi.org/10.3103/S1060992X2306005X
