Published in: Optical Memory and Neural Networks, Special Issue 2/2023

01.12.2023

Low Rank Adaptation for Stable Domain Adaptation of Vision Transformers

Authors: N. Filatov, M. Kindulov



Abstract

Unsupervised domain adaptation plays a crucial role in semantic segmentation because annotating data is expensive. Existing approaches often rely on large transformer models and momentum networks to stabilize and improve the self-training process. In this study, we investigate the applicability of low-rank adaptation (LoRA) to domain adaptation in computer vision. Our focus is the unsupervised domain adaptation task of semantic segmentation, which requires adapting models from a synthetic dataset (GTA5) to a real-world dataset (Cityscapes). We employ the Swin Transformer as the feature extractor and the TransDA domain adaptation framework. Through experiments, we demonstrate that LoRA effectively stabilizes the self-training process, achieving training dynamics similar to those of the exponential moving average (EMA) mechanism. Moreover, LoRA provides metrics comparable to EMA under the same limited computation budget. In GTA5 → Cityscapes experiments, the adaptation pipeline with LoRA achieves a mIoU of 0.515, slightly surpassing the EMA baseline’s mIoU of 0.513, while also offering an 11% speedup in training time and savings in video memory. These results highlight LoRA as a promising approach for domain adaptation in computer vision, offering a viable alternative to momentum networks that also saves computational resources.
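The two stabilization mechanisms the abstract contrasts can be sketched in a few lines. Below is a minimal, hypothetical PyTorch sketch (not the authors' implementation): a LoRA wrapper that freezes a pretrained linear layer and trains only a low-rank update, and an EMA teacher update of the kind used in momentum networks. Names such as `LoRALinear` and `ema_update` and all hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x). Only A and B receive gradients."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep pretrained weights fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T


def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.999) -> None:
    """Momentum-network alternative: teacher weights track the student
    as an exponential moving average; no gradients flow into the teacher."""
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```

Because `B` starts at zero, the wrapped layer initially reproduces the pretrained output exactly, which gives one intuition for why LoRA can stabilize self-training in a way reminiscent of a slowly moving EMA teacher, while training far fewer parameters and keeping only one copy of the backbone in memory.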


Metadata
Title
Low Rank Adaptation for Stable Domain Adaptation of Vision Transformers
Authors
N. Filatov
M. Kindulov
Publication date
01.12.2023
Publisher
Pleiades Publishing
Published in
Optical Memory and Neural Networks / Special Issue 2/2023
Print ISSN: 1060-992X
Electronic ISSN: 1934-7898
DOI
https://doi.org/10.3103/S1060992X2306005X
