2021 | OriginalPaper | Book chapter

Dilated Residual Aggregation Network for Text-Guided Image Manipulation

Written by: Siwei Lu, Di Luo, Zhenguo Yang, Tianyong Hao, Qing Li, Wenyin Liu

Published in: Artificial Neural Networks and Machine Learning – ICANN 2021

Publisher: Springer International Publishing

Abstract

Text-guided image manipulation aims to modify the visual attributes of images according to textual descriptions. Existing methods either generate images that do not match the textual descriptions or pollute text-irrelevant image regions. In this paper, we propose a dilated residual aggregation network (denoted as DRA) for text-guided image manipulation, which exploits a long-distance residual with dilated convolutions (RD) to aggregate the encoded visual content and style features and the textual features of the guiding descriptions. In particular, the dilated convolutions increase the receptive field without sacrificing the spatial resolution of intermediate features, which helps reconstruct texture details that match the textual descriptions. Furthermore, we propose an attention-guided injection module (AIM) that injects textual semantics into the feature maps of DRA without polluting text-irrelevant image regions by combining a triplet attention mechanism with central biasing instance normalization. Quantitative and qualitative experiments on the CUB-200-2011 and Oxford-102 datasets demonstrate the superior performance of the proposed DRA.
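
The claim that dilated convolutions enlarge the receptive field without reducing spatial resolution can be illustrated with a minimal 1-D sketch. This is not the DRA implementation: the helper names `dilated_conv1d` and `receptive_field`, the 3-tap kernel, and the dilation rates 1, 2, 4 are assumptions chosen only for illustration, and the abstract does not state which rates the paper uses.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Same'-padded 1-D dilated convolution (cross-correlation), stride 1."""
    k = len(kernel)
    span = dilation * (k - 1)      # distance between the first and last tap
    pad_left = span // 2
    pad_right = span - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    # Each output sample reads k input samples spaced `dilation` apart.
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf


signal = [float(i) for i in range(16)]
kernel = [1.0, 1.0, 1.0]

out = signal
for d in (1, 2, 4):                # illustrative dilation rates (assumption)
    out = dilated_conv1d(out, kernel, dilation=d)

print(len(out))                        # 16: same length as the input
print(receptive_field(3, [1, 1, 1]))   # 7: three plain 3-tap layers
print(receptive_field(3, [1, 2, 4]))   # 15: three dilated 3-tap layers
```

With the same number of layers and weights, the dilated stack covers 15 input positions instead of 7, and the output keeps the input length because no striding or pooling is involved.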

Metadata
Title
Dilated Residual Aggregation Network for Text-Guided Image Manipulation
Written by
Siwei Lu
Di Luo
Zhenguo Yang
Tianyong Hao
Qing Li
Wenyin Liu
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-86365-4_3