Published in: Cognitive Computation, Issue 6/2019

08-04-2019

Unsupervised Object Transfiguration with Attention

Authors: Zihan Ye, Fan Lyu, Linyan Li, Yu Sun, Qiming Fu, Fuyuan Hu

Abstract

Object transfiguration is a subtask of image-to-image translation that translates between two independent image sets, and it has a wide range of applications. Recently, studies based on the Generative Adversarial Network (GAN) have achieved impressive results in image-to-image translation. However, the object transfiguration task should translate only the regions containing target objects rather than whole images; most existing methods do not account for this, which results in mistranslation of image backgrounds. To address this problem, we present a novel pipeline called Deep Attention Unit Generative Adversarial Networks (DAU-GAN). During translation, the Deep Attention Unit (DAU) computes attention masks that indicate where the target objects are located, making the GAN concentrate on translating target objects while ignoring irrelevant backgrounds. Additionally, we construct an attention-consistent loss and a background-consistent loss that compel the model to focus its translation on the target objects and to preserve backgrounds more effectively. Comparison experiments on three popular related datasets demonstrate that DAU-GAN achieves superior performance to the state of the art. We also export the attention masks produced at different stages to confirm their effect during the object transfiguration task. The proposed DAU-GAN translates objects effectively while preserving background information. In our model, the DAU learns to focus on the most important information by producing attention masks. These masks compel DAU-GAN to distinguish target objects from backgrounds during translation, achieving impressive results on two subsets of ImageNet and on CelebA. Moreover, the results show that a model can be investigated not only through the image itself but also through other modal information.
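
The abstract describes an attention-gating mechanism and two consistency losses but gives no implementation detail on this page. As a rough illustration only, the following PyTorch sketch shows one plausible way such a Deep Attention Unit, mask-based gating, and the two losses could be wired up; the class layout, layer sizes, and the L1 forms of both losses are our assumptions for illustration, not the authors' actual architecture.

    import torch
    import torch.nn as nn

    class DeepAttentionUnit(nn.Module):
        """Hypothetical DAU: predicts a soft spatial attention mask in [0, 1]
        from intermediate generator features. Layer sizes are assumptions."""

        def __init__(self, channels: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 2, 1, kernel_size=1),
                nn.Sigmoid(),  # soft mask: ~1 = target object, ~0 = background
            )

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            # features: (N, C, H, W) -> mask: (N, 1, H, W)
            return self.net(features)

    def attention_gated_output(x: torch.Tensor, translated: torch.Tensor,
                               mask: torch.Tensor) -> torch.Tensor:
        """Compose the output: translated content where the mask is high,
        the untouched input elsewhere (hypothetical gating rule)."""
        return mask * translated + (1.0 - mask) * x

    def background_consistent_loss(x: torch.Tensor, y_fake: torch.Tensor,
                                   mask: torch.Tensor) -> torch.Tensor:
        """Penalize any change outside the attended (object) region."""
        return torch.mean((1.0 - mask) * torch.abs(y_fake - x))

    def attention_consistent_loss(mask_ab: torch.Tensor,
                                  mask_ba: torch.Tensor) -> torch.Tensor:
        """Encourage the masks from the two translation directions of a
        cycle to agree on where the object is (illustrative L1 form)."""
        return torch.mean(torch.abs(mask_ab - mask_ba))

The intuition behind such gating is that the adversarial loss can then only reshape pixels where the mask is high, while the background-consistent term explicitly ties the complement of the mask back to the input image, which matches the paper's stated goal of translating objects while preserving backgrounds.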

Metadata
Title: Unsupervised Object Transfiguration with Attention
Authors: Zihan Ye, Fan Lyu, Linyan Li, Yu Sun, Qiming Fu, Fuyuan Hu
Publication date: 08-04-2019
Publisher: Springer US
Published in: Cognitive Computation, Issue 6/2019
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-019-09633-3
