

2021 | Original Paper | Book Chapter

Conditional Invertible Neural Networks for Diverse Image-to-Image Translation

Authors: Lynton Ardizzone, Jakob Kruse, Carsten Lüth, Niels Bracher, Carsten Rother, Ullrich Köthe

Published in: Pattern Recognition

Publisher: Springer International Publishing


Abstract

We introduce a new architecture called a conditional invertible neural network (cINN), and use it to address the task of diverse image-to-image translation for natural images. This is not easily possible with existing INN models due to some fundamental limitations. The cINN combines the purely generative INN model with an unconstrained feed-forward network, which efficiently preprocesses the conditioning image into maximally informative features. All parameters of a cINN are jointly optimized with a stable, maximum likelihood-based training procedure. Even though INN-based models have received far less attention in the literature than GANs, they have been shown to have some remarkable properties absent in GANs, e.g. apparent immunity to mode collapse. We find that our cINNs leverage these properties for image-to-image translation, demonstrated on day-to-night translation and image colorization. Furthermore, we take advantage of our bidirectional cINN architecture to explore and manipulate emergent properties of the latent space, such as changing the image style in an intuitive way.
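The core mechanism the abstract describes can be illustrated with a minimal sketch: a single conditional affine coupling block whose scale and shift are produced by a subnetwork fed with both one half of the input and the conditioning features, trained by maximizing likelihood under a standard-normal latent prior. This is a toy numpy illustration only; the subnet, dimensions, and clamping here are hypothetical choices, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 4, 3  # toy data and conditioning-feature dimensions (hypothetical)
# Hypothetical toy "subnet": one linear map from [x1, cond] to (scale, shift).
W = rng.normal(size=(D // 2 + C, D)) * 0.1

def subnet(x1, cond):
    h = np.concatenate([x1, cond]) @ W
    s, t = h[:D // 2], h[D // 2:]
    return np.tanh(s), t  # bounded log-scale for numerical stability

def coupling_forward(x, cond):
    """x -> z; invertibility holds because x1 passes through unchanged."""
    x1, x2 = x[:D // 2], x[D // 2:]
    s, t = subnet(x1, cond)
    z = np.concatenate([x1, x2 * np.exp(s) + t])
    return z, s.sum()  # second value is log|det Jacobian|

def coupling_inverse(z, cond):
    """z -> x, using the same subnet outputs recomputed from z1 = x1."""
    z1, z2 = z[:D // 2], z[D // 2:]
    s, t = subnet(z1, cond)
    return np.concatenate([z1, (z2 - t) * np.exp(-s)])

x, cond = rng.normal(size=D), rng.normal(size=C)
z, logdet = coupling_forward(x, cond)
x_rec = coupling_inverse(z, cond)

# Maximum-likelihood training objective (negative log-likelihood, up to a
# constant): Gaussian prior on z minus the log-determinant term.
nll = 0.5 * (z ** 2).sum() - logdet
```

Stacking many such blocks (with permutations between them) and backpropagating `nll` through both the coupling parameters and the conditioning network is, roughly, what "jointly optimized with a stable, maximum likelihood-based training procedure" refers to.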
Metadata
Copyright year: 2021
DOI: https://doi.org/10.1007/978-3-030-71278-5_27
