Information Fusion

Volume 48, August 2019, Pages 11-26

FusionGAN: A generative adversarial network for infrared and visible image fusion

https://doi.org/10.1016/j.inffus.2018.09.004

Highlights

  • We propose a new IR/VIS fusion method based on Generative Adversarial Networks.

  • It can keep both the thermal radiation and the texture details in the source images.

  • It is an end-to-end model and does not need to design fusion rules manually.

  • Our results look like sharpened IR images with highlighted targets and abundant textures.

  • We generalize it to fuse images with different resolutions like thermal pan-sharpening.

Abstract

Infrared images can distinguish targets from their backgrounds on the basis of differences in thermal radiation, which works well both day and night and under all weather conditions. By contrast, visible images can provide texture details with high spatial resolution and definition in a manner consistent with the human visual system. This paper proposes a novel method to fuse these two types of information using a generative adversarial network, termed FusionGAN. Our method establishes an adversarial game between a generator and a discriminator, where the generator aims to generate a fused image with major infrared intensities together with additional visible gradients, and the discriminator aims to force the fused image to have more details existing in visible images. This enables the final fused image to simultaneously keep the thermal radiation of the infrared image and the textures of the visible image. In addition, our FusionGAN is an end-to-end model, avoiding manually designing complicated activity level measurements and fusion rules as in traditional methods. Experiments on public datasets demonstrate the superiority of our strategy over the state of the art, where our results look like sharpened infrared images with clear highlighted targets and abundant details. Moreover, we generalize our FusionGAN to fuse images with different resolutions, e.g., a low-resolution infrared image and a high-resolution visible image. Extensive results demonstrate that our strategy can generate clear and clean fused images which do not suffer from noise caused by upsampling of infrared information.

Introduction

Image fusion is an enhancement technique that aims to combine images obtained by different kinds of sensors to generate a robust or informative image that can facilitate subsequent processing or help in decision making [1], [2]. In particular, multi-sensor data such as thermal infrared and visible images have been used to enhance performance in terms of human visual perception, object detection, and target recognition [3]. For example, infrared images capture thermal radiation, whereas visible images capture reflected light. These two types of images can provide scene information from different aspects with complementary properties, and both thermal radiation and reflected light are inherent in nearly all objects [4].

The image fusion problem has been addressed with a variety of schemes, including multi-scale transform-based [5], [6], [7], sparse representation-based [8], [9], neural network-based [10], [11], subspace-based [12], [13], and saliency-based [14], [15] methods, hybrid models [16], [17], and other methods [18], [19]. Nevertheless, the major fusion framework involves three key components: image transform, activity level measurement, and fusion rule design [20]. Existing methods typically use the same transform or representation for different source images during the fusion process. However, this may not be appropriate for infrared and visible images, as the thermal radiation in infrared images and the appearance in visible images are manifestations of two different phenomena. In addition, the activity level measurement and fusion rule in most existing methods are designed manually and have become more and more complex, leading to implementation difficulty and high computational cost [21].
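To make the notion of a hand-crafted activity level measurement and fusion rule concrete, the short Python sketch below (not taken from any of the cited methods) measures activity as the local mean gradient magnitude and fuses the sources with a per-pixel choose-max rule; the function names and the window size are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def activity(img, win=7):
    """Activity level measurement: local mean of the gradient magnitude."""
    gy, gx = np.gradient(img.astype(np.float64))
    return uniform_filter(np.hypot(gx, gy), size=win)

def choose_max_fusion(ir, vis):
    """Manually designed fusion rule: per-pixel winner-take-all on activity."""
    return np.where(activity(ir) >= activity(vis), ir, vis)
```

Even in this toy form, both the activity measure and the rule are design choices that must be hand-tuned, which is exactly the burden an end-to-end model removes.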

To overcome the above-mentioned issues, in this paper we propose an infrared and visible image fusion method from a novel perspective based on a generative adversarial network, termed FusionGAN, which formulates fusion as an adversarial game between keeping the infrared thermal radiation information and preserving the visible appearance texture information. More specifically, it can be seen as a minimax problem between a generator and a discriminator. The generator attempts to generate a fused image with major infrared intensities together with additional visible gradients, while the discriminator aims to force the fused image to have more texture details. This enables our fused image to maintain the thermal radiation of the infrared image and the texture details of the visible image at the same time. In addition, the end-to-end property of generative adversarial networks (GANs) avoids manually designing complicated activity level measurements and fusion rules.
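As a concrete illustration of this formulation, the following minimal PyTorch sketch shows one way such generator and discriminator objectives could be written. The least-squares adversarial form, the Laplacian used as a gradient operator, and the weights lam and xi are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def gradient(img):
    # Laplacian as a simple texture/gradient operator on a (N, 1, H, W) grayscale batch
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=img.device).view(1, 1, 3, 3)
    return F.conv2d(img, k, padding=1)

def generator_loss(fused, ir, vis, d_on_fused, lam=100.0, xi=5.0):
    # content term: pull the fused image toward infrared intensities and visible gradients
    content = F.mse_loss(fused, ir) + xi * F.mse_loss(gradient(fused), gradient(vis))
    # adversarial term: make the discriminator score the fused image as a "visible" image
    adv = torch.mean((d_on_fused - 1.0) ** 2)
    return adv + lam * content

def discriminator_loss(d_on_vis, d_on_fused):
    # least-squares GAN objective: visible images -> 1, fused images -> 0
    return torch.mean((d_on_vis - 1.0) ** 2) + torch.mean(d_on_fused ** 2)
```

Under such a game, the generator alone would produce an infrared-like image with only mild visible gradients, and it is the discriminator's pressure that forces progressively richer visible textures into the output.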

To show the major superiority of our method, we give a representative example in Fig. 1. The left two images are the infrared and visible images to be fused, where the visible image contains a detailed background and the infrared image highlights the target, i.e., the water. The third image is the fusion result of a recent method [22]. Clearly, this traditional method is only able to keep more texture details from the source images, and the high contrast between target and background in the infrared image is not preserved in the fused image. In fact, the key information in the infrared image (i.e., the thermal radiation distribution) is totally lost in the fused image. The rightmost image in Fig. 1 is the fusion result of our FusionGAN. In contrast, our result preserves the thermal radiation distribution of the infrared image, and hence the target can be easily detected. Meanwhile, the details of the background (i.e., the trees, road and water plants) in the visible image are also well retained.

The main contributions of this work are four-fold. First, we propose a generative adversarial architecture and design a loss function specialized for infrared and visible image fusion. The feasibility and superiority of GANs for image fusion are also discussed. To the best of our knowledge, this is the first time that GANs have been adopted to address the image fusion task. Second, the proposed FusionGAN is an end-to-end model, where the fused image is generated automatically from the input source images without manually designing the activity level measurement or fusion rule. Third, we conduct experiments on public infrared and visible image fusion datasets with qualitative and quantitative comparisons to state-of-the-art methods. Compared to previous methods, the proposed FusionGAN obtains results that look like sharpened infrared images with clearly highlighted targets and abundant textures. Last but not least, we generalize the proposed FusionGAN to fuse source images with different resolutions, such as low-resolution infrared images and high-resolution visible images. It can generate high-resolution fused images that do not suffer from noise caused by upsampling of infrared information.

The rest of this paper is arranged as follows. Section 2 describes background material and related work on GANs. In Section 3, we present our FusionGAN algorithm for infrared and visible image fusion. Section 4 illustrates the fusion performance of our method on various types of infrared and visible image/video pairs with comparisons to other approaches. We discuss the explainability of our FusionGAN in Section 5, followed by some concluding remarks in Section 6.

Section snippets

Related work

In this section, we briefly introduce the background material and relevant works, including traditional infrared and visible image fusion methods, deep learning based fusion techniques, as well as GANs and their variants.

Method

In this section, we describe the proposed FusionGAN for infrared and visible image fusion. We start by laying out the problem formulation with GANs, and then discuss the network architectures of the generator and the discriminator. Finally, we provide some details for the network training.
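Since this snippet omits the architectural details, the PyTorch sketch below gives a plausible instantiation of the two networks it refers to: a fully convolutional generator fed with the channel-wise concatenation of the infrared and visible images, and a small convolutional discriminator that distinguishes visible images from fused ones. Layer counts, kernel sizes, and channel widths are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Fully convolutional generator: concatenated (IR, VIS) in, fused image out."""
    def __init__(self):
        super().__init__()
        def block(cin, cout, k):
            return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                                 nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))
        self.body = nn.Sequential(block(2, 64, 5), block(64, 128, 5),
                                  block(128, 64, 3), block(64, 32, 3))
        self.out = nn.Conv2d(32, 1, 1)  # 1x1 conv maps features to one fused channel

    def forward(self, ir, vis):
        x = torch.cat([ir, vis], dim=1)  # stack the two sources as input channels
        return torch.tanh(self.out(self.body(x)))

class Discriminator(nn.Module):
    """Small CNN producing a scalar score: visible (real) vs. fused (generated)."""
    def __init__(self):
        super().__init__()
        layers, cin = [], 1
        for cout in (32, 64, 128, 256):
            layers += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.LeakyReLU(0.2)]
            cin = cout
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(256, 1)

    def forward(self, x):
        return self.head(self.pool(self.features(x)).flatten(1))
```

Because the generator never downsamples, the fused output keeps the spatial resolution of the inputs, and the whole pipeline can be trained end-to-end with losses of the kind sketched in the introduction.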

Experiments

In this section, we first briefly introduce the fusion metrics used in this paper, then demonstrate the efficacy of the proposed FusionGAN on public datasets and compare it with eight state-of-the-art fusion methods, including adaptive sparse representation (ASR) [37], curvelet transform (CVT) [38], dual-tree complex wavelet transform (DTCWT) [39], fourth-order partial differential equation (FPDE) [12], guided filtering based fusion (GFF) [22], ratio of low-pass pyramid (LPP) [3], two-scale image fusion of visible and infrared images using saliency detection (TSIFVS), and gradient transfer fusion (GTF).
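As background for the quantitative comparison, the Python sketch below implements two widely used fusion metrics, information entropy (EN) and spatial frequency (SF). Whether these exactly match the metric set of the paper cannot be recovered from this snippet, so they should be read as representative examples of how fused images are scored.

```python
import numpy as np

def entropy(img, bins=256):
    """Information entropy (EN) of a grayscale image with values in [0, 255]."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def spatial_frequency(img):
    """Spatial frequency (SF): overall activity from row/column differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return float(np.hypot(rf, cf))
```

Higher EN indicates that the fused image carries more information, and higher SF indicates richer gradients and textures; both are reference-free, i.e., computed on the fused image alone.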

Discussion

Deep learning based techniques usually share a common problem: they are regarded as black-box models. Even if we understand the underlying mathematical principles of such models, they lack an explicit declarative knowledge representation and hence have difficulty generating the underlying explanatory structures [48]. In this section, we briefly discuss the explainability of our FusionGAN.

The essence of a traditional GAN is to train a generator to capture the data distribution, so that it can produce new samples that the discriminator cannot distinguish from real data.
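For reference, the minimax value function of a traditional GAN that this discussion builds on is the standard formulation of Goodfellow et al.:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

In FusionGAN, the "real" samples are visible images and the generator is conditioned on the two source images rather than on noise, so the adversarial pressure specifically pushes visible-style texture details into the fused output.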

Conclusion

In this paper, we propose a novel infrared and visible image fusion method based on a generative adversarial network. It can simultaneously keep the thermal radiation information in infrared images and the texture detail information in visible images. The proposed FusionGAN is an end-to-end model, which avoids manually designing complicated activity level measurements and fusion rules as in traditional fusion strategies. Experiments on public datasets demonstrate that our fusion results look like sharpened infrared images with clearly highlighted targets and abundant textures.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant nos. 61773295 and 61503288, and the Beijing Advanced Innovation Center for Intelligent Robots and Systems under Grant no. 2016IRS15.

References (49)

  • Y. Liu et al., A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion (2015)

  • J. Ma et al., Infrared and visible image fusion based on visual saliency map and weighted least square optimization, Infrared Phys. Technol. (2017)

  • J. Ma et al., Infrared and visible image fusion via gradient transfer and total variation minimization, Inf. Fusion (2016)

  • J. Zhao et al., Fusion of visible and infrared images using global entropy and gradient constrained regularization, Infrared Phys. Technol. (2017)

  • S. Li et al., Pixel-level image fusion: a survey of the state of the art, Inf. Fusion (2017)

  • Y. Liu et al., Deep learning for pixel-level image fusion: recent advances and future prospects, Inf. Fusion (2018)

  • G. Piella, A general framework for multiresolution image fusion: from pixels to regions, Inf. Fusion (2003)

  • Q. Zhang et al., Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review, Inf. Fusion (2018)

  • Y. Liu et al., Multi-focus image fusion with a deep convolutional neural network, Inf. Fusion (2017)

  • B. Yang et al., Visual attention guided image fusion with sparse representation, Optik-Int. J. Light Electron Opt. (2014)

  • F. Nencini et al., Remote sensing image fusion using the curvelet transform, Inf. Fusion (2007)

  • J.J. Lewis et al., Pixel- and region-based image fusion with complex wavelets, Inf. Fusion (2007)

  • D.P. Bavirisetti et al., Two-scale image fusion of visible and infrared images using saliency detection, Infrared Phys. Technol. (2016)

  • J. Ma et al., Infrared and visible image fusion methods and applications: a survey, Inf. Fusion (2019)