Information Fusion

Volume 48, August 2019, Pages 11-26

FusionGAN: A generative adversarial network for infrared and visible image fusion

https://doi.org/10.1016/j.inffus.2018.09.004

Highlights

  • We propose a new IR/VIS fusion method based on Generative Adversarial Networks.

  • It can keep both the thermal radiation and the texture details in the source images.

  • It is an end-to-end model and does not need to design fusion rules manually.

  • Our results look like sharpened IR images with highlighted targets and abundant textures.

  • We generalize it to fuse images with different resolutions like thermal pan-sharpening.

Abstract

Infrared images can distinguish targets from their backgrounds on the basis of differences in thermal radiation, which works well both day and night and under all weather conditions. By contrast, visible images can provide texture details with high spatial resolution and definition in a manner consistent with the human visual system. This paper proposes a novel method to fuse these two types of information using a generative adversarial network, termed FusionGAN. Our method establishes an adversarial game between a generator and a discriminator, where the generator aims to generate a fused image with major infrared intensities together with additional visible gradients, and the discriminator aims to force the fused image to have more details existing in visible images. This enables the final fused image to simultaneously keep the thermal radiation of the infrared image and the textures of the visible image. In addition, our FusionGAN is an end-to-end model, avoiding manually designing complicated activity level measurements and fusion rules as in traditional methods. Experiments on public datasets demonstrate the superiority of our strategy over the state of the art, where our results look like sharpened infrared images with clear highlighted targets and abundant details. Moreover, we generalize our FusionGAN to fuse images with different resolutions, e.g., a low-resolution infrared image and a high-resolution visible image. Extensive results demonstrate that our strategy can generate clear and clean fused images which do not suffer from noise caused by upsampling of infrared information.

Introduction

Image fusion is an enhancement technique that aims to combine images obtained by different kinds of sensors to generate a robust or informative image that can facilitate subsequent processing or help in decision making [1], [2]. In particular, multi-sensor data such as thermal infrared and visible images have been used to enhance performance in terms of human visual perception, object detection, and target recognition [3]. For example, infrared images capture thermal radiation, whereas visible images capture reflected light. These two types of images can provide scene information from different aspects with complementary properties, and both thermal radiation and reflected light are inherent in nearly all objects [4].

The image fusion problem has been addressed with a variety of schemes, including multi-scale transform-based [5], [6], [7], sparse representation-based [8], [9], neural network-based [10], [11], subspace-based [12], [13], and saliency-based [14], [15] methods, hybrid models [16], [17], and other methods [18], [19]. Nevertheless, the major fusion framework involves three key components: image transform, activity level measurement, and fusion rule design [20]. Existing methods typically use the same transform or representation for different source images during the fusion process. However, this may not be appropriate for infrared and visible images, as the thermal radiation in infrared images and the appearance in visible images are manifestations of two different phenomena. In addition, the activity level measurement and fusion rule in most existing methods are designed manually and have become more and more complex, leading to implementation difficulty and high computational cost [21].
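To make the notion of a hand-crafted activity level measurement and fusion rule concrete, the short Python sketch below (not taken from any of the cited methods) measures activity as the local mean gradient magnitude and fuses the sources with a per-pixel choose-max rule; the function names and the window size are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def activity(img, win=7):
    """Activity level measurement: local mean of the gradient magnitude."""
    gy, gx = np.gradient(img.astype(np.float64))
    return uniform_filter(np.hypot(gx, gy), size=win)

def choose_max_fusion(ir, vis):
    """Manually designed fusion rule: per-pixel winner-take-all on activity."""
    return np.where(activity(ir) >= activity(vis), ir, vis)
```

Even in this toy form, both the activity measure and the rule are design choices that must be hand-tuned, which is exactly the burden an end-to-end model removes.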

To overcome the above-mentioned issues, in this paper we propose an infrared and visible image fusion method from a novel perspective based on a generative adversarial network, termed FusionGAN, which formulates fusion as an adversarial game between keeping the infrared thermal radiation information and preserving the visible appearance texture information. More specifically, it can be seen as a minimax problem between a generator and a discriminator. The generator attempts to generate a fused image with major infrared intensities together with additional visible gradients, while the discriminator aims to force the fused image to have more texture details. This enables our fused image to maintain the thermal radiation of the infrared image and the texture details of the visible image at the same time. In addition, the end-to-end property of generative adversarial networks (GANs) avoids manually designing complicated activity level measurements and fusion rules.
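As a concrete illustration of this formulation, the following minimal PyTorch sketch shows one way such generator and discriminator objectives could be written. The least-squares adversarial form, the Laplacian used as a gradient operator, and the weights lam and xi are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def gradient(img):
    # Laplacian as a simple texture/gradient operator on a (N, 1, H, W) grayscale batch
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=img.device).view(1, 1, 3, 3)
    return F.conv2d(img, k, padding=1)

def generator_loss(fused, ir, vis, d_on_fused, lam=100.0, xi=5.0):
    # content term: pull the fused image toward infrared intensities and visible gradients
    content = F.mse_loss(fused, ir) + xi * F.mse_loss(gradient(fused), gradient(vis))
    # adversarial term: make the discriminator score the fused image as a "visible" image
    adv = torch.mean((d_on_fused - 1.0) ** 2)
    return adv + lam * content

def discriminator_loss(d_on_vis, d_on_fused):
    # least-squares GAN objective: visible images -> 1, fused images -> 0
    return torch.mean((d_on_vis - 1.0) ** 2) + torch.mean(d_on_fused ** 2)
```

Under such a game, the generator alone would produce an infrared-like image with only mild visible gradients, and it is the discriminator's pressure that forces progressively richer visible textures into the output.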

To show the major superiority of our method, we give a representative example in Fig. 1. The left two images are the infrared and visible images to be fused, where the visible image contains a detailed background and the infrared image highlights the target, i.e., the water. The third image is the fusion result of a recent method [22]. Clearly, this traditional method is only able to keep more texture details from the source images, and the high contrast between target and background in the infrared image is not preserved in the fused image. In fact, the key information in the infrared image (i.e., the thermal radiation distribution) is totally lost in the fused image. The rightmost image in Fig. 1 is the fusion result of our FusionGAN. In contrast, our result preserves the thermal radiation distribution of the infrared image, and hence the target can be easily detected. Meanwhile, the details of the background (i.e., the trees, road and water plants) in the visible image are also well retained.

The main contributions of this work are four-fold. First, we propose a generative adversarial architecture and design a loss function specialized for infrared and visible image fusion. The feasibility and superiority of GANs for image fusion are also discussed. To the best of our knowledge, this is the first time that GANs have been adopted to address the image fusion task. Second, the proposed FusionGAN is an end-to-end model, where the fused image is generated automatically from the input source images without manually designing the activity level measurement or fusion rule. Third, we conduct experiments on public infrared and visible image fusion datasets with qualitative and quantitative comparisons to state-of-the-art methods. Compared to previous methods, the proposed FusionGAN obtains results that look like sharpened infrared images with clearly highlighted targets and abundant textures. Last but not least, we generalize the proposed FusionGAN to fuse source images with different resolutions, such as low-resolution infrared images and high-resolution visible images. It can generate high-resolution fused images that do not suffer from noise caused by upsampling of infrared information.

The rest of this paper is arranged as follows. Section 2 describes background material and related work on GANs. In Section 3, we present our FusionGAN algorithm for infrared and visible image fusion. Section 4 illustrates the fusion performance of our method on various types of infrared and visible image/video pairs with comparisons to other approaches. We discuss the explainability of our FusionGAN in Section 5, followed by some concluding remarks in Section 6.

Section snippets

Related work

In this section, we briefly introduce the background material and relevant works, including traditional infrared and visible image fusion methods, deep learning based fusion techniques, as well as GANs and their variants.

Method

In this section, we describe the proposed FusionGAN for infrared and visible image fusion. We start by laying out the problem formulation with GANs, and then discuss the network architectures of the generator and the discriminator. Finally, we provide some details for the network training.
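Since this snippet omits the architectural details, the PyTorch sketch below gives a plausible instantiation of the two networks it refers to: a fully convolutional generator fed with the channel-wise concatenation of the infrared and visible images, and a small convolutional discriminator that distinguishes visible images from fused ones. Layer counts, kernel sizes, and channel widths are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Fully convolutional generator: concatenated (IR, VIS) in, fused image out."""
    def __init__(self):
        super().__init__()
        def block(cin, cout, k):
            return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                                 nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))
        self.body = nn.Sequential(block(2, 64, 5), block(64, 128, 5),
                                  block(128, 64, 3), block(64, 32, 3))
        self.out = nn.Conv2d(32, 1, 1)  # 1x1 conv maps features to one fused channel

    def forward(self, ir, vis):
        x = torch.cat([ir, vis], dim=1)  # stack the two sources as input channels
        return torch.tanh(self.out(self.body(x)))

class Discriminator(nn.Module):
    """Small CNN producing a scalar score: visible (real) vs. fused (generated)."""
    def __init__(self):
        super().__init__()
        layers, cin = [], 1
        for cout in (32, 64, 128, 256):
            layers += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.LeakyReLU(0.2)]
            cin = cout
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(256, 1)

    def forward(self, x):
        return self.head(self.pool(self.features(x)).flatten(1))
```

Because the generator never downsamples, the fused output keeps the spatial resolution of the inputs, and the whole pipeline can be trained end-to-end with losses of the kind sketched in the introduction.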

Experiments

In this section, we first briefly introduce the fusion metrics used in this paper, then demonstrate the efficacy of the proposed FusionGAN on public datasets and compare it with eight state-of-the-art fusion methods, including adaptive sparse representation (ASR) [37], curvelet transform (CVT) [38], dual-tree complex wavelet transform (DTCWT) [39], fourth-order partial differential equation (FPDE) [12], guided filtering based fusion (GFF) [22], ratio of low-pass pyramid (LPP) [3], two-scale image fusion of visible and infrared images using saliency detection (TSIFVS), and gradient transfer fusion (GTF).
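As background for the quantitative comparison, the Python sketch below implements two widely used fusion metrics, information entropy (EN) and spatial frequency (SF). Whether these exactly match the metric set of the paper cannot be recovered from this snippet, so they should be read as representative examples of how fused images are scored.

```python
import numpy as np

def entropy(img, bins=256):
    """Information entropy (EN) of a grayscale image with values in [0, 255]."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def spatial_frequency(img):
    """Spatial frequency (SF): overall activity from row/column differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return float(np.hypot(rf, cf))
```

Higher EN indicates that the fused image carries more information, and higher SF indicates richer gradients and textures; both are reference-free, i.e., computed on the fused image alone.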

Discussion

Deep learning based techniques usually share a common problem: they are regarded as black-box models. Even if we understand the underlying mathematical principles of such models, they lack an explicit declarative knowledge representation and hence have difficulty generating the underlying explanatory structures [48]. In this section, we briefly discuss the explainability of our FusionGAN.

The essence of a traditional GAN is to train a generator to capture the data distribution, so that it can produce new samples that the discriminator cannot distinguish from real data.
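For reference, the minimax value function of a traditional GAN that this discussion builds on is the standard formulation of Goodfellow et al.:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

In FusionGAN, the "real" samples are visible images and the generator is conditioned on the two source images rather than on noise, so the adversarial pressure specifically pushes visible-style texture details into the fused output.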

Conclusion

In this paper, we propose a novel infrared and visible image fusion method based on a generative adversarial network. It can simultaneously keep the thermal radiation information in infrared images and the texture detail information in visible images. The proposed FusionGAN is an end-to-end model, which avoids manually designing complicated activity level measurements and fusion rules as in traditional fusion strategies. Experiments on public datasets demonstrate that our fusion results look like sharpened infrared images with clearly highlighted targets and abundant textures.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant nos. 61773295 and 61503288, and the Beijing Advanced Innovation Center for Intelligent Robots and Systems under Grant no. 2016IRS15.

References (49)

  • Y. Liu et al., A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion (2015)

  • J. Ma et al., Infrared and visible image fusion based on visual saliency map and weighted least square optimization, Infrared Phys. Technol. (2017)

  • J. Ma et al., Infrared and visible image fusion via gradient transfer and total variation minimization, Inf. Fusion (2016)

  • J. Zhao et al., Fusion of visible and infrared images using global entropy and gradient constrained regularization, Infrared Phys. Technol. (2017)

  • S. Li et al., Pixel-level image fusion: a survey of the state of the art, Inf. Fusion (2017)

  • Y. Liu et al., Deep learning for pixel-level image fusion: recent advances and future prospects, Inf. Fusion (2018)

  • G. Piella, A general framework for multiresolution image fusion: from pixels to regions, Inf. Fusion (2003)

  • Q. Zhang et al., Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review, Inf. Fusion (2018)

  • Y. Liu et al., Multi-focus image fusion with a deep convolutional neural network, Inf. Fusion (2017)

  • B. Yang et al., Visual attention guided image fusion with sparse representation, Optik-Int. J. Light Electron Opt. (2014)

  • F. Nencini et al., Remote sensing image fusion using the curvelet transform, Inf. Fusion (2007)

  • J.J. Lewis et al., Pixel- and region-based image fusion with complex wavelets, Inf. Fusion (2007)

  • D.P. Bavirisetti et al., Two-scale image fusion of visible and infrared images using saliency detection, Infrared Phys. Technol. (2016)

  • J. Ma et al., Infrared and visible image fusion methods and applications: a survey, Inf. Fusion (2019)