
Open Access 30.06.2022 | Original Article

Multimodal medical image fusion with convolution sparse representation and mutual information correlation in NSST domain

Authors: Peng Guo, Guoqi Xie, Renfa Li, Hui Hu

Published in: Complex & Intelligent Systems | Issue 1/2023


Abstract

Multimodal medical image fusion is an effective way to address a range of clinical problems, such as clinical diagnosis and postoperative treatment. In this study, a medical image fusion method based on convolutional sparse representation (CSR) and mutual information correlation is proposed. In this method, the source image is decomposed into one high-frequency and one low-frequency sub-band by the non-subsampled shearlet transform. For the high-frequency sub-band, CSR is used for coefficient fusion. For the low-frequency sub-band, different fusion strategies are applied to different regions according to a mutual information correlation analysis. Analysis of two kinds of medical image fusion problems, namely, CT–MRI and MRI–SPECT, shows that the performance of this method is robust in terms of five common objective metrics. Compared with six other advanced medical image fusion methods, the proposed method achieves better results in both subjective vision and objective evaluation metrics.

Introduction

In recent years, medical imaging has become an indispensable means of clinical diagnosis, surgery, and radiotherapy. However, single-modality medical images focus on only certain types of morphological features. For example, computed tomography (CT) images reflect structural information about bone but are insensitive to soft tissues of similar density. Although magnetic resonance imaging (MRI) has a strong ability to display soft tissues, it is poor at showing bone lesions and calcified lesions. Image fusion can therefore combine complementary information from different modalities of medical images and help clinicians perform postoperative examination and monitor tumor and bone growth [1, 2].
Among the many medical image fusion methods, those based on multiscale transforms have attracted particular attention from researchers, because they adopt a multiresolution processing mechanism similar to that of the human visual system. These methods include pyramid-based decomposition [3], wavelet-based decomposition [4], and multiscale geometric analysis decomposition [5]. They share two common problems: first, the decomposition level is difficult to determine; second, the fusion strategy is difficult to choose.
The decomposition level is a key problem to be solved. When the decomposition level is low, not enough spatial detail can be extracted from the source image, whereas when the decomposition level is high, the fusion of the high-frequency sub-bands is more sensitive to noise and misregistration [6–8]. The general solution is simply to decompose an image into one high-frequency sub-band and one low-frequency sub-band. The high-frequency sub-band contains more detail and edge information, whereas the low-frequency sub-band contains the contour and structure information of the image. In recent years, multiscale decomposition methods based on the non-subsampled contourlet transform (NSCT) and the non-subsampled shearlet transform (NSST) have become popular because of their multiscale, multidirectional, and shift-invariant properties. In particular, NSST has attracted more attention because of its superior computational efficiency compared with NSCT. Compared with pyramid-based methods, such as Gaussian pyramid decomposition, Laplacian pyramid decomposition [9], and gradient pyramid transformation [10], methods based on NSST decompose an image along multiple directions and thus capture more image details. Compared with wavelet methods, such as the discrete wavelet and dual-tree complex wavelet transforms, methods based on NSST represent the curves and edge details of an image well. Compared with multiscale geometric transformations, such as the contourlet transform (COT) [11] and shearlet transform (ST) [12], methods based on NSST do not produce the pseudo-Gibbs phenomenon caused by frequency aliasing. However, most existing NSST decomposition methods use high decomposition levels, which not only increases the amount of calculation but also makes the high-frequency sub-bands susceptible to noise. To preserve the structural information of the image as much as possible and to extract additional salient details, a new multiscale decomposition method is proposed in this study. Unconstrained by the scale parameters of general multiscale decomposition, this method uses NSST to decompose the image into only two sub-bands, namely, one high-frequency sub-band and one low-frequency sub-band. In addition to using convolutional sparse representation to enhance the detail information of the high-frequency sub-band, correlation analysis is used to extract the rich detail information that remains in the low-frequency sub-band. Sparse representation seeks to represent image features with as few sparse vectors as possible and is widely used in image reconstruction and denoising. The improvement of convolutional sparse representation is that the sparse coefficients of local image blocks are replaced by global sparse coefficients.
The fusion strategy is important for the quality of the fused image. In multiscale decomposition, a common strategy is to measure the activity of the decomposition coefficients first and then fuse them according to the mean or maximum value. For example, in [13, 14], both the high- and low-frequency sub-bands adopt the maximum scheme for fusion. However, the low-frequency sub-band provides structure information similar to the source image, whereas the high-frequency sub-bands contain important details; thus, a single fusion scheme cannot account for the similarity and the importance of the image simultaneously. In [15], a weighted average fusion strategy is adopted for similar regions of images, with the weights calculated by a Siamese network. However, how this method defines similar regions directly affects the quality of the final fusion. Recently, principal component analysis (PCA) [16], sparse representation [17, 18], the smallest univalue segment assimilating nucleus (SUSAN) [19], and the pulse coupled neural network (PCNN) [20, 21] have been used to enhance the salient information of fused images and to measure the activity of decomposition coefficients. However, these methods have their own problems, either in the selection of sparse dictionaries or in the training time. To obtain a better fusion effect, different fusion strategies are selected for different sub-bands in this study: maximum fusion is used for the high-frequency sub-band and for the detail regions of the low-frequency sub-band, and weighted average fusion is used for the similar structural information in the low-frequency sub-band.
This study focuses on the choice of decomposition scale and the fusion strategies for the different frequency bands in NSST decomposition. To avoid the influence of noise and misregistration on the fusion of high-frequency sub-bands when the NSST decomposition scale is too high, this study carries out only one-level NSST decomposition, that is, into one high-frequency sub-band and one low-frequency sub-band. How to use mutual information correlation analysis to mine the detail information in the low-frequency sub-band is one objective of this study. As explained above, it is inappropriate for all sub-bands to adopt the same fusion strategy; another objective is therefore to determine which fusion strategies should be adopted for the high-frequency sub-band and for the similar and dissimilar regions of the low-frequency sub-band.
The main innovations of this study include the following three aspects:
1. The convolutional sparse representation (CSR) model is used to process the high-frequency sub-band, which enhances detail features and reduces both the block effect caused by NSST decomposition and the redundant information from different source images.
2. Mutual information correlation is used to extract detail information from the low-frequency sub-band. Given that only a two-scale decomposition is conducted, the low-frequency sub-band still contains abundant details. Mutual information correlation analysis can locate the regions of the low-frequency sub-band that contain this detail information.
3. Two different fusion strategies are used for the low-frequency sub-band. Similar structural information is fused using a weighted average scheme, where the weight is the product of the correlation analysis coefficient and the regional energy sum. The Laplacian energy gradient is used to measure the activity of the dissimilar regions, reflecting their contrast changes.
The remaining sections of this paper are organized as follows: the next section describes related work about NSST and CSR. The following section explains the methods in detail. In the next section, a comparative experiment is simulated, and the corresponding results are analyzed. The last section summarizes the study.

NSST

The non-subsampled contourlet transform (NSCT) and the non-subsampled shearlet transform (NSST) are two popular multiscale geometric decomposition methods because they are multiscale, multidirectional, and shift invariant. Given that NSST does not limit the number of directions and does not need to invert the directional filter bank, its computational efficiency is higher than that of NSCT. NSST consists of two processes: non-subsampled pyramid filters (NSPFs) for scale decomposition and shift-invariant shearlet filter banks (SFBs) for directional decomposition.
Figure 1 shows the framework of a two-level NSST decomposition. The input image is decomposed into a high-frequency sub-band and a low-frequency sub-band by the first-level scale decomposition with an NSPF, and the low-frequency sub-band is then decomposed into the second-level high-frequency and low-frequency sub-bands. Therefore, an input image decomposed by L-level NSST is transformed into L high-frequency sub-bands and one low-frequency sub-band. At each scale, sub-bands in multiple directions are obtained by the SFBs. Moreover, traditional subsampled decomposition may introduce frequency overlap, in which case the pseudo-Gibbs phenomenon easily occurs; thus, NSST adopts non-subsampled decomposition [22].
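To make the scale-decomposition stage concrete, the following minimal sketch performs a one-level non-subsampled split in Python. It is only a stand-in for the NSPF stage: the Gaussian low-pass filter and the function names are assumptions for illustration, and the directional shearlet filtering of the real NSST is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nsp_decompose(img, sigma=2.0):
    """Split an image into one low- and one high-frequency sub-band without
    subsampling, so both sub-bands keep the input resolution (shift invariance)."""
    low = gaussian_filter(img.astype(float), sigma)  # smooth approximation band
    high = img - low                                 # detail residual band
    return low, high

def nsp_reconstruct(low, high):
    """Exact inverse of the split above: the two bands simply sum back."""
    return low + high
```

Stacking this split L times on the low band mimics the L-level pyramid described above, with every band staying at full resolution.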

Convolutional sparse representation

The idea of SR comes from how image structures are learned by the receptive fields of simple cells in the visual cortex V1 area. Given its simple representation, SR has been widely used in image denoising [23], feature extraction [24], and super-resolution [25, 26]. Yang et al. [27] and Yin et al. [28] applied SR to image fusion. The main difficulties in SR image fusion are sparse model selection and overcomplete dictionary learning. The common sparse model is based on a single image component and local patches, and its mathematical form is defined as
$$ \mathop {\min }\limits_{x} \left\| x \right\|_{0} \quad {\text{s.t.}} \quad \left\| {y - Dx} \right\|_{2} < \varepsilon , $$
(1)
Here, \(y \in R^{n}\) is the stacked vector representation of a \(\sqrt n \times \sqrt n\) image patch, \(D \in R^{n \times m}\) is an overcomplete dictionary, and \(x \in R^{m}\) is the sparse coefficient vector to be solved. The disadvantage of this model is that the sparse coefficients are obtained by computing over overlapping patches, so a global sparse coefficient for the whole image cannot be obtained. To improve fusion performance, Wohlberg [29] proposed the convolutional form of SR. Liu et al. [30] integrated morphological component analysis (MCA) and CSR into a unified optimization framework, which realizes multicomponent and global SR of the source images simultaneously. CSR is given by the following equation:
$$ \mathop {\min }\limits_{{\left\{ {X_{m} } \right\}}} \frac{1}{2}\left\| {Y - \sum\limits_{m = 1}^{M} {d_{m} * X_{m} } } \right\|_{2}^{2} + \lambda \sum\limits_{m = 1}^{M} {\left\| {X_{m} } \right\|_{1} } . $$
(2)
Here, \(Y\) is the whole image, which is modeled as the sum of convolutions between \(M\) local dictionary filters \(d_{m}\) and global coefficient maps \(X_{m}\). The single-valued global representation and shift invariance of CSR are conducive to extracting more detail information and enhancing robustness.
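The minimization in Eq. (2) is usually solved with ADMM [29]. Purely as an illustration, the sketch below uses a plain ISTA iteration instead, which is simpler but slower; the fixed step size and the function names are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def soft(x, t):
    """Soft thresholding: the proximal operator of the L1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def csr_ista(Y, D, lam=0.01, step=0.1, n_iter=100):
    """Illustrative ISTA solver for Eq. (2):
    min_X 0.5*||Y - sum_m d_m * X_m||_2^2 + lam * sum_m ||X_m||_1,
    where * is 2-D convolution. D: (M, kh, kw) filters; X: (M, H, W) maps.
    The fixed step is a simplification; it should stay below 1/L for the
    data-term Lipschitz constant L to guarantee convergence."""
    M = D.shape[0]
    X = np.zeros((M,) + Y.shape)
    D_flip = D[:, ::-1, ::-1]  # flipped filters give the adjoint (correlation)
    for _ in range(n_iter):
        # residual of the current reconstruction
        R = sum(fftconvolve(X[m], D[m], mode='same') for m in range(M)) - Y
        for m in range(M):
            grad = fftconvolve(R, D_flip[m], mode='same')
            X[m] = soft(X[m] - step * grad, step * lam)
    return X
```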
Another difficulty in SR is the learning of overcomplete dictionaries. In SR, the more fully the dictionary represents all the details of the image, the more likely the result of the reconstruction can restore the source image. The design of dictionaries usually adopts two methods: one is based on known transformation basis, such as discrete cosine transforms and wavelet basis. However, as data and application range change, the performance of such fixed dictionaries degrades considerably. The second is a learning-based approach. K-SVD and its improved dictionary learning method are widely used in medical image fusion [31, 32]. The adaptive K-SVD dictionary is constantly updated through iterative training, and it is updated alternately with sparse coding. The disadvantage is that the dictionary training time is long. In multimodal medical image fusion, the structure of medical images from different sensing devices is more complex, and data are more redundant. Therefore, dictionary learning based on joint block clustering is a better choice. By clustering similar patches of all source images, a complete and compact dictionary can be formed.
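As a rough illustration of the learning-based approach, the sketch below pools patches from both source images and learns a single dictionary over them with scikit-learn. This is plain dictionary learning, not the joint patch clustering of [23]; the patch size, atom count, and function name are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_joint_dictionary(img_a, img_b, patch=8, n_atoms=64):
    """Learn one dictionary from patches pooled over both source images,
    so the atoms cover structures from both modalities."""
    patches = np.concatenate([
        extract_patches_2d(img_a, (patch, patch), max_patches=2000, random_state=0),
        extract_patches_2d(img_b, (patch, patch), max_patches=2000, random_state=0),
    ]).reshape(-1, patch * patch)
    patches = patches - patches.mean(axis=1, keepdims=True)  # remove per-patch DC
    dl = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0, random_state=0)
    dl.fit(patches)
    return dl.components_  # (n_atoms, patch*patch) dictionary atoms as rows
```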

Proposed framework

As shown in Fig. 2, following the general framework of image fusion in the multiscale transform domain, the proposed method consists of three main stages: multiscale decomposition, low- and high-frequency sub-band fusion, and NSST reconstruction. For simplicity, two source images are used for illustration. First, NSST is applied to the source images \(I_{a}\) and \(I_{b}\). After one-level decomposition, a high-frequency sub-band with salient details and a low-frequency sub-band with structure information are obtained. Second, a CSR-based maximum fusion strategy is used for the fusion of the high-frequency coefficients. For the low-frequency sub-band, mutual information is used for correlation analysis; local energy coefficients are then used for the weighted summation of similar regions, whereas the neighborhood energy gradient with maximum fusion is used for dissimilar regions. Finally, the image is reconstructed by the inverse NSST.

High-frequency sub-band fusion

Since the high-frequency sub-band contains the details of the image, fusing the high-frequency sub-band mainly means fusing its salient features. The advantage of convolutional sparse representation is that it can describe these features with few sparse coefficients.
The sparse coefficients of the high-frequency sub-band of each source image are obtained by Eq. (2). Let \(X_{m}^{k}\) denote the sparse coefficients of the high-frequency sub-band of the \(k\)th image, and let \(X_{m,1:N}^{k} \left( {x,y} \right)\) denote the content at location \(\left( {x,y} \right)\); it is an \(N\)-dimensional vector. Here, the \(L_{1}\) norm of \(X_{m,1:N}^{k} \left( {x,y} \right)\) is used to measure the activity level of the source image. Thus, the sparse coefficient fusion rule for the high-frequency sub-band is defined as
$$ X_{m,1:N}^{F} \left( {x,y} \right) = \begin{cases} X_{m,1:N}^{A} \left( {x,y} \right) & {\text{if}}\;\left\| {X_{m,1:N}^{A} \left( {x,y} \right)} \right\|_{1} \ge \left\| {X_{m,1:N}^{B} \left( {x,y} \right)} \right\|_{1} \\ X_{m,1:N}^{B} \left( {x,y} \right) & {\text{otherwise}} \end{cases}. $$
(3)
The fused sparse coefficients are reconstructed by Eq. (4). The fused high-frequency sub-band is defined as
$$ H_{F} = \mathop \sum \limits_{m = 1}^{M} d_{m} * X_{m}^{F} + \theta \mathop \sum \limits_{m = 1}^{M} X_{m}^{F} . $$
(4)
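A compact sketch of this choose-max rule and the reconstruction of Eq. (4) follows; the array shapes and the default for theta are assumptions consistent with the notation above, not the authors' code.

```python
import numpy as np
from scipy.signal import fftconvolve

def fuse_high(Xa, Xb, D, theta=0.0):
    """Choose-max fusion of CSR coefficients (Eq. 3) and reconstruction (Eq. 4).
    Xa, Xb: (M, H, W) coefficient maps; D: (M, kh, kw) dictionary filters."""
    act_a = np.abs(Xa).sum(axis=0)         # L1 activity across the M maps
    act_b = np.abs(Xb).sum(axis=0)
    mask = act_a >= act_b                  # per-pixel winner between sources
    Xf = np.where(mask[None], Xa, Xb)      # fused coefficient maps
    recon = sum(fftconvolve(Xf[m], D[m], mode='same') for m in range(Xf.shape[0]))
    return recon + theta * Xf.sum(axis=0)  # second term follows Eq. (4)
```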

Low-frequency sub-band fusion

Given the low decomposition scale, the low-frequency sub-band still carries abundant important information. To extract this information further, normalized mutual information (NMI) is used for correlation analysis. For regions with high correlation, a fusion strategy of local-energy weighted summation is adopted to preserve energy as much as possible. For regions with low correlation, the neighborhood energy gradient (NEG) is adopted to highlight the contrast and edge information of the source image as much as possible.
The basis of pixel-level image fusion is that the input images are linearly related and complementary. Through correlation analysis of the low-frequency sub-band, the salient features of the source images can be preserved. Mutual information, a statistical correlation measure based on gray values, is often used in multimodal image registration: the greater the mutual information between two images, the higher their correlation. The mutual information of two images can be written as a Kullback–Leibler divergence, with the following mathematical form:
$$ {\text{MI}}\left( {x,y} \right) = \mathop \sum \limits_{x} \mathop \sum \limits_{y} P\left( {x,y} \right)\log_{2} \frac{{P\left( {x,y} \right)}}{P\left( x \right)P\left( y \right)}, $$
(5)
$$ H\left( x \right) = - \mathop \sum \limits_{x} P\left( x \right)\log_{2} P\left( x \right), $$
(6)
$$ H\left( y \right) = - \mathop \sum \limits_{y} P\left( y \right)\log_{2} P\left( y \right). $$
(7)
\(P\left( x \right)\) and \(P\left( y \right)\) represent the probability distributions of the random variables \(X\) and \(Y\), \(P\left( {x,y} \right)\) represents their joint distribution, and \(H\left( {x,y} \right)\) represents the joint entropy of \(X\) and \(Y\), which reflects their correlation. Here, \(X\) represents the image \(I_{a}\) and \(Y\) represents the image \(I_{b}\). The joint entropy of the two can be calculated from their joint histogram.
The raw mutual information value is easily inflated when the number of gray-level clusters is large; thus, NMI maps the mutual information to the interval [0, 1] and is defined as
$$ {\text{NMI}}\left( {x,y} \right) = \frac{{2MI\left( {x,y} \right)}}{H\left( x \right) + H\left( y \right)}. $$
(8)
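For reference, the following sketch computes NMI from a joint gray-level histogram, matching Eqs. (5)–(8); the bin count is an assumption, and any logarithm base can be used as long as it is consistent between the MI and entropy terms.

```python
import numpy as np

def nmi(x, y, bins=64):
    """Normalized mutual information of two equally sized image regions
    (Eqs. 5-8), computed from a joint gray-level histogram."""
    hist, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = hist / hist.sum()                 # joint distribution P(x, y)
    px = pxy.sum(axis=1)                    # marginal P(x)
    py = pxy.sum(axis=0)                    # marginal P(y)
    nz = pxy > 0                            # avoid log(0)
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return 2.0 * mi / max(hx + hy, 1e-12)   # Eq. (8); guard for flat inputs
```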
After partitioning with a \(3 \times 3\) sliding window, the correlation of each local region of the low-frequency sub-band can be obtained. Here, the mutual information \(T\) of the whole image is taken as the correlation threshold: when the NMI of a local region is greater than \(T\), the image blocks are considered correlated, and a weighted average scheme is used. In medical image fusion, the intensities of different source images at the same location may vary greatly, because the source images are captured with different imaging mechanisms; therefore, the weight matrix cannot be obtained by a simple average. Here, we use the center pixel energy to calculate the weights. The center pixel energy is an adaptive weighting method based on regional energy, and its mathematical form is defined by the following equation:
$$ E_{m} \left( {x,y} \right) = \mathop \sum \limits_{i = - N}^{N} \mathop \sum \limits_{j = - N}^{N} W_{L} \left( {i + N + 1,j + N + 1} \right) \times L_{m} \left( {x + i,y + j} \right)^{2} . $$
(9)
Here, \(N\) is the radius of the local region \(\left( {2N + 1, 2N + 1} \right)\), \(L_{m}\) denotes the low-frequency coefficients of the \(m\)th image, and \(W_{L}\) indicates the weight of each local pixel. Given that the low-frequency sub-band is relatively smooth, the weight can be set directly to \(2^{2N - d}\), where \(d\) is the distance from the neighboring pixel to the center point. If \(N = 1\), then the normalized \(W_{L}\) is
$$ W_{L} = \frac{1}{16}\left[ {\begin{array}{*{20}c} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \\ \end{array} } \right]. $$
For low-frequency regions with high correlation, the coefficient blocks are fused by a weighted sum based on the center pixel energy, defined as
$$ \begin{gathered} L_{F} = w_{1} L_{A} + w_{2} L_{B} , \hfill \\ w_{1} = \frac{{E_{A} }}{{E_{A} + E_{B} }},\;w_{2} = \frac{{E_{B} }}{{E_{A} + E_{B} }}. \hfill \\ \end{gathered} $$
(10)
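A brief sketch of Eqs. (9) and (10) follows, computing the center pixel energy with the kernel \(W_{L}\) above and forming the weighted average; the epsilon guard and function names are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 weight kernel W_L from the text (N = 1), weights 2^(2N-d), normalized.
W_L = np.array([[1, 2, 1],
                [2, 4, 2],
                [1, 2, 1]], dtype=float) / 16.0

def center_pixel_energy(L):
    """Region energy of Eq. (9): weighted sum of squared low-frequency
    coefficients over each 3x3 neighborhood."""
    return convolve(L.astype(float) ** 2, W_L, mode='reflect')

def fuse_correlated(La, Lb):
    """Weighted-average fusion of Eq. (10) for highly correlated regions."""
    Ea, Eb = center_pixel_energy(La), center_pixel_energy(Lb)
    w1 = Ea / (Ea + Eb + 1e-12)   # epsilon guards flat (zero-energy) regions
    return w1 * La + (1.0 - w1) * Lb
```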
For low-frequency regions with low correlation, that is, when the NMI of the local region is less than the threshold \(T\), the coefficient blocks are fused using a maximum energy gradient strategy. The NEG is essentially a sum of Laplacian energy, a quantity that characterizes image edge features. It reflects the contrast change within the neighborhood window and the edge information of the image block, and it is defined as
$$ {\text{NEG}}\left( {x,y} \right) = \mathop \sum \limits_{i = - N}^{N} \mathop \sum \limits_{j = - N}^{N} {\text{LEG}}\left( {x + i,y + j} \right)^{2} , $$
(11)
$$ {\text{LEG}}\left( {x,y} \right)^{2} = \mathop \sum \limits_{{\left( {m,n} \right) \in \Omega \left( {x,y} \right)}} \left[ {L\left( {x,y} \right) - L\left( {m,n} \right)} \right]^{2} . $$
(12)
Here, \(\Omega \left( {x,y} \right)\) denotes the neighborhood window around \(\left( {x,y} \right)\). After the activity is measured by NEG, the maximum value is taken for the fusion of the low-frequency coefficients:
$$ L_{F} \left( {x,y} \right) = \begin{cases} L_{A} \left( {x,y} \right) & {\text{if}}\;{\text{NEG}}_{A} \left( {x,y} \right) \ge {\text{NEG}}_{B} \left( {x,y} \right) \\ L_{B} \left( {x,y} \right) & {\text{otherwise}} \end{cases}. $$
(13)
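The sketch below implements Eqs. (11)–(13) directly: a per-pixel squared Laplacian energy gradient, a window sum, and choose-max fusion. The window radius and helper names are assumptions; generic_filter is slow but keeps the code close to the formulas.

```python
import numpy as np
from scipy.ndimage import convolve, generic_filter

def leg_sq(window):
    """Squared Laplacian energy gradient of Eq. (12): sum of squared
    differences between the window center and its neighbors."""
    c = window[len(window) // 2]
    return np.sum((c - window) ** 2)

def neg(L, radius=1):
    """Neighborhood energy gradient of Eq. (11): sum the squared LEG values
    over a (2*radius+1)^2 window around each pixel."""
    size = 2 * radius + 1
    leg = generic_filter(L.astype(float), leg_sq, size=size, mode='reflect')
    box = np.ones((size, size))               # unweighted window sum
    return convolve(leg, box, mode='reflect')

def fuse_uncorrelated(La, Lb):
    """Choose-max fusion of Eq. (13) for weakly correlated regions."""
    return np.where(neg(La) >= neg(Lb), La, Lb)
```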
The detailed description of the algorithm is shown in Algorithm 1.
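Putting the pieces together, the following outline mirrors the stages of Algorithm 1 as described in the text, reusing the hypothetical helpers sketched in the previous sections; the non-overlapping block partition is a simplification of the paper's 3 × 3 sliding window.

```python
import numpy as np

def fuse_images(Ia, Ib, D, T=None, block=3):
    """Illustrative end-to-end pipeline built from the sketches above
    (nsp_decompose, csr_ista, fuse_high, nmi, fuse_correlated, and
    fuse_uncorrelated are the hypothetical helpers defined earlier)."""
    La, Ha = nsp_decompose(Ia)          # 1. one-level NSST-style split
    Lb, Hb = nsp_decompose(Ib)
    Xa = csr_ista(Ha, D)                # 2. CSR coding of high-frequency bands
    Xb = csr_ista(Hb, D)
    Hf = fuse_high(Xa, Xb, D)           #    choose-max coefficient fusion
    if T is None:
        T = nmi(Ia, Ib)                 # 3. global NMI as correlation threshold
    Lf = np.empty_like(La)
    H, W = La.shape
    for i in range(0, H, block):        # 4. block-wise low-frequency fusion
        for j in range(0, W, block):
            sa = La[i:i+block, j:j+block]
            sb = Lb[i:i+block, j:j+block]
            if nmi(sa, sb) > T:         # correlated: energy-weighted average
                Lf[i:i+block, j:j+block] = fuse_correlated(sa, sb)
            else:                       # uncorrelated: NEG choose-max
                Lf[i:i+block, j:j+block] = fuse_uncorrelated(sa, sb)
    return nsp_reconstruct(Lf, Hf)      # 5. inverse transform
```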

Experiment and analysis

Comparison algorithms

Nine medical image fusion methods proposed in recent years are compared with the proposed method. These methods are based on SR or multiscale transformation and include LP–ASR [33], SR–NSCT [34], the parameter-adaptive pulse-coupled neural network (PA–PCNN) [13], PC–LLE [14], IOI–LLF [35], CNN–CP [15], CoF–MLE–NSST [36], PSO–NSST [37], and PCNN–NSST [38]. The LP–ASR method is based on Laplacian pyramid decomposition and adaptive SR, and its sparse coefficient fusion scheme reduces the noise of the high-frequency components. The SR–NSCT method incorporates NSCT into the SR fusion framework and uses different fusion strategies for the low- and high-frequency coefficients. The PA–PCNN method first performs NSST decomposition on the source images and then applies a PA–PCNN model to the high-frequency sub-band fusion. After NSCT decomposition, the PC–LLE method fuses the high-frequency sub-bands by a phase congruency rule. The information of interest–local Laplacian filter (IOI–LLF) method uses the local Laplacian filter to decompose the source image into residual and base images and further decomposes the residual image based on the information of interest. The CNN–CP method uses a trained Siamese convolutional network to fuse the pixel activity information of the source images and generate a weight map. Of these six methods, the first two are based on SR, whereas the last four are multiscale decomposition methods.

Objective evaluation metrics

To evaluate the performance of the various methods, five widely recognized objective metrics are used in the experiments: entropy (EN) [39], structural similarity (\(Q_{e}\)) [40], mutual information (MI) [41], the gradient-based metric \(Q^{{\text{AB/F}}}\) [42], and visual information fidelity (VIF) [43]. EN reflects the amount of information contained in the fused image; \(Q_{e}\) represents the degree of similarity between the fused and source images; MI measures the source information retained in the fused image; \(Q^{{\text{AB/F}}}\) is a gradient-based quality metric, mainly used to measure the edge information of fused images; and VIF is the information ratio between the fused image and the source images, used to evaluate the human visual perception of the fused image.
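As one concrete example, EN is simply the Shannon entropy of the fused image's gray-level histogram; a minimal sketch, assuming 8-bit gray levels, is shown below. The remaining metrics follow the definitions in the cited papers [40–43].

```python
import numpy as np

def entropy_en(img, bins=256):
    """EN metric: Shannon entropy of the fused image's gray-level histogram
    (assumes intensities in [0, 255])."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]                    # ignore empty bins to avoid log2(0)
    return -np.sum(p * np.log2(p))
```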

Experimental settings

In this experiment, 10 groups of CT–MRI images and 10 groups of MRI–SPECT images are used in the fusion performance tests. As shown in Fig. 3, the first row (a) shows two sets of CT–MRI images, and the second row (b) shows two sets of MRI–SPECT images. These images are from the Whole Brain Atlas provided by Harvard Medical School; each image has a resolution of 256 × 256. All experiments are programmed in MATLAB 2014a, and the simulation environment is an Intel(R) Core(TM) i7-8565U CPU @ 1.80 GHz with 8.00 GB RAM.

Experimental results

Figure 4 shows the results of two groups of CT–MRI images obtained by the different fusion methods. LP–ASR and SR–NSCT lose part of the energy of the CT images and reduce the contrast of the fused images; PC–LLE and CNN–CP lose part of the information in the MRI source images, whereas noise-like artifacts appear in the IOI–LLF fused images. The fused images of PA–PCNN, PCNN–NSST, and the proposed method have better contrast and edge detail information.
Figure 5 shows the results of two sets of MRI–SPECT images obtained by different fusion methods. Among them, LP–ASR and SR–NSCT lose part of MRI information, and local image distortion exists. The IOI–LLF, PC–LLE and CNN–CP methods have complete texture information, but some SPECT functional information is missing. The fusion effect of PA–PCNN, PCNN–NSST and the proposed method is better subjectively.
To evaluate the performance of each fusion method objectively, Tables 1 and 2, respectively, show the average scores of the CT–MRI and MRI–SPECT fusion results; the higher the metric value, the better the fusion performance. In addition, the proposed method is compared with several recent NSST methods: the CoF–MLE–NSST method uses a co-occurrence filter to measure the activity of the low-frequency sub-band coefficients, the PSO–NSST method uses particle swarm optimization to optimize the membership function of a fuzzy logic system for the low-frequency sub-band, and the PCNN–NSST method uses PCNN to fuse the high-frequency sub-band. Compared with the other nine methods, the proposed method ranks first in \(Q_{e}\), MI, and \(Q^{{\text{AB/F}}}\) for both the CT–MRI and MRI–SPECT images, indicating that it preserves most of the structure information in the source images and keeps their edges and structure well. At the same time, because transform-domain methods are accompanied by the loss of a certain amount of information, the proposed method does not rank highest in EN and VIF, but its ranking is still relatively high, indicating good robustness. The proposed method is inferior to the PA–PCNN method in VIF, because the latter adopts a neuron perceptron similar to that of humans.
Table 1  Objective evaluation of CT–MRI fusion images

Methods         EN       Q_e      MI       Q^{AB/F}   VIF
LP–ASR          4.1466   0.7505   2.0982   0.8246     0.4502
SR–NSCT         4.3242   0.7626   1.9384   0.8344     0.4428
IOI–LLF         5.1825   0.7259   1.7656   0.7872     0.4375
PC–LLE          5.0340   0.7384   2.1178   0.8495     0.3827
CNN–CP          4.8099   0.7631   1.6211   0.7901     0.4438
PA–PCNN         5.1479   0.7754   2.0137   0.8024     0.4679
CoF–MLE–NSST    5.0987   0.7698   2.0592   0.8499     0.4685
PSO–NSST        5.0901   0.7724   2.1105   0.8450     0.4606
PCNN–NSST       5.0424   0.7805   2.1837   0.8503     0.4688
Proposed        5.0790   0.7827   2.2035   0.8507     0.4615
Table 2  Objective evaluation of MRI–SPECT fusion images

Methods         EN       Q_e      MI       Q^{AB/F}   VIF
LP–ASR          4.5285   0.7811   2.6251   0.4954     0.6053
SR–NSCT         4.6179   0.7930   2.7989   0.6366     0.5986
IOI–LLF         4.7326   0.7685   2.6814   0.5507     0.6311
PC–LLE          4.6901   0.7724   2.5413   0.6544     0.6257
CNN–CP          5.1703   0.8022   2.7422   0.5979     0.6365
PA–PCNN         5.0224   0.8152   2.6907   0.6618     0.6389
CoF–MLE–NSST    5.0702   0.8029   2.7147   0.6689     0.6210
PSO–NSST        5.1126   0.8133   2.7194   0.6732     0.6255
PCNN–NSST       5.0677   0.8158   2.7867   0.6767     0.6378
Proposed        4.9790   0.8190   2.8068   0.6835     0.6290
To compare the computational costs of the different fusion methods, the total time for the 10 groups of CT–MRI fusion images is first measured and then divided by 10 to obtain the average running time. The measurement is repeated 10 times, and the averages and standard deviations are shown in Table 3. The proposed method is slower than the LP–ASR and PC–LLE methods but faster than the other four. In particular, its fusion performance is similar to that of PA–PCNN, but its computational efficiency is higher, because the iteration process of PCNN is time-consuming. The IOI–LLF method has the lowest computational efficiency, because the IOI step takes too much time.
Table 3  Running time for different methods

Methods              LP–ASR   SR–NSCT   IOI–LLF   PC–LLE   CNN–CP   PA–PCNN   Proposed
Average              0.286    6.351     73.25     0.495    14.12    8.68      2.97
Standard deviation   0.001    0.02      0.37      0.006    0.18     0.05      0.02
The proposed algorithm's performance was also evaluated by varying the parameters of the method, namely, the NSST decomposition level and the number of directions. The values are averaged over 20 pairs of multimodality medical images and shown in Table 4. Table 4 shows that as the decomposition level and the number of directions increase, the values of EN and MI also increase. The values of \(Q_{e}\), \(Q^{{\text{AB/F}}}\), and VIF are optimal when Level = 3. In general, as the level increases, each metric increases only slightly.
Table 4  Fusion quality metrics analysis based on different levels and directions

Quality     Data sets    Level = 1   Level = 2    Level = 3     Level = 4       Level = 5
metrics                  Dir: 16     Dir: 16,16   Dir: 16,16,8  Dir: 16,16,8,8  Dir: 16,16,8,8,4
EN          CT–MRI       5.0790      5.0896       5.1201        5.1576          5.1589
            MRI–SPECT    4.9790      4.9815       4.9924        4.9935          4.9936
Q_e         CT–MRI       0.7827      0.7830       0.7845        0.7843          0.7841
            MRI–SPECT    0.8190      0.8194       0.8196        0.8122          0.8037
MI          CT–MRI       2.2035      2.2046       2.2049        2.2051          2.2049
            MRI–SPECT    2.8068      2.8072       2.8073        2.8073          2.8074
Q^{AB/F}    CT–MRI       0.8507      0.8509       0.8510        0.8509          0.8508
            MRI–SPECT    0.6835      0.6836       0.6838        0.6838          0.6837
VIF         CT–MRI       0.4615      0.4618       0.4619        0.4617          0.4615
            MRI–SPECT    0.6290      0.6291       0.6295        0.6297          0.6298

Conclusions

In this study, we propose a multimodal medical image fusion method based on NSST and mutual information correlation analysis. Building on NSST scale decomposition, the method uses CSR to enhance the high-frequency detail information and uses mutual information correlation to mine the detail information of the low-frequency sub-band. Different fusion strategies are then adopted for different regions of the low-frequency sub-band according to their correlation. To this end, two new activity level measurements based on the neighborhood energy gradient and the center pixel energy are designed. Numerous experiments and comparisons with other advanced methods demonstrate the effectiveness of the proposed method. However, the method still has the following limitations. First, the threshold used in the low-frequency correlation analysis influences the final fusion result: if the threshold is set too small, the extraction of detail information is insufficient; if it is set too large, meaningless details in the MRI image are introduced into the fused image, causing artifacts. In this study, the mutual information of the whole source image is used as the threshold for the correlation analysis of the low-frequency sub-band, which is not an optimal scheme. In addition, Table 3 shows that this method is not as fast as some fusion methods, because the local mutual information correlation is calculated with a sliding window, resulting in low computational efficiency. In the future, we will work on a more effective threshold determination scheme that combines prior information from the source images.

Acknowledgements

This project is supported by the Provincial Natural Science Foundation of Hunan, China (Grant No. 2020JJ6021), the Research Foundation of Education Bureau of Hunan Province, China (Grant No. 21A0451, No. 19C0483), Construct Program of the Key Discipline in Hunan Province: Control Science and Engineering.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Ganasala P, Kumar V (2016) Feature-motivated simplified adaptive PCNN-based medical image fusion algorithm in NSST domain. J Digit Imaging 29:73–85
2. Qi G, Wang J, Zhang Q, Zeng F, Zhu Z (2017) An integrated dictionary learning entropy-based medical image fusion framework. Future Internet 9(4):61
3. Petrovic V, Xydeas C (2004) Gradient-based multiresolution image fusion. IEEE Trans Image Process 13:228–237
4. Sundar K, Jahnavi M, Lakshmisaritha K (2017) Multi-sensor image fusion based on empirical wavelet transform. In: 2017 international conference on electrical, electronics, communication, computer, and optimization techniques (ICEECCOT). IEEE, pp 93–97
5. Liu Y, Liu S, Wang Z (2015) Multi-focus image fusion with dense SIFT. Inf Fusion 23:139–155
6. Xia J, Lu Y, Tan L et al (2021) Intelligent fusion of infrared and visible image data based on convolutional sparse representation and improved pulse-coupled neural network. Comput Mater Continua 67(1):613–624
7. Yuan G, Ma S, Liu J et al (2021) Fusion of medical images based on salient features extraction by PSO optimized fuzzy logic in NSST domain. Biomed Signal Process Control 69(12):102852
8. Ouerghi H, Mourali O, Zagrouba E (2020) Multi-modal image fusion based on weight local features and novel sum-modified-Laplacian in non-subsampled Shearlet transform domain. In: International symposium on visual computing
9. Shen J, Zhao Y, Yan S, Li X (2014) Exposure fusion using boosting Laplacian pyramid. IEEE Trans Cybern 44:1579–1590
10. Chen G et al (2019) Weighted sparse representation and gradient domain guided filter pyramid image fusion based on low-light-level dual-channel camera. IEEE Photon J 99:1
11. Li GX, Wang K (2007) Color image fusion algorithm using the contourlet transform. Acta Electron Sin 35:112
12. Miao QG, Cheng S, Xu PF et al (2011) A novel algorithm of image fusion using shearlets. Opt Commun 284(6):1540–1547
13. Yin M, Liu X, Liu Y et al (2018) Medical image fusion with parameter-adaptive pulse coupled neural network in non-subsampled shearlet transform domain. IEEE Trans Instrum Meas 68(1):49–64
14. Zhu Z, Zheng M, Qi G et al (2019) A phase congruency and local Laplacian energy based multi-modality medical image fusion method in NSCT domain. IEEE Access 7:20811–20824
15. Wang K, Zheng M, Wei H et al (2020) Multi-modality medical image fusion using convolutional neural network and contrast pyramid. Sensors 20(8):2169
16. Shahdoosti HR, Ghassemian H (2016) Combining the spectral PCA and spatial PCA fusion methods by an optimal filter. Inf Fusion 27:150–160
17. Liu Y, Liu S, Wang Z (2015) A general framework for image fusion based on multi-scale transform and sparse representation. Inf Fusion 24:147–164
18. Wang K, Qi G, Zhu Z, Chai Y (2017) A novel geometric dictionary construction approach for sparse representation based image fusion. Entropy 19:306
19. Garaigordobil A, Ansola R, Veguería E et al (2019) Overhang constraint for topology optimization of self-supported compliant mechanisms considering additive manufacturing. Comput Aided Design 109:33–48
20. Subashini MM, Sahoo SK (2014) Pulse coupled neural networks and its applications. Expert Syst Appl 41(8):3965–3974
21. Wang M, Shang X (2020) An improved simplified PCNN model for salient region detection. Vis Comput 10–12:1–13
22. Easley G, Labate D, Lim W-Q (2008) Sparse directional image representations using the discrete shearlet transform. Appl Comput Harmon Anal 25(1):25–46
23. Kim M, Han DK, Ko H (2016) Joint patch clustering-based dictionary learning for multimodal image fusion. Inf Fusion 27:198–214
24. Liu H, Liu Y, Sun F (2015) Robust exemplar extraction using structured sparse coding. IEEE Trans Neural Netw 26:1816–1821
25. Yang J, Wright J, Huang TS, Ma Y (2010) Image super-resolution via sparse representation. IEEE Trans Image Process 19(11):2861–2873
26. Dong W et al (2011) Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans Image Process 20(7):1838–1857
27. Yang B, Li S (2010) Multifocus image fusion and restoration with sparse representation. IEEE Trans Instrum Meas 59(4):884–892
28. Yin H, Li S, Fang L (2013) Simultaneous image fusion and super-resolution using sparse representation. Inf Fusion 14:229–240
29. Wohlberg B (2015) Efficient algorithms for convolutional sparse representations. IEEE Trans Image Process 25(1):301–315
30. Liu Y, Chen X et al (2019) Medical image fusion via convolutional sparsity based morphological component analysis. IEEE Signal Process Lett 26(3):485–489
31. Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54:4311–4322
32. Dong W et al (2013) Sparse representation based image interpolation with nonlocal autoregressive modeling. IEEE Trans Image Process 22(4):1382–1394
33. Wang Z, Cuia Z, Zhu Y (2020) Multi-modal medical image fusion by Laplacian pyramid and adaptive sparse representation. Comput Biol Med 123:103823
34. Li Y, Sun Y, Huang X et al (2018) An image fusion method based on sparse representation and sum modified-Laplacian in NSCT domain. Entropy 20(7):522
35. Jiao D, Li W, Xiao B (2017) Anatomical-functional image fusion by information of interest in local Laplacian filtering domain. IEEE Trans Image Process 12:1–1
36. Diwakar M, Singh P, Shankar A (2021) Multi-modal medical image fusion framework using co-occurrence filter and local extrema in NSST domain. Biomed Signal Process Control 68(12):102788
37. Yuan GA et al (2021) Fusion of medical images based on salient features extraction by PSO optimized fuzzy logic in NSST domain. Biomed Signal Process Control 69:102852
38. Wei T et al (2020) Multimodal medical image fusion algorithm in the era of big data. Neural Comput Appl 3:1–21
39. Cvejic N, Canagarajah C, Bull D (2006) Image fusion metric based on mutual information and Tsallis entropy. Electron Lett 42:626–627
40. Zhang X-L, Li X-F, Li J (2014) Validation and correlation analysis of metrics for evaluating performance of image fusion. Acta Autom Sin 40(2):306–315
41. Qu G, Zhang D, Yan P (2002) Information measure for performance of image fusion. Electron Lett 38:313–315
42. Petrović V (2007) Subjective tests for image fusion evaluation and objective metric validation. Inf Fusion 8:208–216
43. Sheikh HR, Bovik AC (2006) Image information and visual quality. IEEE Trans Image Process 15(2):430–444
Metadata
Title: Multimodal medical image fusion with convolution sparse representation and mutual information correlation in NSST domain
Authors: Peng Guo, Guoqi Xie, Renfa Li, Hui Hu
Publication date: 30.06.2022
Publisher: Springer International Publishing
Published in: Complex & Intelligent Systems / Issue 1/2023
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI: https://doi.org/10.1007/s40747-022-00792-9
