Multifocus image fusion using artificial neural networks
Introduction
Optical lenses, particularly those with long focal lengths, suffer from the problem of limited depth of field. Consequently, the image obtained will not be in focus everywhere, i.e., if one object in the scene is in focus, another one will be out of focus. A possible way to alleviate this problem is by image fusion (Zhang and Blum, 1999), in which several pictures with different focus points are combined to form a single image. This fused image will then hopefully contain all relevant objects in focus (Li et al., 1995; Seales and Dutta, 1996).
The simplest image fusion method just takes the pixel-by-pixel average of the source images. This, however, often leads to undesirable side effects such as reduced contrast. In recent years, various alternatives based on multiscale transforms have been proposed. The basic idea is to perform a multiresolution decomposition on each source image, then integrate all these decompositions to produce a composite representation. The fused image is finally reconstructed by performing an inverse multiresolution transform. Examples of this approach include the Laplacian pyramid (Burt and Adelson, 1983), the gradient pyramid (Burt and Kolczynski, 1993), the ratio-of-low-pass pyramid (Toet et al., 1989) and the morphological pyramid (Matsopoulos et al., 1994). More recently, the discrete wavelet transform (DWT) (Chipman et al., 1995; Koren et al., 1995; Li et al., 1995; Yocky, 1995; Zhang and Blum, 1999) has also been used. In general, DWT is superior to the earlier pyramid-based methods (Li et al., 1995). First, the wavelet representation provides directional information while pyramids do not. Second, the wavelet basis functions can be chosen to be orthogonal, so, unlike the pyramid-based methods, DWT carries no redundant information across different resolutions. When fusing the wavelet coefficients, the maximum selection rule is typically used, as large absolute wavelet coefficients often correspond to salient features in the images. Fig. 1 shows a schematic diagram of the image fusion process based on DWT.
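The DWT-based pipeline described above can be sketched in a few lines of NumPy. This is only an illustrative single-level Haar decomposition written from scratch (not the wavelet basis or software used in the papers cited): the approximation bands are averaged, and for each detail band the coefficient with the larger absolute value is selected, per the maximum selection rule.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: approximation LL and details (LH, HL, HH)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    lh, hl, hh = details
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2] = ll + lh      # undo column transform
    a[:, 1::2] = ll - lh
    d = np.empty_like(a)
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :] = a + d      # undo row transform
    img[1::2, :] = a - d
    return img

def dwt_fuse(img_a, img_b):
    """Fuse two registered grayscale images of equal (even) size."""
    ll_a, det_a = haar_dwt2(img_a)
    ll_b, det_b = haar_dwt2(img_b)
    ll_f = (ll_a + ll_b) / 2.0     # average the approximation band
    # maximum selection rule on the detail coefficients
    det_f = tuple(np.where(np.abs(da) >= np.abs(db), da, db)
                  for da, db in zip(det_a, det_b))
    return haar_idwt2(ll_f, det_f)
```

Because the Haar transform here is critically sampled, this sketch also exhibits the shift-variance discussed in the next paragraph: shifting a source image by one pixel can change which coefficients win the selection.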
While these methods often perform satisfactorily, their multiresolution decompositions and consequently the fusion results are shift-variant because of an underlying down-sampling process. When there is a slight camera/object movement or when there is misregistration of the source images, their performance will thus quickly deteriorate. One possible remedy is to use the shift-invariant discrete wavelet frame transform (Unser, 1995). However, the implementation is more complicated and the algorithm is also more demanding in terms of both memory and time.
In this paper, we propose a pixel level multifocus image fusion method based on the use of image blocks and artificial neural networks. The implementation is computationally simple and can be realized in real-time. Experimental results show that it outperforms the DWT-based method. The rest of this paper is organized as follows. The proposed fusion scheme will be described in Section 2. Experiments will be presented in Section 3, and the last section gives some concluding remarks.
Neural network based multifocus image fusion
Fig. 2 shows a schematic diagram of the proposed multifocus image fusion method. Here, we consider the processing of just two source images, though the algorithm can be extended straightforwardly to handle more than two. Moreover, the source images are assumed to have been registered.
The basic fusion algorithm will be described in Section 2.1. The input features to the neural networks will be discussed in Section 2.2. Section 2.3 contains a brief introduction to the two neural network models used.
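The block-based selection idea can be sketched as follows. This is a simplified stand-in, not the authors' method: it judges the clarity of each block by spatial frequency alone and picks the block from whichever source scores higher, whereas the proposed scheme feeds several features into a trained neural network to make that decision.

```python
import numpy as np

def spatial_frequency(block):
    """Spatial frequency of a grayscale block: sqrt(RF^2 + CF^2)."""
    rf2 = np.mean(np.diff(block, axis=1) ** 2)  # row-frequency term
    cf2 = np.mean(np.diff(block, axis=0) ** 2)  # column-frequency term
    return np.sqrt(rf2 + cf2)

def block_fuse(img_a, img_b, bs=8):
    """Block-wise fusion of two registered images of equal size.

    For each bs-by-bs block, copy the block from the source judged
    clearer (here: higher spatial frequency; the paper uses a neural
    network over features SF, VI and EG instead)."""
    fused = np.empty_like(img_a)
    for i in range(0, img_a.shape[0], bs):
        for j in range(0, img_a.shape[1], bs):
            a = img_a[i:i + bs, j:j + bs]
            b = img_b[i:i + bs, j:j + bs]
            fused[i:i + bs, j:j + bs] = (
                a if spatial_frequency(a) >= spatial_frequency(b) else b)
    return fused
```

Since each output block is copied verbatim from one source image, the method involves no filtering or transform, which is why it is computationally simple and free of the shift-variance of critically sampled multiresolution schemes.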
Demonstration of the effectiveness of the features
In this section, we first experimentally demonstrate the effectiveness of the three features proposed in Section 2.2 (namely SF, VI and EG) in representing the clarity level of an image. An image block of size 64×64 (Fig. 4(a)) is extracted from the “Lena” image. Fig. 4(b)–(e) show the degraded versions obtained by blurring with a Gaussian of radius 0.5, 0.8, 1.0 and 1.5, respectively. As can be seen from Table 1, all three feature values diminish as the image becomes more blurred.
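The behaviour reported in Table 1 is easy to reproduce for the SF feature. The sketch below computes the spatial frequency of a block before and after blurring; for simplicity it uses a box blur rather than the Gaussian of the experiment, and it omits the VI and EG features, so it illustrates the trend only, not the paper's exact numbers.

```python
import numpy as np

def spatial_frequency(img):
    """Spatial frequency SF = sqrt(RF^2 + CF^2) of a grayscale block."""
    rf2 = np.mean(np.diff(img, axis=1) ** 2)  # row-frequency term
    cf2 = np.mean(np.diff(img, axis=0) ** 2)  # column-frequency term
    return np.sqrt(rf2 + cf2)

def box_blur(img, k=3):
    """Crude k-by-k box blur (a stand-in for the Gaussian blur used
    in the experiment)."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

# Blurring suppresses local intensity differences, so SF drops:
rng = np.random.default_rng(2)
block = rng.random((64, 64))          # stand-in for the Lena block
print(spatial_frequency(block) > spatial_frequency(box_blur(block, k=5)))
```

The same monotone decrease under increasing blur is what makes such features usable as training targets for the clarity classifier.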
Conclusion
In this paper, we combine the idea of image blocks and artificial neural networks for pixel level multifocus image fusion. Features indicating the clarity of an image block are extracted and fed into the neural network, which then learns to determine which source image is clearer at that particular physical location. Two neural network models, namely the PNN and RBFN, have been used. Experimental results show that this method outperforms the DWT-based approach, particularly when there is object movement or misregistration of the source images.
References
- Bellman, R., 1961. Adaptive Control Processes. Princeton University Press.
- Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press.
- Burt, P.J., Adelson, E.H., 1983. The Laplacian pyramid as a compact image code. IEEE Trans. Comm.
- Burt, P.J., Kolczynski, R.J., 1993. Enhanced image capture through fusion.
- Canny, J., 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell.
- Chipman, L.J., et al., 1995. Wavelets and image fusion.
- Eskicioglu, A.M., Fisher, P.S., 1995. Image quality measures and their performance. IEEE Trans. Comm.
- Hertz, J., Krogh, A., Palmer, R.G., 1991. Introduction to the Theory of Neural Computation. Addison-Wesley.
- Li, H., Manjunath, B.S., Mitra, S.K., 1995. Multisensor image fusion using the wavelet transform. Graphical Models Image Processing.