Elsevier

Pattern Recognition

Volume 40, Issue 12, December 2007, Pages 3503-3508
Scaling and rotation invariant analysis approach to object recognition based on Radon and Fourier–Mellin transforms

https://doi.org/10.1016/j.patcog.2007.04.020

Abstract

Various types of orthogonal moments have been widely used for object recognition and classification. However, these moments are not natively scale invariant, so the image must be normalized and binarized before the moments are extracted, which introduces resampling and requantization errors. This paper describes a new scaling and rotation invariant analysis method for object recognition. In the proposed method, the Radon transform projects the image onto projection space, converting a rotation of the original image into a translation of the projection in the angle variable, and a scaling of the original image into a scaling of the projection in the spatial variable together with an amplitude scaling of the projection. The Fourier–Mellin transform is then applied to the result, converting the translation in the angle variable into a phase shift, and the scaling in the spatial variable together with the amplitude scaling of the projection into an amplitude scaling. To obtain a set of completely invariant descriptors, a rotation and scaling invariant function is constructed. A k-nearest-neighbor classifier performs the classification. Theoretical and experimental results show that this approach achieves high classification accuracy because it uses the rotation and scaling invariant function instead of image binarization and normalization. The method is also shown to be relatively robust in the presence of white noise.

Introduction

Descriptions of objects that are invariant to geometric transformations, including translation, scaling and rotation, are useful in image analysis, object recognition and classification [1], [2]. The simplest rotationally invariant feature is the Fourier transform of the boundary curve, which is invariant with regard to translation and rotation [1]. A popular class of invariant features is based on moment techniques, including orthogonal and nonorthogonal moments. Nonorthogonal moments such as geometric moments [3] and complex moments [5], [6], [7] are the components of the projection of the image onto monomial functions. They have a low computational cost but are highly sensitive to noise; furthermore, reconstruction is extremely difficult. The orthogonal moments, including the Zernike moments (ZM) [4], [8], [9], the pseudo-Zernike moments [4], the Legendre moments [4], [10], the orthogonal Fourier–Mellin moments (OFM) [8] and the Tchebichef moments (TM) [11], are the projection of the image onto a set of orthogonal basis functions. They have proven less sensitive to noise and very accurate in image reconstruction, but their major drawback is the lack of native scaling invariance: image binarization and normalization must be applied before the moments are extracted [see Fig. 1(a)]. This reduces the accuracy of object recognition and classification, since normalization introduces resampling and requantization errors and binarization destroys much useful information.

This paper proposes a set of scaling and rotation invariant descriptors for image recognition (RFM) [see Fig. 1(b)]. The Radon transform is used to project the image onto projection space. In this space, a rotation of the original image results in a translation in the angle variable, and a scaling of the original image leads to a scaling in the spatial variable together with an amplitude scaling. Then, the Fourier–Mellin transform is applied to convert the translation in the angle variable to a phase shift, and the scaling in the spatial variable together with the amplitude scaling to an amplitude scaling. Based on the result, a rotation and scaling invariant function is constructed to obtain a set of completely invariant descriptors. A k-nearest-neighbor classifier performs the classification. Theoretical and experimental results show the superiority of this approach over orthogonal moment-based analysis methods.

The outline of this paper is as follows. Section 2 briefly reviews the Radon and Fourier–Mellin transforms. The proposed approach is presented in Section 3. Section 4 establishes the noise robustness of the method. Experimental results are described in Section 5, and conclusions are presented in Section 6.

Radon transform and some of its properties

The Radon transform of a two-dimensional (2-D) function $f(x,y)$ is defined as

$$P(t,\theta)=R(t,\theta)\{f(x,y)\}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(t-x\cos\theta-y\sin\theta)\,dx\,dy, \tag{1}$$

where $t$ is the perpendicular distance of a straight line from the origin $O$ [see Fig. 2] and $\theta$ is the angle between the distance vector and the x-axis, i.e., $\theta\in[0,\pi)$ [12].
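As a rough numerical illustration of this definition, the sketch below approximates a single sample P(t, θ) by midpoint integration along the line x cos θ + y sin θ = t. The helper name `radon_line_integral`, the truncation length `half_length`, the number of steps and the Gaussian test image are all assumptions made for this example; they are not taken from the paper.

```python
import math


def radon_line_integral(f, t, theta, half_length=8.0, n=2000):
    """Approximate P(t, theta) for a function f(x, y) by midpoint quadrature.

    Points on the line x*cos(theta) + y*sin(theta) = t are parametrised as
    (x, y) = t*(cos, sin) + v*(-sin, cos), with v in [-half_length, half_length].
    """
    c, s = math.cos(theta), math.sin(theta)
    dv = 2.0 * half_length / n
    total = 0.0
    for i in range(n):
        v = -half_length + (i + 0.5) * dv
        total += f(t * c - v * s, t * s + v * c)
    return total * dv


def unit_gaussian(x, y):
    """Isotropic test image exp(-(x^2 + y^2) / 2) (hypothetical example)."""
    return math.exp(-(x * x + y * y) / 2.0)


# For this test image the projection has the closed form
# P(t, theta) = sqrt(2*pi) * exp(-t^2 / 2), independent of theta.
approx = radon_line_integral(unit_gaussian, 0.5, math.pi / 3)
exact = math.sqrt(2.0 * math.pi) * math.exp(-0.125)
print(abs(approx - exact))
```

The delta function in the definition simply restricts the double integral to one line, which is why a one-dimensional quadrature along that line is enough here.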

The Radon transform has useful translation, rotation and scaling properties, as outlined in (2), (3), (4).

Translation:

$$R(t,\theta)\{f(x-x_0,y-y_0)\}=P(t-t_0,\theta). \tag{2}$$
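The translation property (2) can be checked numerically with a short sketch. The snippet does not define t0; the code assumes the standard form t0 = x0 cos θ + y0 sin θ. The helper names, the anisotropic test image and the chosen shift are hypothetical.

```python
import math


def radon_line_integral(f, t, theta, half_length=10.0, n=4000):
    """Midpoint approximation of the line integral of f along
    x*cos(theta) + y*sin(theta) = t (truncated to a finite segment)."""
    c, s = math.cos(theta), math.sin(theta)
    dv = 2.0 * half_length / n
    total = 0.0
    for i in range(n):
        v = -half_length + (i + 0.5) * dv
        total += f(t * c - v * s, t * s + v * c)
    return total * dv


def ellipse_blob(x, y):
    """Anisotropic test image, so the projection really depends on theta."""
    return math.exp(-(x * x / 2.0 + 2.0 * y * y))


def shifted(f, x0, y0):
    """Return the translated image f(x - x0, y - y0)."""
    return lambda x, y: f(x - x0, y - y0)


def translation_gap(f, x0, y0, t, theta):
    """|R{f(x - x0, y - y0)}(t, theta) - P(t - t0, theta)| for the assumed t0."""
    t0 = x0 * math.cos(theta) + y0 * math.sin(theta)  # assumption, see lead-in
    lhs = radon_line_integral(shifted(f, x0, y0), t, theta)
    rhs = radon_line_integral(f, t - t0, theta)
    return abs(lhs - rhs)


gap = translation_gap(ellipse_blob, 1.0, -0.5, 0.8, 0.9)
print(gap)
```

A gap near floating-point precision indicates that shifting the image only shifts the projection along t, by an amount that depends on θ.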

Rotation by φ:R(t,θ){f(xcosφ+ysinφ,-xcosφ+ysin

The proposed approach

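Let $f_{sr}(x,y)$ be the scaled and rotated version of an image function $f(x,y)$, with scale factor $\lambda$ and rotation angle $\varphi$. According to (2), (3), (4), the Radon transform of $f_{sr}(x,y)$ is given by

$$P_{sr}(t,\theta)=\lambda P\!\left(\frac{t}{\lambda},\theta+\varphi\right),$$

where $P(t,\theta)$ is the Radon transform of $f(x,y)$. Then, the Fourier–Mellin transform of $P_{sr}(t,\theta)$ is

$$M_{sr}(u,k)=\int_0^{\infty}\!\int_0^{2\pi}P_{sr}(t,\theta)\,t^{\sigma-iu-1}\exp(-ik\theta)\,dt\,d\theta=\int_0^{\infty}\!\int_0^{2\pi}\lambda P\!\left(\frac{t}{\lambda},\theta+\varphi\right)t^{\sigma-iu-1}\exp(-ik\theta)\,dt\,d\theta. \tag{8}$$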

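Letting $\tau=t/\lambda$ and $\beta=\theta+\varphi$, we have $t=\lambda\tau$, $\theta=\beta-\varphi$, $dt=\lambda\,d\tau$ and $d\theta=d\beta$, so Eq. (8) can be rewritten as

$$M_{sr}(u,k)=\int_0^{\infty}\!\int_0^{2\pi}\lambda P(\tau,\beta)\cdot$$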
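The snippet breaks off here. Carrying the stated substitution through Eq. (8), and using the 2π-periodicity of P in the angle variable, one would expect $M_{sr}(u,k)=\lambda^{\sigma-iu+1}e^{ik\varphi}M(u,k)$, where $M(u,k)$ is the Fourier–Mellin transform of $P(t,\theta)$. This relation is worked out here from the change of variables, not quoted from the paper, and the authors' invariant function is not shown in the snippet. The sketch below checks the relation numerically for a hypothetical test image made of Gaussian blobs, whose Radon transform is known in closed form. All names, the blob parameters, the quadrature grid and the choice σ = 1 are assumptions.

```python
import cmath
import math


def radon_of_gaussians(t, theta, blobs):
    """Closed-form Radon transform of a sum of isotropic Gaussians.

    Each blob (amp, cx, cy, s) stands for amp*exp(-((x-cx)^2+(y-cy)^2)/(2 s^2)).
    """
    total = 0.0
    for amp, cx, cy, s in blobs:
        t0 = cx * math.cos(theta) + cy * math.sin(theta)
        total += amp * math.sqrt(2.0 * math.pi) * s * math.exp(-((t - t0) ** 2) / (2.0 * s * s))
    return total


def fourier_mellin(proj, u, k, sigma=1.0, n_s=800, n_theta=64, s_min=-16.0, s_max=4.0):
    """Midpoint approximation of the double integral of
    proj(t, theta) * t**(sigma - 1j*u - 1) * exp(-1j*k*theta)
    over t > 0 and theta in [0, 2*pi), using t = exp(s) so that dt = t ds."""
    ds = (s_max - s_min) / n_s
    dth = 2.0 * math.pi / n_theta
    angular = [cmath.exp(-1j * k * j * dth) for j in range(n_theta)]
    acc = 0j
    for i in range(n_s):
        s = s_min + (i + 0.5) * ds
        t = math.exp(s)
        radial = cmath.exp((sigma - 1j * u) * s)  # t**(sigma - iu - 1) times Jacobian t
        ring = sum(proj(t, j * dth) * angular[j] for j in range(n_theta))
        acc += radial * ring
    return acc * ds * dth


# Hypothetical test image and transformation parameters.
BLOBS = [(1.0, 1.5, 0.5, 0.8), (0.7, -1.0, 1.2, 0.6), (0.5, 0.3, -1.4, 0.5)]
LAM, PHI, SIGMA = 1.5, 0.7, 1.0


def proj(t, theta):
    return radon_of_gaussians(t, theta, BLOBS)


def proj_sr(t, theta):
    """P_sr(t, theta) = lambda * P(t / lambda, theta + phi), as in the text."""
    return LAM * proj(t / LAM, theta + PHI)


u, k = 1.0, 2
m = fourier_mellin(proj, u, k, SIGMA)
m_sr = fourier_mellin(proj_sr, u, k, SIGMA)
predicted = LAM ** (SIGMA + 1.0 - 1j * u) * cmath.exp(1j * k * PHI) * m
print(abs(m_sr - predicted), abs(m_sr), LAM ** (SIGMA + 1.0) * abs(m))
```

Under these assumptions the rotation angle enters only through a unit-modulus phase factor, and the scale factor changes the magnitude by the fixed factor λ^(σ+1), which is the kind of dependence a ratio-based invariant could cancel.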

Noise robustness of this method

Suppose the image $f(x,y)$ is corrupted by white noise $\eta(x,y)$ with zero mean and variance $\sigma^2$:

$$\hat{f}(x,y)=f(x,y)+\eta(x,y).$$

Then

$$R(r,\theta)\{\hat{f}(x,y)\}=R(r,\theta)\{f(x,y)\}+R(r,\theta)\{\eta(x,y)\}.$$

Since the Radon transform consists of line integrals of the image, in the continuous case the Radon transform of the noise is constant for all points and directions and is equal to the mean value of the noise, which is assumed to be zero. Therefore

$$R(r,\theta)\{\hat{f}(x,y)\}=R(r,\theta)\{f(x,y)\}.$$

This means white noise with zero mean has no effect on the
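The continuous-case argument above is an idealization. For a sampled image the cancellation is only approximate: averaging n independent zero-mean noise samples along a line leaves a residual whose standard deviation is σ/√n. The sketch below illustrates this with a seeded generator; the signal profile, sample count and function name are hypothetical.

```python
import math
import random


def noisy_line_mean(signal, sigma, rng):
    """Average of (signal + zero-mean Gaussian noise) samples along one line."""
    noisy = [value + rng.gauss(0.0, sigma) for value in signal]
    return sum(noisy) / len(noisy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10000
# Arbitrary smooth intensity profile sampled along one projection line.
signal = [math.exp(-((i / n - 0.5) ** 2) / 0.02) for i in range(n)]

clean_mean = sum(signal) / n
noisy_mean = noisy_line_mean(signal, 1.0, rng)
# With sigma = 1 and n = 10000 the expected residual scale is 1/sqrt(n) = 0.01.
print(clean_mean, noisy_mean, abs(noisy_mean - clean_mean))
```

Longer lines average over more samples, so the residual noise in a projection shrinks as the image resolution grows.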

Simulation results and performance analysis

The proposed method has been implemented in Matlab. Two image sets, shown in Fig. 4, were considered in the computer implementation: image set 1 consists of eight gray-level airplane images of size 128×128, and image set 2 consists of eight gray-level butterfly images of size 128×128. The first objective of the experiments was to test the classification accuracy of this approach. The second objective was to verify the robustness of the proposed method. A comparison of the performance
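The classification step is described only as a k-nearest-neighbor classifier. A minimal sketch of such a classifier over descriptor vectors follows, written in Python for illustration rather than in the Matlab used by the authors. The Euclidean distance, k = 3, the two-dimensional feature values and the class labels are assumptions; the snippet does not specify any of them.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Return the majority label among the k training vectors closest to query.

    training is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D invariant descriptors for two made-up classes.
training = [
    ((0.10, 0.90), "class_a"),
    ((0.15, 0.80), "class_a"),
    ((0.20, 0.95), "class_a"),
    ((0.85, 0.20), "class_b"),
    ((0.90, 0.10), "class_b"),
    ((0.80, 0.25), "class_b"),
]

print(knn_classify((0.12, 0.85), training))  # class_a
```

In practice the feature vectors would be the invariant descriptors computed from each image, and k would be tuned on the training data.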

Conclusion

We have presented an approach to scaling and rotation invariant analysis for images. Unlike conventional orthogonal moment-based analysis methods, in which the image needs to be binarized and normalized, this approach extracts invariant features from the Fourier–Mellin transform of the Radon projection of the original image. Experimental results show that this approach has higher classification accuracy and better robustness to white noise than orthogonal moment-based analysis methods.

About the Author—XUAN WANG was born in 1966. He received the B.S. and M.S. degrees in Electrical Engineering from Shaanxi Normal University, Xi’an, China, in 1983 and 1987. He is currently vice professor and head of the School of Physics and Information Technology at Shaanxi Normal University. He is currently pursuing the Ph.D. degree at Xidian University. His research interests include image processing and pattern recognition.

About the Author—BIN XIAO was born in 1982. He received the B.S. degree in Electrical Engineering from Shaanxi Normal University, Xi’an, China, in 2000. He is currently pursuing the M.S. degree at Shaanxi Normal University. His research interests include image processing and pattern recognition.

About the Author—JIAN-FENG MA was born in 1961. He received the Ph.D. degree in communication and electronic systems from Xidian University. He is currently a professor and the Dean of the School of Computer at Xidian University. He is also a member of IEEE. His research interests include image processing and information and network security.

About the Author—XIU-LI BI was born in 1982. She received the B.S. degree in Electrical Engineering from Shaanxi Normal University, Xi’an, China, in 2000. She is currently pursuing the M.S. degree at Shaanxi Normal University. Her research interests include image processing and pattern recognition.

This work was supported by the National Natural Science Foundation of China under Grant No. 60573036 and by the National High-Tech Research and Development Plan of China under Grant No. 60633020.
