Abstract

This paper presents an effective local image region description method, called the CS-LMP (Center Symmetric Local Multilevel Pattern) descriptor, and its application to image matching. The CS-LMP operator involves no exponential computations, so the CS-LMP descriptor can encode the differences of local intensity values using multiple quantization levels without increasing the dimension of the descriptor. Compared with binary/ternary pattern based descriptors, the CS-LMP descriptor has better descriptive ability and higher computational efficiency. Extensive image matching experiments demonstrate the effectiveness of the proposed CS-LMP descriptor compared with other state-of-the-art descriptors.

1. Introduction

Image matching is one of the fundamental research areas in computer vision, with applications in 3D reconstruction, panoramic image stitching, image registration, robot localization, and so forth. The task of image matching is to find corresponding points in two images that are projections of the same 3D point. Generally, image matching involves three steps. First, interest points are detected and their local support regions are determined. Then, the descriptors of the interest points are constructed. Finally, the corresponding points are determined by matching their descriptors. Among these three steps, descriptor construction is the key factor influencing matching performance [1]. In this paper, we focus on an effective method for constructing local image descriptors and its application to image matching.

An ideal local image descriptor should have high discriminative power and be robust to many kinds of image transformations, such as illumination changes, geometric distortion, and partial occlusion [2, 3]. Many research efforts have been devoted to local image descriptor construction, and several comparative studies have shown that SIFT-like descriptors perform best [4]. The SIFT (Scale Invariant Feature Transform) descriptor is a 3D histogram of gradient locations and orientations, in which the contribution to each bin is weighted by the gradient magnitude and by a Gaussian window overlaid on the region [5]. It is 128-dimensional, invariant to image scale and rotation, and robust to affine distortion, changes in 3D viewpoint, addition of noise, and changes in illumination. Because of its good performance, many variants of the SIFT descriptor have been proposed. For example, the PCA-SIFT descriptor is a 36-dimensional vector obtained by applying PCA (Principal Component Analysis) to gradient maps, which enables fast image matching [6]. The SURF (Speeded Up Robust Features) descriptor uses integral images to compute the gradient histograms, which effectively speeds up computation while preserving the quality of SIFT [7]. Furthermore, GLOH (Gradient Location-Orientation Histogram) [4], Rank-SIFT [8], and RIFT (Rotation-Invariant Feature Transform) [9] have also been proposed based on the construction method of the SIFT descriptor.

The LBP (Local Binary Pattern) operator has proven to be a powerful image texture feature and has been successfully used in face recognition, image retrieval, texture segmentation, and facial expression recognition [10]. It has several advantages for local image descriptor construction, such as computational simplicity and invariance to linear illumination changes. However, the LBP operator tends to produce a rather long histogram, especially when the number of neighboring pixels increases, so it is not well suited to image matching. The CS-LBP (Center Symmetric Local Binary Pattern) descriptor addresses this dimensionality problem while retaining strong texture description ability [11]. It combines the advantages of the SIFT descriptor and the LBP operator and performs better than the SIFT descriptor in image matching. To improve the descriptive ability and the robustness to image transformations, several generalized descriptors have been proposed, such as the CS-LTP (Center Symmetric Local Ternary Pattern) descriptor [12], the IWCS-LTP (Improved Weighted Center Symmetric Local Ternary Pattern) descriptor [13], and the WOS-LTP (Weighted Orthogonal Symmetric Local Ternary Pattern) descriptor [14]. Among these, the WOS-LTP descriptor performs better than the SIFT and IWCS-LTP descriptors. It divides the neighboring pixels into several orthogonal groups to reduce the histogram dimension, and it uses an adaptive weight to adjust the contribution of each code to the histogram. However, the WOS-LTP descriptor quantizes the intensity differences into only three levels, so the intensity variation information in the local neighborhood is not fully exploited. Furthermore, LDTP (Local Directional Texture Pattern) is another kind of local texture pattern, which includes both directional and intensity information [15].
Although the LDTP descriptor is robust to noise and illumination changes, its dimension is high. To solve this problem, the CLDTP (Compact Local Directional Texture Pattern) descriptor was proposed [16]; it reduces the dimension of the LDTP descriptor while retaining its advantages.

Vector quantization is an effective method for texture description, and it is robust to noise and illumination variation [17, 18]. The number of quantization levels is an important parameter. The fewer the quantization levels, the less discriminative information the descriptor carries; however, as the number of quantization levels increases, the descriptor's robustness to noise and illumination variation degrades. For LBP or LTP based descriptors, the differences of the local intensity values are quantized into two or three levels. These descriptors are robust to illumination variation, but their discriminative ability is limited. If the number of quantization levels is increased directly within the encoding scheme of the LBP or LTP operator, the dimension of the descriptor increases dramatically. LQP (Local Quantized Pattern) was proposed to solve these problems. It uses large local neighborhoods and deeper quantization with domain-adaptive vector quantization. However, it relies on visual word quantization to separate local patterns and on a precompiled lookup table that caches the final codes for speed. Its main constraint is the size of the lookup table, which makes it unsuitable for image matching. Therefore, for descriptors with multiple quantization levels, an effective encoding method is the key to balancing discriminative ability against descriptor dimension.

In this paper, we present a novel encoding method for local image description, the CS-LMP (Center Symmetric Local Multilevel Pattern) operator, which encodes the differences of the local intensity values using multiple quantization levels. The CS-LMP descriptor is constructed from the CS-LMP operator, and it describes the local image region in more detail without increasing the dimension of the descriptor. To make the descriptor contain more spatial structural information, we use a SIFT-like grid to divide the interest region. Compared with binary/ternary pattern based descriptors, it has both better discriminative ability and higher computational efficiency. The performance of the CS-LMP descriptor is evaluated in image matching, and the experimental results demonstrate its robustness and distinctiveness.

The rest of the paper is organized as follows. Section 2 reviews the CS-LBP, CS-LTP, and WOS-LTP operators and their descriptor construction methods. Section 3 presents the CS-LMP operator, the CS-LMP histogram, and the construction method of the CS-LMP descriptor. The image matching experiments and their results are presented in Section 4. Concluding remarks are given in Section 5.

Before presenting in detail the CS-LMP operator and the CS-LMP descriptor, we give a brief review of the CS-LBP, CS-LTP, and WOS-LTP methods that form the basis for our work.

2.1. CS-LBP and CS-LTP

The CS-LBP operator is a modified version of the well-known LBP operator that compares center-symmetric pairs of pixels in the neighborhood [11]. Formally, the CS-LBP operator can be represented as

$$\mathrm{CS\text{-}LBP}_{R,N,T}(x,y)=\sum_{i=0}^{N/2-1} s\left(n_i-n_{i+N/2}\right)2^{i},\qquad s(x)=\begin{cases}1, & x>T,\\ 0, & \text{otherwise},\end{cases} \tag{1}$$

where $n_i$ and $n_{i+N/2}$ correspond to the gray values of center-symmetric pairs of pixels among $N$ equally spaced pixels on a circle of radius $R$, and $T$ is a small threshold. The CS-LBP operator therefore produces $2^{N/2}$ distinct values, resulting in a $2^{N/2}$-dimensional histogram. Note that the CS-LBP descriptor is built from binary codes computed from the intensity differences between pairs of opposite pixels in a neighborhood. For the CS-LBP descriptor, the histogram dimension is $2^{N/2}$ and the intensity differences are quantized into two levels.
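To make (1) concrete, the following Python/NumPy sketch computes CS-LBP codes for every pixel of a gray-level patch. It is a minimal illustration under stated assumptions, not the reference implementation of [11]: gray values are assumed to lie in [0, 1], the $N$ circle points are sampled bilinearly with edge padding, neighbor $i$ sits at angle $2\pi i/N$, and the default threshold 0.01 is an assumed value. The helper names `sample_circle` and `cs_lbp` are hypothetical.

```python
import numpy as np


def sample_circle(img, n_points=8, radius=1.0):
    """Gray values of n_points neighbours on a circle around every pixel.

    Returns an array of shape (n_points, H, W); neighbour i lies at angle
    2*pi*i/n_points. Bilinear interpolation is used; borders are edge-padded.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    pad = int(np.ceil(radius)) + 1
    p = np.pad(img, pad, mode="edge")
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.empty((n_points, h, w))
    for i in range(n_points):
        ang = 2.0 * np.pi * i / n_points
        y = ys + pad - radius * np.sin(ang)  # image rows grow downwards
        x = xs + pad + radius * np.cos(ang)
        y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
        fy, fx = y - y0, x - x0
        out[i] = ((1 - fy) * (1 - fx) * p[y0, x0] + (1 - fy) * fx * p[y0, x0 + 1]
                  + fy * (1 - fx) * p[y0 + 1, x0] + fy * fx * p[y0 + 1, x0 + 1])
    return out


def cs_lbp(nbrs, t=0.01):
    """CS-LBP code of eq. (1): bit i is s(n_i - n_{i+N/2}) with s(x) = [x > t]."""
    nbrs = np.asarray(nbrs, dtype=float)
    half = nbrs.shape[0] // 2
    code = np.zeros(nbrs.shape[1:], dtype=int)
    for i in range(half):
        code += (nbrs[i] - nbrs[i + half] > t).astype(int) << i
    return code
```

For example, `np.bincount(cs_lbp(sample_circle(patch, 8, 1.0)).ravel(), minlength=16)` gives the 16-bin ($2^{8/2}$) CS-LBP histogram of a patch.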

The CS-LTP operator is a powerful texture operator [12]. It uses an encoding method similar to that of the CS-LBP operator and extends the number of quantization levels of the intensity differences from two to three. The encoding method of the CS-LTP operator can be formulated as

$$\mathrm{CS\text{-}LTP}_{R,N}(x,y)=\sum_{i=0}^{N/2-1} s\left(n_i-n_{i+N/2}\right)3^{i},\qquad s(x)=\begin{cases}2, & x>T,\\ 1, & |x|\le T,\\ 0, & x<-T.\end{cases} \tag{2}$$

From (2) we can see that the dimension of the CS-LTP histogram is $3^{N/2}$. Compared with the CS-LBP descriptor, the CS-LTP descriptor better describes local texture variations, but its dimension is higher and it requires more computation.
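A matching sketch for (2) follows. It takes neighbor values from any circular sampler (shape `(N, ...)`) and returns the base-3 code; the assignment of the labels 0, 1, 2 to the three intervals and the default threshold 0.01 are assumptions, and `ternary`, `cs_ltp`, and `cs_ltp_bins` are hypothetical names. The exponential code range $3^{N/2}$ is what drives the histogram size.

```python
import numpy as np


def ternary(d, t):
    """s(x) of eq. (2): 2 if x > t, 1 if |x| <= t, 0 if x < -t."""
    d = np.asarray(d, dtype=float)
    return np.where(d > t, 2, np.where(d < -t, 0, 1))


def cs_ltp(nbrs, t=0.01):
    """CS-LTP code: sum over the N/2 centre-symmetric pairs of s(n_i - n_{i+N/2}) * 3**i."""
    nbrs = np.asarray(nbrs, dtype=float)
    half = nbrs.shape[0] // 2
    code = np.zeros(nbrs.shape[1:], dtype=int)
    for i in range(half):
        code += ternary(nbrs[i] - nbrs[i + half], t) * 3 ** i
    return code


def cs_ltp_bins(n_points):
    """Histogram length 3**(N/2), e.g. 729 for N = 12."""
    return 3 ** (n_points // 2)
```

With $N = 8$ a flat neighborhood maps to code $1 + 3 + 9 + 27 = 40$, the middle of the 81-value range.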

2.2. WOS-LTP

The WOS-LTP descriptor is constructed from the OS-LTP (Orthogonal Symmetric Local Ternary Pattern) operator [14]. The OS-LTP operator is an improved version of the LTP operator designed to reduce the histogram dimension. It takes only four orthogonally symmetric neighboring pixels into account at a time. First, the neighboring pixels are divided into orthogonal groups. Then, the OS-LTP code is computed separately for each group. Given $N$ neighboring pixels equally spaced on a circle of radius $R$ around a central pixel at $(x, y)$, the encoding method of the OS-LTP operator can be formulated as

$$\mathrm{OS\text{-}LTP}^{\,j}_{R,N}(x,y)=s\left(n_j-n_{j+N/2}\right)3^{0}+s\left(n_{j+N/4}-n_{j+3N/4}\right)3^{1},\qquad j=0,1,\dots,N/4-1, \tag{3}$$

where $s(\cdot)$ is the three-valued function of (2). From (3) we can see that there are $N/4$ different 4-orthogonal-symmetric neighbor operators, each obtained by turning the four orthogonal neighbors by one position in the clockwise direction. Existing research has shown that, compared with the LTP, CS-LTP, and ICS-LTP operators, the OS-LTP operator has better discriminative ability for describing local texture structure and achieves better robustness against noise.
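The grouping described above can be sketched as follows. This is an assumption-laden illustration, not the code of [14]: group $j$ is taken to hold neighbors $j$, $j+N/4$, $j+N/2$, and $j+3N/4$, and its code is assumed to combine the group's two center-symmetric ternary comparisons with weights 1 and 3, so each group yields a value in 0..8. Under that assumption, $N = 12$ gives 3 groups of 9 bins, which agrees with the 27-bin WOS-LTP subregion histogram quoted in Section 3.3. The function names are hypothetical.

```python
import numpy as np


def ternary(d, t):
    """Three-valued comparison: 2 if d > t, 1 if |d| <= t, 0 if d < -t."""
    d = np.asarray(d, dtype=float)
    return np.where(d > t, 2, np.where(d < -t, 0, 1))


def os_ltp(nbrs, t=0.01):
    """Return N/4 codes (each in 0..8), one per orthogonal group j = 0..N/4-1.

    ASSUMED form: s(n_j - n_{j+N/2}) + 3 * s(n_{j+N/4} - n_{j+3N/4}).
    nbrs has shape (N, ...); the result has shape (N/4, ...).
    """
    nbrs = np.asarray(nbrs, dtype=float)
    n = nbrs.shape[0]
    if n % 4:
        raise ValueError("N must be a multiple of 4")
    q = n // 4
    return np.stack([
        ternary(nbrs[j] - nbrs[j + 2 * q], t)
        + 3 * ternary(nbrs[j + q] - nbrs[j + 3 * q], t)
        for j in range(q)
    ])
```

Keeping the codes of the groups separate, instead of multiplying them together into one code, is what keeps the histogram linear in the number of groups.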

The WOS-LTP descriptor is built by concatenating the weighted histograms of the subregions; it uses the OS-LTP variance of the local region as an adaptive weight to adjust each pixel's contribution to the histogram [14]. Suppose the size of the image patch is $W \times H$; the WOS-LTP histogram can then be computed as

$$H(k)=\sum_{x=1}^{W}\sum_{y=1}^{H} w(x,y)\, f\left(\mathrm{OS\text{-}LTP}(x,y),\,k\right),\qquad k\in[0,M], \tag{4}$$

where $w(x,y)$ is the adaptive weight, $f(a,b)=1$ if $a=b$ and $0$ otherwise, and $M$ is the maximal value of the OS-LTP operator. Existing experimental results have shown that, compared with the SIFT and IWCS-LTP descriptors, the WOS-LTP descriptor not only characterizes image texture better but also achieves higher computational efficiency. However, it quantizes the intensity differences into only three levels, so the intensity variation information is not fully exploited.
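The weighted counting in (4) maps directly onto `np.bincount` with its `weights` argument. The sketch below is illustrative: the adaptive weight of [14] (the local OS-LTP variance) is not reproduced, so any non-negative weight array is accepted, and concatenating one 9-bin histogram per orthogonal group is an assumption consistent with the 27-bin figure for $N = 12$. Both function names are hypothetical.

```python
import numpy as np


def weighted_histogram(codes, weights, n_bins):
    """H(k) = sum_{x,y} w(x,y) * [code(x,y) == k] for k = 0..n_bins-1."""
    codes = np.asarray(codes, dtype=int).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    return np.bincount(codes, weights=weights, minlength=n_bins)


def wos_ltp_like_descriptor(group_codes, weights):
    """Concatenate one 9-bin weighted histogram per orthogonal group.

    group_codes: (G, H, W) OS-LTP codes in 0..8.
    weights: (H, W) non-negative per-pixel weights (the exact weight of
    [14] is not implemented here; it is supplied by the caller).
    """
    return np.concatenate([weighted_histogram(c, weights, 9) for c in group_codes])
```

With uniform weights the result reduces to an ordinary occurrence count, which makes the effect of the weight easy to isolate in experiments.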

3. Center Symmetric Local Multilevel Pattern

3.1. CS-LMP Operator

Although local binary or ternary pattern based descriptors perform well, they are limited to very coarse quantization, and enlarging the local neighborhood increases the histogram dimension exponentially. These shortcomings limit the descriptive ability of such descriptors and prevent them from exploiting all the available information. To solve these problems, we propose a novel encoding method, named the CS-LMP operator. It encodes the differences of the local intensity values according to a set of thresholds, and each pixel has $N/2$ encoding values, one for each center-symmetric pair of neighbors. The neighboring pixels are selected in the same way as for the LBP operator; the detailed selection steps can be found in [10].

First, we define the thresholds that divide the differences of the local intensity values into multiple intervals, as in (5). The CS-LMP code of the pixel at $(x, y)$ is then given by (6). From (6) we can see that the CS-LMP operator involves no exponential computations, and the differences of the local intensity values are quantized into multiple levels rather than two or three. Compared with local binary/ternary patterns, the CS-LMP operator can describe local texture more flexibly and in more detail. Figure 1 shows an example of computing the CS-LMP operator with 8 neighboring pixels, in which the CS-LMP code has 4 values. Figure 2 gives examples of the four encoding methods. As shown in Figure 2, the CS-LBP, CS-LTP, and OS-LTP codes are identical for a flat image area and for an area with varying texture, whereas the CS-LMP codes of the two areas differ distinctly. We can therefore conclude that the CS-LMP operator has better discriminative ability for describing local image texture.
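The sketch below is a hypothetical reading of the thresholding and encoding steps, not a transcription of (5) and (6). It assumes $K$ positive thresholds $0 < T_1 < \dots < T_K$ applied symmetrically, so that each center-symmetric difference falls into one of $2K + 1$ levels (three levels when $K = 1$), and that each pixel keeps its $N/2$ level indices separately instead of combining them with powers of a base. The names `multilevel_quantize` and `cs_lmp_levels` are illustrative.

```python
import numpy as np


def multilevel_quantize(d, thresholds):
    """Map each difference d to a level in 0..2K using K positive thresholds.

    ASSUMPTION: thresholds act symmetrically (+T_k and -T_k), so
    |d| <= T_1 -> level K, d > T_K -> 2K, d < -T_K -> 0.
    """
    d = np.asarray(d, dtype=float)
    ts = np.sort(np.asarray(thresholds, dtype=float))
    level = np.full(d.shape, len(ts), dtype=int)
    for t in ts:
        level += (d > t).astype(int)   # one step up per positive threshold exceeded
        level -= (d < -t).astype(int)  # one step down per negative threshold passed
    return level


def cs_lmp_levels(nbrs, thresholds):
    """Hypothetical CS-LMP code: N/2 separate levels, one per centre-symmetric pair.

    nbrs has shape (N, ...); the result has shape (N/2, ...). No powers of a
    base are used, so the code range does not grow exponentially with N.
    """
    nbrs = np.asarray(nbrs, dtype=float)
    half = nbrs.shape[0] // 2
    return multilevel_quantize(nbrs[:half] - nbrs[half:], thresholds)
```

Under this reading, two positive differences such as 0.1 and 0.3 share one level with a single threshold of 0.05 (the ternary case), but fall into different levels with thresholds (0.05, 0.15), which mirrors the kind of distinction illustrated in Figure 2.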

3.2. CS-LMP Histogram

For a local image region, the CS-LMP histogram is obtained from the CS-LMP codes of all its pixels. For the CS-LBP, CS-LTP, and WOS-LTP operators, the final code of a pixel is a single value obtained by binary or ternary computation, and the corresponding histogram is obtained by counting the occurrences of each code. Unlike these three operators, the CS-LMP code of a pixel has $N/2$ values, so the occurrences of each kind of value must be counted. The CS-LMP histogram can be represented as (7), where $k \in [0, L]$ and $L$ is the maximal value of the CS-LMP operator. Based on (7), the CS-LMP descriptor of the local image region is obtained by concatenating the resulting histograms.
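One way to count the occurrences of "each kind of value" is sketched below. This is an assumed interpretation, not the exact form of (7): levels are counted separately for every pair index and the per-pair counts are concatenated, which gives $(N/2)(2K+1)$ bins per region. With $N = 12$ and three levels this yields $6 \times 3 = 18$ bins, matching the subregion dimension quoted in Section 3.3. The function name is hypothetical.

```python
import numpy as np


def cs_lmp_histogram(levels, n_levels):
    """Count each level separately for every pair index and concatenate.

    levels: (N/2, H, W) integer levels in 0..n_levels-1.
    Returns a vector of length (N/2) * n_levels.
    """
    levels = np.asarray(levels, dtype=int)
    if levels.min() < 0 or levels.max() >= n_levels:
        raise ValueError("levels must lie in 0..n_levels-1")
    return np.concatenate([np.bincount(pair.ravel(), minlength=n_levels)
                           for pair in levels])
```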

3.3. CS-LMP Descriptor

To construct the CS-LMP descriptor, interest regions are first detected with the Hessian-Affine detector [19]. The detected regions are then normalized to circular regions of the same size, 41 × 41 pixels. As shown in Figure 3, each detected elliptical region is rotated so that the long axis of the ellipse is aligned with the positive v-axis of the local u-v image coordinate system, and the rotated elliptical region is then mapped to a canonical circular region by an affine transformation. The normalized regions are therefore invariant to scale, rotation, and affine transformation. In the rest of this paper, these normalized regions are used for local image descriptor construction.
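The rotation-plus-affine normalization above can be sketched as a single 2 × 2 map from canonical $(u, v)$ offsets to image offsets. The sketch assumes the ellipse is already given as a center, semi-axis lengths, and the angle $\varphi$ of its long axis (Hessian-Affine implementations typically output a second-moment matrix, whose conversion is omitted here), and it uses nearest-neighbor sampling for brevity where a production implementation would interpolate and smooth. `normalize_region` and its parameters are hypothetical.

```python
import numpy as np


def normalize_region(img, center, semi_major, semi_minor, phi, size=41):
    """Warp an elliptical interest region onto a size x size canonical patch.

    center: (x, y) of the ellipse in image coordinates.
    phi: angle of the ellipse's long axis w.r.t. the image x-axis (radians).
    The long axis is mapped onto the patch's v-axis (rows) and the short axis
    onto the u-axis (columns). Sampling is nearest-neighbour for brevity.
    Returns (patch, mask, a): mask marks the inscribed circle, and a is the
    2x2 matrix mapping canonical (u, v) offsets to image (x, y) offsets.
    """
    img = np.asarray(img, dtype=float)
    r = (size - 1) / 2.0
    theta = phi - np.pi / 2.0  # rotation taking (0, 1) onto (cos phi, sin phi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    a = rot @ np.diag([semi_minor / r, semi_major / r])
    v, u = np.mgrid[-r:r + 1, -r:r + 1]  # v indexes rows, u indexes columns
    offsets = a @ np.stack([u.ravel(), v.ravel()])
    x = np.clip(np.rint(center[0] + offsets[0]).astype(int), 0, img.shape[1] - 1)
    y = np.clip(np.rint(center[1] + offsets[1]).astype(int), 0, img.shape[0] - 1)
    patch = img[y, x].reshape(size, size)
    mask = (u ** 2 + v ** 2) <= r ** 2  # the canonical circular region
    return patch, mask, a
```

For a circle of radius 20 whose "long axis" already points along the image y-axis, the map is the identity and the patch is simply the 41 × 41 window around the center.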

To integrate spatial information into the descriptor, we divide the normalized region into 16 (4 × 4) subregions using the grid division of the SIFT descriptor. For each subregion, we first compute the CS-LMP code of each pixel and then obtain the CS-LMP histogram using (7). Finally, we concatenate the histograms of all subregions to obtain the final CS-LMP descriptor of the interest region, so the dimension of the CS-LMP descriptor is 16 times the dimension of a single subregion histogram. For example, consider three descriptors based on the CS-LTP, WOS-LTP, and CS-LMP methods, respectively, all using three quantization levels. Assume that the number of neighboring pixels is 12 and, for three quantization levels, the variable $K$ of the CS-LMP operator is 1; the dimensions of the CS-LTP, WOS-LTP, and CS-LMP descriptors are then 16 × 729, 16 × 27, and 16 × 18, respectively. The dimension of the CS-LMP descriptor is thus significantly reduced.
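The grid pooling and the dimension comparison above can be sketched together. Assumptions: level maps are computed once on the whole 41 × 41 patch and then split into a 4 × 4 grid with `np.array_split` (sides of 11, 10, 10, 10 pixels), each cell uses the per-pair histogram reading described for (7), and the CS-LMP dimension formula $16 \cdot (N/2)(2K+1)$ is inferred from the 16 × 18 example rather than stated in the text. Both function names are hypothetical.

```python
import numpy as np


def grid_descriptor(levels, n_levels, grid=4):
    """Concatenate per-subregion histograms over a grid x grid layout.

    levels: (P, H, W) integer level maps computed on the whole normalized
    patch (P = N/2 pair indices under the per-pair reading assumed here).
    """
    levels = np.asarray(levels, dtype=int)
    parts = []
    for rows in np.array_split(np.arange(levels.shape[1]), grid):
        for cols in np.array_split(np.arange(levels.shape[2]), grid):
            cell = levels[:, rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
            for pair in cell:  # one histogram per pair index in this cell
                parts.append(np.bincount(pair.ravel(), minlength=n_levels))
    return np.concatenate(parts).astype(float)


def descriptor_dims(n_points, k=1, cells=16):
    """Descriptor lengths for N neighbours; CS-LMP assumes 2K+1 levels per pair."""
    return {
        "CS-LTP": cells * 3 ** (n_points // 2),
        "WOS-LTP": cells * (n_points // 4) * 9,
        "CS-LMP": cells * (n_points // 2) * (2 * k + 1),
    }
```

`descriptor_dims(12)` reproduces the 16 × 729, 16 × 27, and 16 × 18 figures above, and shows why the CS-LTP length explodes while the other two grow linearly with $N$.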

Furthermore, normalization is applied to the CS-LMP descriptor to reduce the effects of illumination changes. First, the descriptor vector is normalized to unit length to remove linear illumination changes. Then, the elements of the normalized vector are truncated at 0.2 to reduce the impact of nonlinear illumination changes. Finally, the descriptor vector is renormalized to unit length and truncated at 0.2 again.
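The normalization sequence just described can be written in a few lines. The sketch follows the listed steps literally (including the final truncation); the small `eps` guarding against an all-zero vector is an added assumption, and `normalize_descriptor` is a hypothetical name.

```python
import numpy as np


def normalize_descriptor(vec, clip=0.2, eps=1e-12):
    """Unit-normalize, truncate at `clip`, renormalize, and truncate again."""
    v = np.asarray(vec, dtype=float)
    v = v / (np.linalg.norm(v) + eps)  # remove linear (gain) illumination changes
    v = np.minimum(v, clip)            # damp dominant bins (nonlinear illumination)
    v = v / (np.linalg.norm(v) + eps)  # back to unit length
    return np.minimum(v, clip)         # final truncation, as listed in the text
```

Because histogram entries are non-negative, `np.minimum` is sufficient for the truncation; no lower clipping is needed.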

4. Experimental Results

In this paper, we use the dataset of Mikolajczyk et al. [20] to evaluate the performance of the SIFT, LDTP, WOS-LTP, CLDTP, and CS-LMP descriptors through image matching experiments. This dataset includes eight types of scene images with different illumination changes and geometric distortions, and it provides ground-truth matches through estimated homography matrices. As shown in Figure 4, we randomly select one image pair from each category of the dataset. In the image matching experiments, we first use the Hessian-Affine detector to obtain the interest regions. The interest regions are then normalized to circular regions, and their gray values are scaled to lie between 0 and 1. Finally, the descriptor of each interest region is constructed, and the nearest neighbor distance ratio (NNDR) matching strategy is applied to obtain the matching points, with the Euclidean distance as the similarity measure. The parameter settings of the SIFT and WOS-LTP descriptors are the same as in the original papers [5, 14].
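A minimal NNDR matcher with Euclidean distance is sketched below. The ratio 0.8 is an assumed default (the text does not state the value used), the brute-force distance matrix is only suitable for modest numbers of descriptors, and `nndr_match` is a hypothetical name.

```python
import numpy as np


def nndr_match(desc_a, desc_b, ratio=0.8):
    """Nearest neighbour distance ratio matching with Euclidean distance.

    Returns (i, j, d1) for every descriptor i in desc_a whose nearest
    neighbour j in desc_b satisfies d1 < ratio * d2, where d2 is the
    distance to the second-nearest neighbour. ratio=0.8 is an assumption.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    if b.shape[0] < 2:
        raise ValueError("need at least two descriptors in desc_b")
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    matches = []
    for i in range(a.shape[0]):
        j1, j2 = order[i, 0], order[i, 1]
        d1, d2 = dist[i, j1], dist[i, j2]
        if d1 < ratio * d2:  # multiplication form avoids dividing by d2 = 0
            matches.append((i, int(j1), float(d1)))
    return matches
```

Ambiguous descriptors, whose two nearest neighbors are about equally close, are rejected even when the nearest distance itself is small.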

The Recall-Precision criterion is used to evaluate the matching results; it is computed from the numbers of correct and false matches between a pair of images. Two interest regions are matched if the distance between their descriptors is below a threshold $t$, and a match is correct if the overlap error is smaller than 0.5. The Recall-Precision curve is obtained by varying the distance threshold $t$. A perfect descriptor would give a recall equal to 1 for any precision.
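Given candidate matches with their descriptor distances and a correctness flag, the curve can be computed as below. The sketch assumes the usual protocol of the comparative study [4]: recall = #correct matches / #ground-truth correspondences and 1 − precision = #false matches / (#correct + #false). The overlap-error test itself, which uses the ground-truth homography, is assumed to have been done beforehand. `recall_precision_curve` is a hypothetical name.

```python
import numpy as np


def recall_precision_curve(distances, is_correct, n_correspondences, thresholds):
    """Recall and 1-precision of descriptor matches for several thresholds t.

    distances: descriptor distance of each candidate match.
    is_correct: True where the match has overlap error < 0.5 (computed elsewhere).
    n_correspondences: number of ground-truth correspondences for the image pair.
    A candidate is accepted as a match when its distance is below t.
    """
    d = np.asarray(distances, dtype=float)
    ok = np.asarray(is_correct, dtype=bool)
    recall, one_minus_precision = [], []
    for t in thresholds:
        accepted = d < t
        n_correct = int(np.sum(accepted & ok))
        n_false = int(np.sum(accepted & ~ok))
        recall.append(n_correct / n_correspondences)
        total = n_correct + n_false
        one_minus_precision.append(n_false / total if total else 0.0)
    return np.array(recall), np.array(one_minus_precision)
```

Sweeping `thresholds` from small to large traces the curve from high precision and low recall toward high recall.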

4.1. Parameter Evaluation

There are four parameters in the proposed CS-LMP descriptor: the number of neighboring pixels $N$, the radius $R$ of the neighboring pixels, the thresholds $T$, and the variable $K$. We conducted image matching experiments to investigate the effect of each parameter on the performance of the proposed descriptor, varying only one parameter in each experiment. The matching results are shown in Figure 5. For simplicity, the parameters $N$ and $R$ were evaluated jointly as $(N, R)$ pairs.

Figure 5(a) shows the results for different values of the variable $K$. From Figure 5(a) we can see that $K = 2$ and a larger $K$ give similar matching performance, and both outperform the remaining setting. Because the descriptor obtained with the larger $K$ has a much higher dimension than that obtained with $K = 2$, $K$ is fixed to 2 in the following experiments to obtain higher computational efficiency. Figures 5(b) and 5(c) show the results for different thresholds $T$. The CS-LMP descriptor performs similarly under different thresholds, and the threshold setting that gives the best performance is adopted. Figure 5(d) shows the results for different $(N, R)$ pairs. We can observe that the proposed descriptor is not sensitive to small changes in $(N, R)$; to balance computational cost and matching performance, the $(N, R)$ setting is selected accordingly. Based on the above analysis, these settings of $N$, $R$, and $T$, together with $K = 2$, are used in the following image matching experiments.

4.2. Matching Evaluation

In this section, we compare the performance of the proposed CS-LMP descriptor with that of the SIFT, LDTP, WOS-LTP, and CLDTP descriptors using the Recall-Precision criterion. The image matching results on the test images are shown in Figure 6. Figures 6(a) and 6(b) show the results for blur changes, for the structured scene and the textured scene, respectively. The SIFT descriptor obtains the lowest score in both cases. The CS-LMP descriptor performs best among all descriptors for the structured scene, and the WOS-LTP and CS-LMP descriptors perform similarly for the textured scene. Figures 6(c) and 6(d) show the performance of the descriptors under viewpoint changes, for the structured scene and the textured scene, respectively. Figures 6(e) and 6(f) evaluate the descriptors under combined image rotation and scale changes, and Figure 6(g) shows the results for illumination changes. In Figure 6(c), the SIFT descriptor performs worse, while the other four descriptors perform similarly. In Figures 6(d) to 6(g), the CS-LMP descriptor obtains the best matching score, and the CLDTP descriptor obtains the second-best score. Figure 6(h) evaluates the influence of JPEG compression. In this case, all five descriptors perform better than in the other cases, and the CS-LMP descriptor performs slightly better than the other four. Based on the above analysis, we conclude that, in these experiments, the CS-LMP descriptor performs better than the well-known SIFT descriptor and the state-of-the-art LDTP, WOS-LTP, and CLDTP descriptors.

5. Conclusions

This paper presents a novel CS-LMP descriptor and its application to image matching. The CS-LMP descriptor is constructed from the CS-LMP operator and the CS-LMP histogram, and it describes the local image region using multiple quantization levels. The constructed CS-LMP descriptor contains not only gradient orientation information but also the spatial structural information of the local image region. Furthermore, the dimension of the CS-LMP descriptor is much lower than that of binary/ternary pattern based descriptors with the same number of quantization levels. Our experimental results show that the CS-LMP descriptor performs better than the SIFT, LDTP, WOS-LTP, and CLDTP descriptors, which indicates that it is effective for local image description. In future work, we will further improve its performance and apply it to object recognition.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (Grants no. 61375010, no. 61175059, and no. 61472031) and Beijing Higher Education Young Elite Teacher Project (Grant no. YETP0375).