1 Introduction

Local polynomial regression (LPR) is a flexible and efficient nonparametric regression method in statistics. LPR has been widely applied in many research areas such as data smoothing, density estimation, and nonlinear modelling [1, 2]. Given a set of noisy and possibly non-uniformly-spaced samples of a multidimensional signal, the data samples are fitted locally by a multivariate polynomial using the least-squares (LS) criterion with a kernel function \( {K_{\mathbf{H}}}\left( {\mathbf{u}} \right) = \frac{1}{{|{\mathbf{H}}|}}K\left( {{{\mathbf{H}}^{ - 1}}{\mathbf{u}}} \right) \), where \( K\left( {\mathbf{u}} \right):{\mathbb{R}^d} \to \mathbb{R} \) is a symmetric spherical multi-dimensional basis-kernel and H is a bandwidth matrix that determines the kernel size and other properties.

LPR has been shown to possess excellent spatial adaptation and a reasonably simple implementation. Therefore, it has received considerable attention in the statistics and signal processing communities. Since signals may vary considerably, it is crucial to choose a proper local kernel to achieve the best bias-variance tradeoff in estimating the local polynomial coefficients for modelling non-stationary signals. Generally, the optimal local kernel should be selected to minimize the mean squared error (MSE) or its integrated and smoothed versions, which are functions of the local kernel. For example, at slowly varying parts of a one-dimensional signal, we would like the kernel to have a large scale so that more accurate estimates can be obtained by averaging out the additive noise as much as possible. At rapidly varying parts of the signal, however, we would like to have a small kernel scale so that excessive bias errors due to the limited order of the fitting polynomial will not occur.

For multivariate LPR, the bias-variance tradeoff problem is more complicated because the bandwidth H is a matrix in multidimensional signal processing. Not only the kernel scale but also the kernel shape has a significant effect on the estimation results. This explains why there have been relatively few publications on the bandwidth matrix selection of multivariate LPR, although adaptive bandwidth selection for univariate LPR has been studied systematically and intensively [2–10].

Conventionally, the bandwidth matrix selection for multivariate LPR is simplified by using kernels with a specific shape, which is usually symmetric and spherical. More precisely, using a symmetric and spherical basis-kernel K(u) and a bandwidth matrix H = h I, with h the scale parameter and I the identity matrix, the kernel K H (u) simplifies to \( {K_h}\left( {\mathbf{u}} \right) = \frac{1}{{{h^d}}}K\left( {{\mathbf{u}}/h} \right) \), where d is the dimension of the multivariate LPR. As a result, the selection of the scale h is much easier than the selection of a d-dimensional bandwidth matrix H. With this assumption, Ruppert et al. [10–12] studied the asymptotic bias, variance, and MSE of multivariate LPR and developed the empirical-bias bandwidths selection (EBBS) algorithm for multivariate LPR [13]. Based on Ruppert’s work, Fan and Gijbels [2] and Yang and Tschernig [14] also developed similar scale selection methods for multivariate local linear regression. These methods are innovative and useful, but they were developed mainly for linear or quadratic regression and did not consider the partial derivatives of the estimates. Also, they were all based on the symmetric kernel assumption.

Unlike the above methods, Masry [15–17] considered a more general kernel, which may not be symmetric and spherical. More precisely, the basis-kernel K(u) in the kernel function \( {K_h}\left( {\mathbf{u}} \right) = \frac{1}{{{h^d}}}K\left( {{\mathbf{u}}/h} \right) \) may be asymmetric and of various shapes. A comprehensive study of the bias and variance of multivariate LPR with higher-order polynomials and higher-order derivatives was carried out, and an analytical expression for the optimal scale h was given. However, the optimal scale selection method is generally not directly usable in practice because its analytical form involves quantities that are difficult to estimate.

Instead of computing an optimal scale in an analytical form, empirical methods choose the scale from a finite set of scale parameters in a geometric grid. Recently, Katkovnik et al. [18–20] applied the intersection of confidence intervals (ICI) method to the multivariate LPR problem for selecting the optimal local scale and studied its applications in image processing. Given a set of scale parameters in ascending order, the ICI rule determines the optimal scale by comparing the confidence intervals of the estimates obtained with the different scale parameters in the scale set. However, the ICI method (and other empirical scale selectors) can only select the local scale from the finite scale set, and therefore the exact scale value cannot be obtained if it is not included in the scale set.

In this paper, we first propose a new refined ICI adaptive scale selection method (RICI) for multidimensional signal processing using the bias and variance expressions derived from multivariate LPR analysis of Masry [15–17]. The RICI method can determine a more accurate scale parameter beyond the limit of the finite scale set. The main idea of the RICI method is inspired by the work of Stanković [21], which discussed in-depth the bias-variance tradeoff problem and the ICI adaptive scale selection algorithm.

Next, we study the application of the proposed LPR with RICI adaptive scale selection method (LPR-RICI) to smoothing noisy images. Motivated by the iterative steering kernel regression method of Takeda et al. [22], the local kernel shape is iteratively determined from the local orientation of the image data. The locally adaptive steering kernel has better adaptability to the local characteristics of images than the conventional symmetric circular kernel. Coupled with the proposed RICI scale selection, a fully automatic iterative steering-kernel-based LPR with refined ICI method (SK-LPR-RICI) for smoothing multivariate data is proposed. It yields a better MSE and perceptual quality than the conventional symmetric circular kernel. Since LPR can be naturally adapted to non-uniformly spaced data points, we also propose to employ the SK-LPR-RICI method for interpolating non-uniformly sampled images, which finds applications in image reconstruction and super-resolution.

The rest of the paper is organized as follows. In Section 2, the basic principle of multivariate LPR and its bias-variance tradeoff problem are revisited. Section 3 introduces the ICI scale selection algorithm and proposes the refined ICI adaptive scale selector. Section 4 is devoted to the applications of LPR to image processing problems such as smoothing and reconstruction, and a fully data-driven iterative steering-kernel-based LPR with refined ICI method is proposed. This algorithm is also applicable to other two-dimensional data. Experimental results and comparisons with other conventional methods using both test and real images are presented in Section 5. Finally, conclusions are drawn in Section 6.

2 Multivariate Local Polynomial Regression

In multivariate LPR, we have a set of multidimensional observations: (Y i , X i ), i = 1,2,⋯,n, where for each i, Y i is a scalar response variable and \( {{\mathbf{X}}_i} = {\left( {{X_{i,1}},{X_{i,2}}, \cdots, {X_{i,d}}} \right)^T} \) is a d-dimensional explanatory variable. We assume the homoscedastic model:

$$ {Y_i} = m\left( {{{\mathbf{X}}_i}} \right) + \sigma \left( {{{\mathbf{X}}_i}} \right){\varepsilon_i} $$
(1)

where m(X i ) is a smooth function specifying the conditional mean of Y i given X i , the ε i are independent identically distributed (i.i.d.) additive white Gaussian noise (AWGN) samples with zero mean and unit variance, and σ2(X i ) is the conditional variance of Y i given X i . The problem is to estimate m(X i ) and its partial derivatives from the noisy samples Y i .

The smooth function m(X i ) can be approximated locally, say at a given point \( {\mathbf{x}} = {\left( {{x_1},{x_2}, \cdots, {x_d}} \right)^T} \), by a multivariate polynomial of a certain degree:

$$ \begin{array}{*{20}{c}} {m({\mathbf{X}}:{\mathbf{x}}) = m\left( {\mathbf{x}} \right) + {{\left\{ {{\nabla_m}\left( {\mathbf{x}} \right)} \right\}}^T}\left( {{\mathbf{X}} - {\mathbf{x}}} \right) + \frac{1}{2}{{\left( {{\mathbf{X}} - {\mathbf{x}}} \right)}^T}\left\{ {{\mathcal{H}_m}\left( {\mathbf{x}} \right)} \right\}\left( {{\mathbf{X}} - {\mathbf{x}}} \right) + \cdots } \\{ = m\left( {\mathbf{x}} \right) + {{\left\{ {{\nabla_m}\left( {\mathbf{x}} \right)} \right\}}^T}\left( {{\mathbf{X}} - {\mathbf{x}}} \right) + \frac{1}{2}{\hbox{ve}}{{\hbox{c}}^T}\left\{ {{\mathcal{H}_m}\left( {\mathbf{x}} \right)} \right\} \cdot {\hbox{vec}}\left\{ {\left( {{\mathbf{X}} - {\mathbf{x}}} \right){{\left( {{\mathbf{X}} - {\mathbf{x}}} \right)}^T}} \right\} + \cdots } \\{ = {\beta_0} + {\mathbf{\beta }}_\nabla^T \cdot \left( {{\mathbf{X}} - {\mathbf{x}}} \right) + {\mathbf{\beta }}_\mathcal{H}^T \cdot {\hbox{vech}}\left\{ {\left( {{\mathbf{X}} - {\mathbf{x}}} \right){{\left( {{\mathbf{X}} - {\mathbf{x}}} \right)}^T}} \right\} + \cdots } \\\end{array} $$
(2)

where \( {\beta_0} = m\left( {\mathbf{x}} \right),\;{{\mathbf{\beta }}_\nabla } = {\nabla_m}\left( {\mathbf{x}} \right) = {\left[ {\frac{{\partial m\left( {\mathbf{x}} \right)}}{{\partial {x_1}}}, \cdots, \frac{{\partial m\left( {\mathbf{x}} \right)}}{{\partial {x_d}}}} \right]^T} \), \( {{\mathbf{\beta }}_\mathcal{H}} = {\hbox{vech}}\left\{ {{\mathcal{H}_m}\left( {\mathbf{x}} \right)} \right\} = \frac{1}{2}{\left[ {\frac{{{\partial^2}m\left( {\mathbf{x}} \right)}}{{\partial x_1^2}},2\frac{{{\partial^2}m\left( {\mathbf{x}} \right)}}{{\partial {x_1}\partial {x_2}}}, \cdots, \frac{{{\partial^2}m\left( {\mathbf{x}} \right)}}{{\partial x_d^2}}} \right]^T} \), \( {\mathbf{X}} = {\left( {{X_1},{X_2}, \cdots, {X_d}} \right)^T} \) is in a neighborhood of x, ∇ m is the d × 1 gradient operator, \({\mathcal{H}_{m}}\) is the d × d Hessian matrix of m(x), vec(∙) is the vectorization operator which converts a matrix into a column vector lexicographically, and vech(∙) is the half-vectorization operator which converts, for example, \( {\mathbf{A}} = \left[ {\begin{array}{*{20}{c}} {{a_1}} & {{a_2}} \\{{a_3}} & {{a_4}} \\\end{array} } \right] \) into \( {\hbox{vech}}(A) = {\left[ {{a_1},{a_3},{a_4}} \right]^T} \).

Alternatively, a degree p multivariate polynomial in (2) can be written as:

$$ m\left( {{\mathbf{X}}:{\mathbf{x}}} \right) = \sum\limits_{\mathcal{K} = 0}^p {\sum\limits_{{ }{k_1} + \ldots + {k_d} = \mathcal{K}} {{\beta_{{k_1}, \cdots, {k_d}}}\prod\limits_{j = 1}^d {{{\left( {{X_j} - {x_j}} \right)}^{{k_j}}}} } }, $$
(3)

where \( {\mathbf{\beta }} = \left\{ {{\beta_{{k_1}, \cdots, {k_d}}}:{k_1} + \cdots + {k_d} = \mathcal{K}\ {\hbox{and}}\ \mathcal{K} = 0, \cdots, p} \right\} \) is the vector of coefficients with length \( q = \prod\limits_{j = 1}^p {\left[ {\left( {d + j} \right)/j} \right]} \).

Since ε i is i.i.d. and Gaussian distributed, the maximum likelihood (ML) estimation of the coefficient vector β at location x can be obtained by solving the following weighted least-squares (WLS) regression problem:

$$ {\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};{\mathbf{H}}} \right) = \arg \mathop {{\min }}\limits_{\mathbf{\beta }} \left\{ {J\left( {{\mathbf{x}};{\mathbf{H}}} \right)} \right\}, $$
(4)

where

$$ J\left( {{\mathbf{x}};{\mathbf{H}}} \right) = \sum\limits_{i = 1}^n {{K_{\mathbf{H}}}\left( {{{\mathbf{X}}_i} - {\mathbf{x}}} \right){{\left[ {{Y_i} - \sum\limits_{\mathcal{K} = 0}^p {\sum\limits_{{ }{k_1} + \ldots + {k_d} = \mathcal{K}} {{\beta_{{k_1}, \cdots, {k_d}}}\prod\limits_{j = 1}^d {{{\left( {{X_{i,j}} - {x_j}} \right)}^{{k_j}}}} } } } \right]}^2}}, $$
(5)

is a LS cost function and \( {K_{\mathbf{H}}}\left( {{{\mathbf{X}}_i} - {\mathbf{x}}} \right) = \frac{1}{{|{\mathbf{H}}|}}K\left( {{{\mathbf{H}}^{ - 1}}\left( {{{\mathbf{X}}_i} - {\mathbf{x}}} \right)} \right) \) is a d-variate non-negative kernel function which emphasizes neighboring observations around x; the bandwidth matrix H determines the weights given to these neighboring samples in estimating β(x). For separable windows, we have

$$ {K_{\mathbf{H}}}\left( {{{\mathbf{X}}_i} - {\mathbf{x}}} \right) = \prod\limits_{j = 1}^d {{K_{{h_j}}}\left( {{X_{i,j}} - {x_j}} \right)} = \frac{1}{{{h_1} \cdots {h_d}}}\prod\limits_{j = 1}^d {K\left( {\frac{{{X_{i,j}} - {x_j}}}{{{h_j}}}} \right)}. $$
(6)

Furthermore, the bandwidths can be made identical, h 1 = ⋯ = h d  = h. This is the simplest and most popular way to reduce the bandwidth parameter from a matrix H to a scalar h.

In [15–17], the kernel considered is of the form \( {K_h}\left( {\mathbf{u}} \right) = \frac{1}{{{h^d}}}K\left( {{\mathbf{u}}/h} \right) \), where the non-negative basis-kernel function K(u) may not be circular or symmetric, and h is the scale parameter. To facilitate the development in the following sections, we assume that the basis-kernel K(u) is known a priori. Its extension to a locally orientation adaptive basis-kernel will be discussed in Section 4.

The WLS solution to (4) is:

$$ {\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right) = {\left( {{\mathbb{X}^T}\mathbb{W}\mathbb{X}} \right)^{ - 1}}{\mathbb{X}^T}\mathbb{W}\mathbb{Y}{,} $$
(7)

where \( \mathbb{X} = \left[ {\begin{array}{*{20}{c}} 1 & {{{\left( {{{\mathbf{X}}_1} - {\mathbf{x}}} \right)}^T}} & {{\hbox{vech}}\left\{ {\left( {{{\mathbf{X}}_1} - {\mathbf{x}}} \right){{\left( {{{\mathbf{X}}_1} - {\mathbf{x}}} \right)}^T}} \right\}} & \cdots \\1 & {{{\left( {{{\mathbf{X}}_2} - {\mathbf{x}}} \right)}^T}} & {{\hbox{vech}}\left\{ {\left( {{{\mathbf{X}}_2} - {\mathbf{x}}} \right){{\left( {{{\mathbf{X}}_2} - {\mathbf{x}}} \right)}^T}} \right\}} & \cdots \\\vdots & \vdots & \vdots & \vdots \\1 & {{{\left( {{{\mathbf{X}}_n} - {\mathbf{x}}} \right)}^T}} & {{\hbox{vech}}\left\{ {\left( {{{\mathbf{X}}_n} - {\mathbf{x}}} \right){{\left( {{{\mathbf{X}}_n} - {\mathbf{x}}} \right)}^T}} \right\}} & \cdots \\\end{array} } \right],\;\mathbb{Y} = {\left[ {{Y_1},{Y_2}, \cdots, {Y_n}} \right]^T} \), and \( \mathbb{W} = diag\left\{ {{K_h}\left( {{{\mathbf{X}}_1} - {\mathbf{x}}} \right), \cdots, {K_h}\left( {{{\mathbf{X}}_n} - {\mathbf{x}}} \right)} \right\} \) is the weighting matrix.

From the estimated coefficient vector \( {\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right) \), we can obtain the estimated k-th order derivative of m(x) as follows

$$ {\hat{m}^{\left( {\mathbf{k}} \right)}}\left( {\mathbf{x}} \right) = \frac{{{\partial^\mathcal{K}}m\left( {\mathbf{x}} \right)}}{{\partial x_1^{{k_1}} \cdots \partial x_d^{{k_d}}}} = \prod\limits_{j = 1}^d {\left( {{k_j}!} \right)} \cdot {\hat{\beta }_{{k_1}, \cdots, {k_d}}}\left( {{\mathbf{x}};h} \right), $$
(8)

where k = (k 1,⋯, k d ) is a vector of non-negative integers and \( \mathcal{K} = \sum\limits_{j = 1}^d {{k_j}} \). In particular, m(x) is estimated from \( {\hat{\beta }_{0, \cdots, 0}}\left( {\mathbf{x}} \right) \), that is \( \hat{m}\left( {\mathbf{x}} \right) = {\hat{m}^{\left( {0, \cdots, 0} \right)}}\left( {\mathbf{x}} \right) = {\hat{\beta }_{0, \cdots, 0}}\left( {\mathbf{x}} \right) \).
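
To make the estimator concrete, the following is a minimal numpy sketch of the WLS solution (7) for a local linear fit (p = 1) at a single point; the function name, the spherical Gaussian basis-kernel, and the scalar bandwidth are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def lpr_fit_point(X, Y, x0, h):
    """Local linear (p = 1) LPR at a single point x0 via the WLS solution (7).

    X : (n, d) sample locations, Y : (n,) noisy responses, x0 : (d,) query
    point, h : scalar bandwidth.  Returns the estimates of beta_0 (i.e. m(x0))
    and of the gradient, cf. (8).  A spherical Gaussian basis-kernel is used
    here purely for illustration.
    """
    n, d = X.shape
    U = X - x0                                               # displacements X_i - x
    w = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1)) / h**d   # K_h(X_i - x)
    Xd = np.hstack([np.ones((n, 1)), U])                     # design rows [1, (X_i - x)^T]
    W = np.diag(w)
    beta = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ Y)      # (X^T W X)^{-1} X^T W Y
    return beta[0], beta[1:]                                 # m(x0) estimate, gradient estimate
```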

As mentioned earlier, a key problem in LPR is to select an appropriate scale parameter h so as to achieve the best bias-variance tradeoff in estimating m (k)(x). To this end, we first need to examine the analytical expressions for the conditional bias, variance and MSE of the polynomial coefficient estimate \( {\mathbf{\hat{\beta }}} \). From (7), it can be shown that:

$$ Bias\left( {{\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right) = E\left( {{\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right)} \right) - {\mathbf{\beta }}\left( {{\mathbf{x}};h} \right) = {\left( {{\mathbb{X}^T}\mathbb{W}\mathbb{X}} \right)^{ - 1}}{\mathbb{X}^T}\mathbb{W}\left( {{\mathbf{m}} - \mathbb{X}{\mathbf{\beta }}\left( {{\mathbf{x}};h} \right)} \right), $$
(9)
$$ Var\left( {{\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right) = {\left( {{\mathbb{X}^T}\mathbb{W}\mathbb{X}} \right)^{ - 1}}{\mathbb{X}^T}{\mathbf{\Sigma }}\mathbb{X}{\left( {{\mathbb{X}^T}\mathbb{W}\mathbb{X}} \right)^{ - 1}}, $$
(10)
$$ MSE\left( {{\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right) = Bia{s^2}\left( {{\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right) + Var\left( {{\mathbf{\hat{\beta }}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right), $$
(11)

where \( {\mathbf{\tilde{X}}} = \left\{ {{{\mathbf{X}}_1},{{\mathbf{X}}_2}, \cdots, {{\mathbf{X}}_n}} \right\} \), \( {\mathbf{m}} = {\left[ {m\left( {{{\mathbf{X}}_1}} \right),m\left( {{{\mathbf{X}}_2}} \right), \cdots, m\left( {{{\mathbf{X}}_n}} \right)} \right]^T} \), and \( {\mathbf{\Sigma }} = diag\left\{ {K_h^2\left( {{{\mathbf{X}}_i} - {\mathbf{x}}} \right){\sigma^2}\left( {{{\mathbf{X}}_i}} \right)} \right\} \).

While the variance is not difficult to estimate from the observations and the kernel coefficients, the estimation of the bias poses a significant problem because it depends on the unknown function m(x) being estimated. Denote \( {\mathbf{k}}! = \prod\limits_{j = 1}^d {{k_j}!} \), and let f(x) be the probability density function (p.d.f.) of x. The asymptotic bias and variance of the k-th partial derivative \( {\hat{m}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};h} \right) \) are [15–17]:

$$ Bias\left\{ {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right\} = \sqrt {\mathbb{B}} {h^{p + 1 - \mathcal{K}}}, $$
(12)
$$ Var\left\{ {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right\} = \mathbb{V}{h^{ - \left( {d + 2\mathcal{K}} \right)}}, $$
(13)

where \( \mathbb{B} = {\left( {{\mathbf{k}}!{{\left( {{{\mathbf{M}}^{ - {\mathbf{1}}}}{\mathbf{B}}{{\mathbf{m}}_{p + 1}}\left( {\mathbf{x}} \right)} \right)}_\mathcal{K}}} \right)^2} \), \( \mathbb{V} = {\left( {{\mathbf{k}}!} \right)^2}{\sigma^2}\left( {\mathbf{x}} \right){\left( {{{\mathbf{M}}^{ - {\mathbf{1}}}}{\mathbf{\Gamma }}{{\mathbf{M}}^{ - {\mathbf{1}}}}} \right)_{\mathcal{K},\mathcal{K}}}/nf\left( {\mathbf{x}} \right) \), \( {\mathbf{B}} = {\left[ {{M_{0,p + 1}},{M_{1,p + 1}}, \cdots, {M_{p,p + 1}}} \right]^T} \), \( {{\mathbf{m}}_{p + 1}}\left( {\mathbf{x}} \right) = {\hbox{vech}}\left( {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {\mathbf{x}} \right)} \right) \) is the regression result obtained with a higher order p + 1, \( {\left( {{{\mathbf{M}}^{ - {\mathbf{1}}}}{\mathbf{\Gamma }}{{\mathbf{M}}^{ - {\mathbf{1}}}}} \right)_{\mathcal{K},\mathcal{K}}} \) is the (\( \mathcal{K},\mathcal{K} \)) diagonal element of the constant matrix \( {{\mathbf{M}}^{ - {\mathbf{1}}}}{\mathbf{\Gamma }}{{\mathbf{M}}^{ - {\mathbf{1}}}} \), \( {\left( {{{\mathbf{M}}^{ - {\mathbf{1}}}}{\mathbf{B}}{{\mathbf{m}}_{p + 1}}\left( {\mathbf{x}} \right)} \right)_\mathcal{K}} \) is the \( \mathcal{K} \)-th element of the vector \( {{\mathbf{M}}^{ - {\mathbf{1}}}}{\mathbf{B}}{{\mathbf{m}}_{p + 1}}\left( {\mathbf{x}} \right) \), and the (i, j)-th elements of the matrices M and Γ are \( {M_{i,j}} = \int {{{\mathbf{u}}^{\left( {i + j} \right)}}K\left( {\mathbf{u}} \right)d{\mathbf{u}}} \) and \( {\Gamma_{i,j}} = \int {{{\mathbf{u}}^{\left( {i + j} \right)}}{K^2}\left( {\mathbf{u}} \right)d{\mathbf{u}}} \) respectively. The details can be found in [15–17]. Similar asymptotic analytical results on the bias and variance of multivariate LPR were also reported by Ruppert and Wand [10], and Fan and Gijbels [2].

It can be seen from (12) and (13) that the bias term increases with the scale parameter h, since the underlying data around x cannot be modeled well by a local polynomial of the given order over a large window. On the other hand, a larger scale parameter reduces the variance of the estimator. Hence, there exists a locally optimal scale h opt(x) for estimating the k-th partial derivative m (k)(x), and h opt(x) should minimize an appropriate measure such as the MSE:

$$ MSE\left\{ {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right\} = Bia{s^2}\left\{ {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right\} + Var\left\{ {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right\} = \mathbb{B}{h^{2(p + 1 - \mathcal{K})}} + \mathbb{V}{h^{ - \left( {d + 2\mathcal{K}} \right)}}. $$
(14)

By minimizing (14), an analytical formula for h opt(x) can be determined as follows:

$$ {h^{opt}}\left( {\mathbf{x}} \right) = {\left( {\frac{{\left( {2\mathcal{K} + d} \right)\mathbb{V}}}{{2\left( {p - \mathcal{K} + 1} \right)\mathbb{B}}}} \right)^{\frac{1}{{d + 2p + 2}}}}. $$
(15)

However, some of the quantities in (15) are difficult to calculate, which makes the optimal scale difficult to estimate accurately. Instead of computing an optimal scale in an analytical form, empirical methods choose the local scale from a finite set of scale parameters in a geometric grid. In the next section, an empirical scale selection method, the intersection of confidence intervals rule, is introduced.
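
When estimates of \( \mathbb{B} \) and \( \mathbb{V} \) are available, (15) is a one-line computation; the following sketch (hypothetical function name, illustrative inputs) simply evaluates the formula.

```python
def optimal_scale(B, V, p, K, d):
    """Analytical optimal scale (15), assuming the squared-bias factor B and
    the variance factor V of (12)-(13) have been estimated."""
    return ((2 * K + d) * V / (2 * (p - K + 1) * B)) ** (1.0 / (d + 2 * p + 2))
```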

3 Refined ICI Scale Selection Algorithm

The intersection of confidence intervals (ICI) rule is an empirical method for selecting the adaptive scale from a set of scales in ascending order. The ICI rule was proposed by Goldenshluger and Nemirovski in [23], and Katkovnik et al. [18–20] applied it to univariate and bivariate LPR for selecting the locally adaptive bandwidth or scale. The ICI method can be regarded as a special implementation of Lepski's approach [24], which compares a series of estimates obtained with different scales from the scale set; the optimal scale is the largest one whose estimate does not differ significantly from the estimates obtained with smaller scales.

Starting with a finite set of scale parameters in a geometric grid,

$$ {\mathbf{\tilde{H}}} = \left\{ {{h_j}{\mathbf{I}}|{h_j} = {h_0}{a^j},{ }j = 1, \cdots, J} \right\}, $$
(16)

where a > 1 is a step factor, h 0 > 0 is the base scale, and J is the size of the scale set. The minimum scale in the scale set should be small enough so that sharp changes can be preserved, while the maximum scale should be large enough to remove the additive noise adequately. The maximum scale is usually chosen as n/log10(n) to avoid high complexity and possible over-smoothing [2].

3.1 ICI Rule

In the ICI rule, the optimal scale is determined by comparing the confidence intervals of the estimates obtained with different scales from \( {\mathbf{\tilde{H}}} \). The ICI rule requires only the variance, which can be estimated easily from (10); it does not involve computing the bias or the MSE.

To see how the algorithm works, we first note that, for each x, the estimate \( \hat{m}\left( {{\mathbf{x}};h} \right) \) is a random variable around the true value m(x) with bias and variance as given in (12) and (13). For simplicity, in this section we shall drop the indices x and k to simplify notation. It has been shown in [2, 20, 21, 23] that

$$ \left\{ {\hat{m}(h) - \left[ {m + Bias\left( {\hat{m}(h)} \right)} \right]} \right\}/\sqrt {{Var\left( {\hat{m}(h)} \right)}} \sim \mathcal{N}(0,1) $$
(17)

is Gaussian distributed with zero mean and unit variance. In other words, \( \hat{m}(h) \) is Gaussian distributed with mean \( Mean\left( {\hat{m}(h)} \right) = m + Bias\left( {\hat{m}(h)} \right) \) and standard deviation \( Std\left( {\hat{m}(h)} \right) = \sqrt {{Var\left( {\hat{m}(h)} \right)}} \).

The confidence intervals of the estimate \( \hat{m}(h) \) can also be constructed from the bias and variance as

$$ \left[ {Mean\left( {\hat{m}(h)} \right) - \kappa \cdot Std\left( {\hat{m}(h)} \right),Mean\left( {\hat{m}(h)} \right) + \kappa \cdot Std\left( {\hat{m}(h)} \right)} \right]. $$
(18)

As a result, the probability that \( \hat{m}(h) \) lies in the confidence interval \( |\hat{m}(h) - Mean\left( {\hat{m}(h)} \right)| \leqslant \kappa \cdot Std\left( {\hat{m}(h)} \right) \), Pr(κ), can be computed readily. Usually, the parameter κ is selected so that Pr(κ) will be close to one and a reasonable estimate of the possible range of values of \( \hat{m}(h) \) can be achieved. For example, typical values are Pr(1.65) = 0.90, Pr(1.96) = 0.95, and Pr(2.58) = 0.99. A larger κ leads to wider confidence intervals and thus a larger ICI-selected bandwidth, while a smaller κ yields a smaller ICI-selected bandwidth. Instead of choosing a fixed and appropriate value for κ, it is also possible to optimize this value by assigning a local threshold κ to each location. In [18], Katkovnik used the generalized cross-validation (GCV) criterion to determine the value of κ for each x. It is also an empirical method, where the desired value of κ is chosen from a finite set of threshold parameters \( {\mathbf{\Gamma }} = \left\{ {{\kappa_1},{\kappa_2}, \cdots, {\kappa_K}} \right\} \). Each threshold parameter \( \kappa \in {\mathbf{\Gamma }} \) is used in the ICI rule to calculate a series of adaptive estimates. The optimal κ is chosen as the one which minimizes the GCV criterion:

$$ {\kappa^{opt}}\left( {\mathbf{x}} \right) = \arg \mathop {{\min }}\limits_\kappa \sum\limits_i {{{\left( {\frac{{{Y_i} - \hat{m}\left( {{{\mathbf{X}}_i};{h_{{j^{+} }|\kappa }}\left( {\mathbf{x}} \right)} \right)}}{{1 - {K_{{h_{{j^{+} }|\kappa }}\left( {\mathbf{x}} \right)}}\left( {{{\mathbf{X}}_i} - {\mathbf{x}}} \right)}}} \right)}^2}}, $$
(19)

where \( {h_{{j^{+} }|\kappa }}\left( {\mathbf{x}} \right) \) is the ICI-selected scale with the threshold parameter κ. This method may lead to a better result at the expense of significantly higher computational complexity, because the ICI selection and the resulting adaptive estimates have to be computed for each \( \kappa \in {\mathbf{\Gamma }} \) and each \( {h_j} \in {\mathbf{\tilde{H}}} \), not to mention the GCV computation and comparison.

The mean value of \( \hat{m}(h) \) is difficult to estimate since it involves the true value m. The standard deviation of \( \hat{m}(h) \), \( Std\left( {\hat{m}(h)} \right) \), can be computed with the help of (10). The quantity σ 2(x) in (10) is estimated under the homoscedasticity assumption of the additive noise. The robust noise variance estimator given in [18–20] can be used:

$$ \hat{\sigma } = \left\{ {median\left( {\left| {{Y_i} - {Y_{i - 1}}} \right|} \right)} \right\}/\left( {\sqrt {2} \cdot 0.6745} \right). $$
(20)

where median(∙) denotes the sample median operation.
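
A direct sketch of the median-based estimator (20) is given below; passing the image row by row (or any 1-D ordering of the samples) is an illustrative choice.

```python
import numpy as np

def robust_noise_std(Y):
    """Robust noise standard deviation estimate of (20): the median of the
    absolute first differences, rescaled by sqrt(2) * 0.6745 so that it is
    consistent for Gaussian noise."""
    Y = np.asarray(Y, dtype=float).ravel()
    return np.median(np.abs(np.diff(Y))) / (np.sqrt(2.0) * 0.6745)
```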

Next, consider a series of confidence intervals \( {D_j} = \left[ {{L_j},{U_j}} \right] \) from the estimated \( \hat{m}\left( {{h_j}} \right) \) with different values of scale h j from the scale set \( {\mathbf{\tilde{H}}} \):

$$ {U_j} = \hat{m}\left( {{h_j}} \right) + \left( {\kappa + \Delta \kappa } \right) \cdot Std\left( {\hat{m}\left( {{h_j}} \right)} \right), $$
(21)
$$ {L_j} = \hat{m}\left( {{h_j}} \right) - \left( {\kappa + \Delta \kappa } \right) \cdot Std\left( {\hat{m}\left( {{h_j}} \right)} \right), $$
(22)

where ∆κ is a threshold parameter used to adjust the width of the confidence interval. We note that the confidence intervals (21) and (22) are biased because the estimated value \( \hat{m}\left( {{h_j}} \right) \), rather than the true mean of the estimator, is used as the interval center. The constant ∆κ is introduced to reduce the influence of this estimation bias, and its effect and selection will be discussed later in the refined ICI method.

For a small value of h, the bias is expected to be small, so as h increases the confidence interval gradually shrinks while the center of the interval remains more or less fixed. When h is increased to a point where the observations can no longer be modeled satisfactorily, a large bias results and the center of the interval changes significantly with respect to the others, while its length remains small. As a result, the confidence interval will no longer intersect those with smaller values of h.

Motivated by this observation, the ICI scale selection method computes and examines the following quantities from the confidence intervals in order to detect this sudden change:

$$ \begin{array}{*{20}{c}} {{{\overline L }_j} = \max \left[ {{{\overline L }_{j - 1}},{L_j}} \right],{\hbox{for}}\;j = 1,2, \cdots, J} \hfill \\{{{\underline U }_j} = \min \left[ {{{\underline U }_{j - 1}},{U_j}} \right],\;{\hbox{for}}\;j = 1,2, \cdots, J} \hfill \\{{{\overline L }_0} = {{\underline U }_0} = 0.} \hfill \\\end{array} $$
(23)

It can be seen that \( {\overline L_j} \) is the largest lower bound of the confidence intervals evaluated up to scale h j , while \( {\underline U_j} \) is the corresponding smallest upper bound. The largest value of j for which \( {\underline U_j} \geqslant {\overline L_j} \), denoted by j +, gives the desired scale h j+, because above this scale the confidence intervals no longer intersect with each other. In the ICI rule, it has been shown in [21] that if the two confidence intervals D i and D j intersect, then:

$$ |\hat{m}\left( {{h_i}} \right) - \hat{m}\left( {{h_j}} \right)| \leqslant \left( {\kappa + \Delta \kappa } \right) \cdot \left[ {Std\left( {\hat{m}\left( {{h_i}} \right)} \right) + Std\left( {\hat{m}\left( {{h_j}} \right)} \right)} \right]. $$
(24)
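
The running bounds in (23) and the search for j + can be implemented in a few lines; the sketch below (illustrative function name, zero initialization of (23) omitted) assumes the estimates and their standard deviations have already been computed for every scale in the set.

```python
import numpy as np

def ici_select(m_hat, std_hat, kappa, delta_kappa=0.0):
    """ICI rule of (21)-(24): m_hat[j] and std_hat[j] are the estimate and its
    standard deviation for the j-th (ascending) scale; returns the index j+
    of the largest scale whose confidence interval still intersects all the
    previous ones."""
    thr = kappa + delta_kappa
    U = np.asarray(m_hat) + thr * np.asarray(std_hat)     # upper bounds (21)
    L = np.asarray(m_hat) - thr * np.asarray(std_hat)     # lower bounds (22)
    L_bar = np.maximum.accumulate(L)                      # largest lower bound so far
    U_bar = np.minimum.accumulate(U)                      # smallest upper bound so far
    return int(np.nonzero(U_bar >= L_bar)[0].max())       # last index where they still intersect
```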

3.2 Refined ICI Method

We now consider the selection of Δκ and give a refined version of the conventional ICI rule. From (21), (22), and (24), it can be seen that the threshold Δκ can significantly affect the performance of the ICI algorithm. When Δκ is large, the confidence interval D j becomes wide, and it may lead to a large value of h j+. This will result in over-smoothing of the observations. On the contrary, when Δκ is small, the interval D j will become narrow, and it may result in a very small value of h j+ so that the additive noise cannot be removed effectively.

The selection and adjustment of Δκ have been discussed in [20] and [23]. By substituting the optimal scale (15) into the asymptotic bias and variance expressions in (12) and (13), we have:

$$ \frac{{Bia{s^2}\left\{ {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};{h^{opt}}} \right)|{\mathbf{\tilde{X}}}} \right\}}}{{Var\left\{ {{{\hat{m}}^{\left( {\mathbf{k}} \right)}}\left( {{\mathbf{x}};{h^{opt}}} \right)|{\mathbf{\tilde{X}}}} \right\}}} = \frac{{\left( {2\mathcal{K} + d} \right)}}{{2\left( {p - \mathcal{K} + 1} \right)}}. $$
(25)

To simplify notations, let

$$ {b_m}(h) = Bias\left\{ {\hat{m}(h)|{\mathbf{\tilde{X}}}} \right\}, $$
(26)
$$ \sigma_m^2(h) = Var\left\{ {\hat{m}(h)|{\mathbf{\tilde{X}}}} \right\}, $$
(27)
$$ \beta = 2\left( {p + 1 - \mathcal{K}} \right),\,{\hbox{and}}\;\nu = 2\mathcal{K} + d. $$
(28)

Then, one gets \( b_m^2(h) = \mathbb{B}{h^\beta } \), \( \sigma_m^2(h) = \mathbb{V}{h^{ - \nu }} \), and \( MSE(h) = \mathbb{B}{h^\beta } + \mathbb{V}{h^{ - \nu }} \). The optimal scale can then be written as

$$ {h^{opt}} = {\left( {\frac{\nu }{\beta } \cdot \frac{\mathbb{V}}{\mathbb{B}}} \right)^{\frac{1}{{\beta + \nu }}}}. $$
(29)

In [23], ∆κ is chosen as κ because the bias and standard deviation are assumed to be equal. In [20], Katkovnik showed that \( \Delta \kappa = \left( {\beta /\nu } \right)\kappa \) was an appropriate choice because it can minimize the upper bound of the estimation error. That is, ∆κ should be selected as the ratio of the squared bias and the variance at the optimal scale h opt.

It can be seen that the above method can only select the optimal scale h opt = h j+ from the finite scale set \( {\mathbf{\tilde{H}}} \). Generally, the exact optimal scale is not included in the scale set \( {\mathbf{\tilde{H}}} \). In [21], the computation of a more exact optimal scale for the general ICI scale selection problem with given bias and variance formulas was considered. The importance of the result is that explicit knowledge of \( \mathbb{B} \) and \( \mathbb{V} \) in (12) and (13) is not required. Here, we shall employ this method to further refine the ICI adaptive scale for the LPR problem using the bias and variance formulas in (12) and (13).

By assuming that h opt I is close to the conventional ICI-based optimal bandwidth \( {h_{{j^{+} }}}{\mathbf{I}} \in {\mathbf{\tilde{H}}} \), we can write \( {h_{{j^{+} }}} = {a^\eta }{h^{opt}} \), where a is the step factor and η is a variable to be determined for refining the optimal scale. The entries in the scale set \( {\mathbf{\tilde{H}}} \) of (16) can thus be rewritten as

$$ {\mathbf{\tilde{H}}} = \left\{ {{h_j}{\mathbf{I}}|{h_j} = {h_{{j^{+} }}}{a^{\left( {j - {j^{+} }} \right)}} = {h^{opt}}{a^{\left( {j - {j^{+} } + \eta } \right)}},{ }j = 1, \cdots, J} \right\}. $$
(30)

Substituting (30) into (12) and (13), the bias and standard deviation can be written as:

$$ {b_m}\left( {{h_j}} \right) = \sqrt {{\mathbb{B}{h_j}^\beta }} = \sqrt {{\nu /\beta }} {\sigma_m}\left( {{h^{opt}}} \right){a^{\frac{{\beta \left( {j - {j^{+} } + \eta } \right)}}{2}}}, $$
(31)
$$ {\sigma_m}\left( {{h_j}} \right) = \sqrt {{\mathbb{V}/{h_j}^\nu }} = {\sigma_m}\left( {{h^{opt}}} \right){a^{\frac{{ - \nu \left( {j - {j^{+} } + \eta } \right)}}{2}}}. $$
(32)

In the ICI rule, the optimal h j+ is the largest scale at which the confidence intervals \( {D_{{j^{+} } - 1}} \) and \( {D_{{j^{+} }}} \) intersect, whereas \( {D_{{j^{+} }}} \) and \( {D_{{j^{+} } + 1}} \) do not. Given an appropriate value of κ such that Pr(κ) ≈ 1, it follows from the distribution in (17) that, with high confidence, \( \hat{m}\left( {{h_j}} \right) \) lies in the interval:

$$ m + {b_m}\left( {{h_j}} \right) - \kappa {\sigma_m}\left( {{h_j}} \right) \leqslant \hat{m}\left( {{h_j}} \right) \leqslant m + {b_m}\left( {{h_j}} \right) + \kappa {\sigma_m}\left( {{h_j}} \right). $$
(33)

Making use of (33) and substituting (16) into (24), one gets, after slight manipulation, the condition for \( {D_{{j^{+} } - 1}} \) and \( {D_{{j^{+} }}} \) to intersect as:

$$ {b_m}\left( {{h_{{j^{+} } - 1}}} \right) + \Delta \kappa {\sigma_m}\left( {{h_{{j^{+} } - 1}}} \right) \geqslant {b_m}\left( {{h_{{j^{+} }}}} \right) + \Delta \kappa {\sigma_m}\left( {{h_{{j^{+} }}}} \right), $$
(34)

and the condition for \( {D_{{j^{+} }}} \) and \( {D_{{j^{+} } + 1}} \) not to intersect as:

$$ {b_m}\left( {{h_{{j^{+} }}}} \right) + \left( {2\kappa + \Delta \kappa } \right){\sigma_m}\left( {{h_{{j^{+} }}}} \right) < {b_m}\left( {{h_{{j^{+} } + 1}}} \right) - \left( {2\kappa + \Delta \kappa } \right){\sigma_m}\left( {{h_{{j^{+} } + 1}}} \right). $$
(35)

Substituting the estimated bias (31) and variance (32) into the inequalities (34) and (35), the threshold parameter ∆κ and the refining parameter η in the expressions can be determined as follows:

$$ \Delta \kappa = \frac{{2\kappa }}{{{a^{\left( {\beta + \nu } \right)/2}} - 1}}, $$
(36)
$$ \eta = \frac{2}{{\beta + \nu }}{\log_a}\left( {\Delta \kappa \sqrt {{\frac{\beta }{\nu }}} \frac{{1 + {a^{\nu /2}}}}{{1 - {a^{ - \beta /2}}}}} \right). $$
(37)

Thus, given an appropriate value of the threshold parameter κ, say 1.96, in the LPR problem, ∆κ and η can be obtained from (36) and (37). Hence, a refined adaptive scale is \( {h^{opt}} = {h_{{j^{+} }}}{a^{ - \eta }} \).

Since the scale set is constructed in a geometric order, we still need to add a logarithmic shift ∆η to η to compensate for the difference between the arithmetic mean and the geometric mean. This shift can be approximated as [21]:

$$ \Delta \eta = \frac{{2\left[ {{{\log }_a}\left( {\left( {1 + {a^{\left( {\beta + \nu } \right)/2}}} \right)/2} \right)} \right]}}{{\beta + \nu }} - \frac{1}{2}. $$
(38)

Finally, the desired optimal scale is determined as:

$$ {h^{opt}} = {h_{{j^{+} }}}{a^{ - \left( {\eta + \Delta \eta } \right)}}. $$
(39)
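
A compact sketch of the refinement step (36)–(39) is given below; the function name and argument list are illustrative, and the final comment simply reproduces the parameter values used later in Section 5.

```python
import numpy as np

def rici_refine(h_j_plus, a, kappa, p, d, K=0):
    """Refine the ICI-selected scale h_{j+} using (36)-(39): a is the step
    factor of the geometric scale grid, kappa the ICI threshold, p the
    polynomial order, d the dimension and K the total derivative order."""
    beta = 2.0 * (p + 1 - K)                                  # bias exponent, (28)
    nu = 2.0 * K + d                                          # variance exponent, (28)
    log_a = lambda x: np.log(x) / np.log(a)                   # logarithm to base a
    d_kappa = 2.0 * kappa / (a ** ((beta + nu) / 2.0) - 1.0)  # (36)
    eta = (2.0 / (beta + nu)) * log_a(
        d_kappa * np.sqrt(beta / nu) * (1.0 + a ** (nu / 2.0)) / (1.0 - a ** (-beta / 2.0)))  # (37)
    d_eta = 2.0 * log_a((1.0 + a ** ((beta + nu) / 2.0)) / 2.0) / (beta + nu) - 0.5           # (38)
    return h_j_plus * a ** (-(eta + d_eta))                   # refined scale, (39)

# With a = 2, kappa = 1.96, p = 1, d = 2 (the setting of Section 5) this yields
# d_kappa ~ 0.56, eta ~ 0.55, d_eta ~ 0.22, i.e. h_opt ~ 0.59 * h_{j+}.
```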

We now summarize the refined ICI (RICI) adaptive scale selection method as follows:

  1) Set up a finite scale set \( {\mathbf{\tilde{H}}} \) as in (16);

  2) Compute \( \hat{m}\left( {{h_j}} \right) \) for each \( {h_j}{\mathbf{I}} \in {\mathbf{\tilde{H}}} \);

  3) Select the threshold κ and compute ∆κ from (36);

  4) Find \( {h_{{j^{+} }}} \) from the set \( {\mathbf{\tilde{H}}} \) with the ICI rule;

  5) Compute the refinement parameters η and ∆η from (37) and (38);

  6) Calculate the refined optimal scale \( {h^{opt}} = {h_{{j^{+} }}}{a^{ - \left( {\eta + \Delta \eta } \right)}} \).

Note that in step 3), the threshold parameter κ can be chosen as some reasonable value such as 1.96. If κ is selected by (19), the result may be better but the complexity is higher.

4 Data-Driven LPR for Image Processing

Recently, bivariate LPR has received considerable attention in the image processing field. Katkovnik et al. [18–20] successfully applied LPR with ICI bandwidth selection to de-noising, de-blurring, and edge detection of images. Takeda et al. developed a steering kernel regression method for image de-noising, interpolation, and fusion [22]. In this section, we shall extend the adaptive scale selection method developed in Section 3 to kernels that can adapt to the local orientation of an image. This steering-kernel-based LPR-RICI (SK-LPR-RICI) will then be applied to smoothing and interpolation of noisy images.

An N 1 × N 2 gray-scale image can be viewed as the observations of a 2-dimensional (2D) function where the samples are taken along the 2D explanatory variable \( {\mathbf{x}} = \left( {{x_1},{x_2}} \right) \) at \( {x_1} = 1,2, \cdots, {N_1} \) and \( {x_2} = 1,2, \cdots, {N_2} \) with \( Y\left( {{x_1},{x_2}} \right) \) the intensity of the image pixel. In principle, the locations of the samples can be non-uniformly spaced. This makes LPR very useful in general interpolation and smoothing problems. The smoothed result is obtained from the estimated \( \hat{m}\left( {{x_1},{x_2}} \right) = {\beta_{0,0}} \) by applying LPR with a suitably chosen 2D kernel \( {K_{\mathbf{H}}}\left( \cdot \right) \) to the original image. The quantities \( {\beta_{1,0}} = \partial m\left( {{x_1},{x_2}} \right)/\partial {x_1} \) and \( {\beta_{0,1}} = \partial m\left( {{x_1},{x_2}} \right)/\partial {x_2} \) represent respectively the first-order partial derivatives of an image along the x 1 and x 2 coordinates. When x is evaluated at a series of 2D grid points, we obtain a smoothed surface representation of the data from the noisy image Y. Therefore, the smoothing and interpolation of digital images can be performed through LPR.

4.1 Kernel Selection

The Gaussian-type kernel is employed in our analysis and image processing experiments because it can be transformed to produce a steering kernel. Given a bandwidth matrix H, the Gaussian kernel is calculated as:

$$ {{\mathbf{K}}_{\mathbf{H}}}\left( {\mathbf{u}} \right) = \frac{1}{{2\pi |{\mathbf{H}}|}}\exp \left\{ { - {{\mathbf{u}}^T}{{\left( {{{\mathbf{H}}^{ - 1}}} \right)}^T}{{\mathbf{H}}^{ - 1}}{\mathbf{u}}/2} \right\} $$
(40)

Since the Gaussian kernel is not of compact support, it should be truncated to a sufficient kernel size N K  × N K to reduce the arithmetic complexity, as will be discussed later in this section.
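
As an illustration, the following sketch evaluates (40) on a truncated N K  × N K grid around the origin; the function name and the odd grid size are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel_grid(H, N_K):
    """Sample the Gaussian kernel (40) with bandwidth matrix H on an
    N_K x N_K grid of integer displacements centred at the origin."""
    r = (N_K - 1) // 2
    x1, x2 = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="ij")
    U = np.stack([x1.ravel(), x2.ravel()], axis=0)        # 2 x N_K^2 displacements u
    Hinv = np.linalg.inv(H)
    quad = np.sum((Hinv @ U) ** 2, axis=0)                # u^T (H^{-1})^T H^{-1} u
    K = np.exp(-0.5 * quad) / (2.0 * np.pi * abs(np.linalg.det(H)))
    return K.reshape(N_K, N_K)
```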

When the bandwidth matrix is chosen as H = h I, a symmetric Gaussian kernel results, which is used in most LPR methods for image processing because of its simplicity. For the symmetric basis-kernel, the locally adaptive scale parameter h at a point x controls the amount of smoothing performed at that point. When h is small, image details such as edges and textures are preserved, but possible additive noise may not be removed effectively. On the contrary, a large-scale kernel has better de-noising properties at the expense of possibly blurring the image details. Therefore, a locally adaptive scale selection method is very crucial to LPR in image processing. The ICI rule with a symmetric circular kernel has been shown to be an effective tool for adaptive scale selection [18–20].

Recently, Takeda et al. [22] proposed a steering kernel, which is obtained by elongating and rotating a symmetric Gaussian kernel so that it can adapt to the dominant local orientation. In [22], the orientation and scale of the steering kernel were determined using principal component analysis (PCA). The resultant iterative steering kernel regression (ISKR) method was shown to have a better performance than the conventional symmetric kernel, especially along image edges. In the ISKR method, the local scaling parameter was obtained as \( {h_i} = {h_0}{\gamma_i} \), where h 0 and γ i are respectively the global smoothing parameter and the local scaling parameter. The local scaling factor γ i was estimated using the local energy distribution of the image, while h 0 was chosen to minimize the root mean squared error (RMSE) value for known test images or to yield the most visually appealing results for real images. Consequently, the scale selection process is not fully automatic, and the criterion for selecting the global smoothing parameter for real images is somewhat subjective.

To address this problem, we propose a fully data-driven adaptive scale selection method using the RICI rule proposed in Section 3. Specifically, we extend the RICI rule from the symmetric kernel to the steering kernel and obtain a new steering-kernel-based LPR with refined ICI bandwidth selection (SK-LPR-RICI). To this end, we propose to construct the local steering kernel \( {K_h}\left( {\mathbf{u}} \right) = \frac{1}{{{h^2}}}K\left( {{\mathbf{u}}/h} \right) \) at each sample x of an image from a steering basis-kernel K(u) and a scale parameter h. The shape of the local basis-kernel K(u) is determined first from the local dominant orientation around x, which can be estimated by the PCA method in [22]. The details of the PCA method are omitted to save space; the reader is referred to [22]. Then, the RICI rule is employed to choose the optimal scale parameter h for the steering basis-kernel K(u).
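
The exact PCA construction of [22] is not reproduced here; the sketch below is only a simplified stand-in that captures the idea of shaping the kernel from the local gradient covariance, with an illustrative regularization and a unit-determinant normalization that are our own assumptions.

```python
import numpy as np

def steering_matrix(gx, gy, reg=1.0):
    """Build a 2 x 2 steering bandwidth matrix from the gradient samples
    (gx, gy) collected in a local window: the eigenvectors of the gradient
    covariance give the dominant orientation, and the kernel is elongated
    along the edge (the weak-gradient direction)."""
    G = np.stack([np.ravel(gx), np.ravel(gy)], axis=1)
    C = G.T @ G / G.shape[0] + reg * np.eye(2)      # regularized gradient covariance
    w, V = np.linalg.eigh(C)                        # principal axes of the local structure
    H_S = V @ np.diag(w ** -0.5) @ V.T              # shrink across edges, stretch along them
    return H_S / np.sqrt(np.linalg.det(H_S))        # normalize to unit determinant
```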

4.2 Steering-Kernel-Based LPR with RICI Method

Once the locally adaptive steering basis-kernel is determined, the RICI adaptive scale selection method can be used with slight modification to determine the locally adaptive scale parameter h. More precisely, for each scale parameter h j from the scale set \( {\mathbf{\tilde{H}}} \), we use the steering kernel \( {K_{{h_j}}}\left( {{\mathbf{X}} - {\mathbf{x}};{{\mathbf{H}}_S}} \right) = \frac{1}{{h_j^2}}K\left( {\left( {{\mathbf{X}} - {\mathbf{x}}} \right)/{h_j};{{\mathbf{H}}_S}} \right) \) with bandwidth matrix h j H S to fit a p-th order polynomial at x and obtain a series of smoothed estimates \( \hat{m}\left( {{\mathbf{x}};{h_j}} \right) \). The traditional ICI rule is then invoked to select a preliminary scale parameter \( {h_{{j^{+} }}} \). For d = 2 and k = (0,0), the bias and variance of the estimator are

$$ Bias\left\{ {\hat{m}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right\} = \sqrt {\mathbb{B}} {h^{p + 1}}, $$
(41)
$$ Var\left\{ {\hat{m}\left( {{\mathbf{x}};h} \right)|{\mathbf{\tilde{X}}}} \right\} = \mathbb{V}{h^{ - 2}}. $$
(42)

The refined scale h opt can then be determined by the RICI method. A LPR using this refined scale parameter is then performed on the image to yield the final estimate \( \hat{m}\left( {{\mathbf{x}};{h^{opt}}} \right) = {\hat{\beta }_{0,0}}\left( {{\mathbf{x}};{h^{opt}}} \right) \).

Inspired by the ISKR method, the local steering matrix estimation and LPR with RICI rule can be executed iteratively to achieve better de-noising results, if necessary. The proposed iterative SK-LPR-RICI method can be summarized as follows:

  Step 1 (Initial gradient estimation): With an initial bandwidth matrix \( {{\mathbf{H}}^{(0)}} = {h^{(0)}}{\mathbf{I}} \), calculate the initial estimate \( {\hat{m}^{(0)}}\left( {{\mathbf{x}};{{\mathbf{H}}^{(0)}}} \right) \) and the gradient estimates \( \hat{\beta }_{0,1}^{(0)}\left( {{\mathbf{x}};{{\mathbf{H}}^{(0)}}} \right) \) and \( \hat{\beta }_{1,0}^{(0)}\left( {{\mathbf{x}};{{\mathbf{H}}^{(0)}}} \right) \).

  Step 2 (Steering kernel estimation, l-th iteration): The first-order gradient estimates \( \hat{\beta }_{0,1}^{\left( {l - 1} \right)}\left( {{\mathbf{x}};{{\mathbf{H}}^{\left( {l - 1} \right)}}} \right) \) and \( \hat{\beta }_{1,0}^{\left( {l - 1} \right)}\left( {{\mathbf{x}};{{\mathbf{H}}^{\left( {l - 1} \right)}}} \right) \) are used to calculate the steering bandwidth matrix \( {\mathbf{H}}_{S,{\mathbf{x}}}^{(l)} \) and the steering basis-kernel \( K_{\mathbf{x}}^{(l)}\left( {{\mathbf{u}};{{\mathbf{H}}_{S,{\mathbf{x}}}}} \right) \).

  Step 3 (Local scale parameter estimation, l-th iteration): Obtain a series of LPR results using kernels with the bandwidth matrix \( h \cdot {\mathbf{H}}_{S,{\mathbf{x}}}^{(l)} \), with h belonging to \( {\mathbf{\tilde{H}}} \). Determine the scale parameter \( {h^{opt,(l)}} \) using the RICI rule.

  Step 4 (LPR with estimated steering kernel, l-th iteration): A LPR is performed with the estimated bandwidth matrix \( {{\mathbf{H}}^{(l)}} = {h^{opt,(l)}}{\mathbf{H}}_{S,{\mathbf{x}}}^{(l)} \) to obtain \( {\hat{m}^{(l)}}\left( {{\mathbf{x}};{{\mathbf{H}}^{(l)}}} \right) \), along with its partial derivatives \( \hat{\beta }_{1,0}^{(l)}\left( {{\mathbf{x}};{h^{opt}}} \right) \) and \( \hat{\beta }_{0,1}^{(l)}\left( {{\mathbf{x}};{h^{opt}}} \right) \).

Generally, more iterations reduce the estimation variance, but the bias increases. As a result, when the additive noise is small, the number of iterations should also be small to avoid over-smoothing the image. When the image is contaminated by a large amount of noise, more iterations can improve the de-noising results. Similarly, given prior knowledge of the additive noise, the initial scale parameter h (0) should be selected as a large value when the noise component is significant, while a small value of h (0) is more appropriate when the additive noise is small.
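
The following skeleton outlines the iteration; the three callables and their signatures are placeholders for the building blocks described above (LPR fitting, steering-kernel estimation, RICI scale selection) and are not the authors' implementation.

```python
import numpy as np

def sk_lpr_rici(image, fit_lpr, estimate_steering, select_scale_rici, h0=1.0, n_iter=3):
    """Iterative SK-LPR-RICI skeleton (Steps 1-4): fit_lpr(image, H) returns
    (m_hat, grad_x1, grad_x2) for a per-pixel bandwidth field H,
    estimate_steering(grad_x1, grad_x2) returns the steering matrices H_S,
    and select_scale_rici(image, H_S) returns the refined per-pixel scales."""
    H = np.broadcast_to(h0 * np.eye(2), image.shape + (2, 2))  # Step 1: H^(0) = h^(0) I
    m_hat, gx, gy = fit_lpr(image, H)                          # initial fit and gradients
    for _ in range(n_iter):
        H_S = estimate_steering(gx, gy)                        # Step 2: steering kernels
        h_opt = select_scale_rici(image, H_S)                  # Step 3: RICI scales
        H = h_opt[..., None, None] * H_S                       # Step 4: H^(l) = h_opt * H_S
        m_hat, gx, gy = fit_lpr(image, H)                      # re-fit with steering kernels
    return m_hat
```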

The computational complexity of the SK-LPR-RICI method is mainly determined by the polynomial order p and the kernel size N K . For each data sample, the complexities of computing \( {\mathbb{X}^T}\mathbb{W}\mathbb{X} \) and \( {\mathbb{X}^T}\mathbb{W}\mathbb{Y} \) in (7) are \( O\left( {{q^2}N_K^2} \right) \) and \( O\left( {qN_K^2} \right) \) respectively, and the matrix inversion requires \( O\left( {{q^3}} \right) \) operations, where \( q = \prod\limits_{j = 1}^p {\left[ {\left( {d + j} \right)/j} \right]} \) is the length of the coefficient vector β. If q ≪ N K , the complexity of the WLS solution is \( O\left( {N_K^2} \right) \). As a result, a large p, which means a large q, will lead to a very heavy computational load. Hence, a linear (p = 1) regression is generally used, and the kernel size N K is given a small value, say 9 or 11. Since the SK-LPR-RICI method involves the SVD, which has a complexity of \( O\left( {N_K^3} \right) \), it has a higher arithmetic complexity than the symmetric-kernel-based LPR. As for the ISKR method, if there is no a priori knowledge about the global smoothing parameter h 0, each h 0 from \( {\mathbf{\tilde{H}}} \) has to be tested and, more importantly, the quality has to be judged manually. Hence, the complexities of ISKR and the proposed iterative SK-LPR-RICI method are comparable.

4.3 LPR for Image Smoothing and Reconstruction

Since LPR is applicable to uniformly as well as non-uniformly sampled observations, it is very useful in smoothing and interpolating generally sampled data sets. One useful application is the reconstruction of a high-resolution (HR) image from multiple shifted low-resolution (LR) images in super-resolution imaging [25–28]. Since the LR images may be shifted arbitrarily from each other, the image samples are non-uniformly spaced. Using LPR, we can construct a HR image from the non-uniformly spaced image samples. More information about super-resolution and image registration can be found in [25–30]. Here, we only focus on the application of the SK-LPR-RICI method to HR imaging from LR images without blurring. The image samples may be corrupted by noise due to imprecise registration.

As shown before, LPR is a practical de-noising and interpolation technique for image processing. Therefore, it can be used to interpolate noisy uniformly spaced images and remove possible artifacts due to data compression. This will be illustrated further by examples in Section 5. However, it is worth noting that the proposed LPR-based methods are only capable of handling artifacts that can be modeled as additive white Gaussian noise. It would be interesting to extend the LPR-based methods to deal with other unwanted effects such as illumination changes or blurring, but this is outside the scope of the paper and we will pursue this direction in future work.

5 Experimental Results

In this section, we evaluate the proposed RICI method for image smoothing, coding artifacts removal, and super-resolution imaging. The peak signal-to-noise ratio (PSNR), \( PSNR = 10 \cdot {\log_{10}}\left( {{{255}^2}/MSE} \right) \), is used as the performance criterion for evaluation. The PSNR values shown in the tables are averages over 100 independent Monte-Carlo realizations, and one realization is illustrated in the figures.
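
For reference, a direct sketch of this criterion for 8-bit images is given below (the function name is illustrative).

```python
import numpy as np

def psnr(reference, estimate):
    """PSNR in dB for 8-bit images: 10 * log10(255^2 / MSE)."""
    err = np.asarray(reference, float) - np.asarray(estimate, float)
    return 10.0 * np.log10(255.0 ** 2 / np.mean(err ** 2))
```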

5.1 Image De-Noising

We now apply the proposed SK-LPR-RICI method to suppress additive white Gaussian noise in gray-scale images. Two 256 × 256 8-bit images, Test and Lena, are used in the simulation, and the noise-free original images are shown in Figs. 1a and 2a. The image Test contains distinct edges and corners, so the performance around image edges is easy to visualize and compare. Lena contains a good mixture of details, flat areas, and texture, which makes it a commonly used test image. Gaussian kernels are used here, and the scale set of (16) is chosen as \( {\mathbf{\tilde{H}}} = \left\{ {{h_j}|{h_j} = 1/4,1/2,1,2,4} \right\} \). The kernel size parameter N K is set to 11.

Figure 1

Comparison of different adaptive scale selection methods for smoothing the image Test: a original noise-free image, b noisy image, c ISKR, d LPR-ICI, e LPR-RICI, f SK-LPR-RICI, g adaptive local scale with ICI rule, h adaptive local scale with RICI rule, i adaptive local scale with RICI rule based on steering kernel, j enlarged image of ISKR, k enlarged image of LPR-RICI, l enlarged image of SK-LPR-RICI.

Figure 2

Comparison of different adaptive scale selection methods for smoothing the image Lena: a original noise-free image, b noisy image, c ISKR, d LPR-ICI, e LPR-RICI, f SK-LPR-RICI, g adaptive local scale with ICI rule, h adaptive local scale with RICI rule, i adaptive local scale with RICI rule based on steering kernel, j enlarged image of ISKR, k enlarged image of LPR-RICI, l enlarged image of SK-LPR-RICI.

Four different adaptive scale selection methods are evaluated:

  1) LPR-ICI: the traditional ICI adaptive scale selector introduced by Katkovnik in [18], with a symmetric circular kernel.

  2) LPR-RICI: the refined ICI adaptive scale selection method proposed in Section 3, with a symmetric circular kernel.

  3) Iterative SK-LPR-RICI: the iterative steering-kernel-based LPR with the refined ICI adaptive scale selector, as proposed in Section 4.

  4) ISKR: the iterative steering kernel regression method introduced by Takeda et al. [22].

For the ISKR method, the global smoothing parameter h 0 is chosen from the scale parameters in the set \( {\mathbf{\tilde{H}}} \). The number of iterations in the SK-LPR-RICI and ISKR methods is set to 3. We shall estimate the smooth function \( \hat{m}\left( {{{\mathbf{X}}_i}} \right) \) using linear regression with p = 1. As a result, we have \( \beta = 2\left( {p + 1 - \mathcal{K}} \right) = 4 \) and \( \nu = 2\mathcal{K} + d = 2 \) in the bias and variance expressions (31) and (32). More precisely, the estimation bias is proportional to the squared scale while the variance is inversely proportional to the squared scale. The threshold parameter κ used for the ICI-based methods is selected as 1.96 so that the probability that \( \hat{m}\left( {{{\mathbf{X}}_i}} \right) \) lies in the confidence interval is 95%. With these parameters and Eqs. (36)–(39), we can calculate the parameters for the RICI method: ∆κ = 0.56, η = 0.55, ∆η = 0.22, and \( {h^{opt}} = 0.59{h_{{j^{+} }}} \).
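
As a check, these values follow directly from (36)–(38) with a = 2, β = 4, ν = 2, and κ = 1.96:

$$ \Delta \kappa = \frac{{2\left( {1.96} \right)}}{{{2^3} - 1}} \approx 0.56,\quad \eta = \frac{1}{3}{\log_2}\left( {0.56\sqrt {2} \cdot \frac{{1 + 2}}{{1 - {2^{ - 2}}}}} \right) \approx 0.55,\quad \Delta \eta = \frac{1}{3}{\log_2}\left( {\frac{9}{2}} \right) - \frac{1}{2} \approx 0.22, $$

so that \( {a^{ - \left( {\eta + \Delta \eta } \right)}} = {2^{ - 0.78}} \approx 0.59 \).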

The corresponding smoothing results are shown in Figs. 1 and 2, where the additive noise is Gaussian distributed with zero mean and variance σ2 = 25. To evaluate the best possible performance of the ISKR, we assume that the “true image” is available so that the best global smoothing scale h 0 can be found by a one-dimensional search. We found that the MSE curve as a function of h 0 is convex near the global minimum, and, therefore, a bisection search is employed to seek the best value of h 0 from the bandwidth set \( {\mathbf{\tilde{H}}} \).

The smoothing results of the ISKR method in Figs. 1c and 2c are obtained with the optimal global smoothing parameter h 0 from \( {\mathbf{\tilde{H}}} \) and the optimal number of iterations l. For both the Test and Lena images, the optimal number of iterations in the ISKR method is l = 1, with optimal global smoothing parameters h 0 = 1.27 and h 0 = 0.65, respectively. The results of the iterative SK-LPR-RICI method, illustrated in Figs. 1f and 2f, are obtained with the same number of iterations, which also gives its best PSNR value.

We can see from Figs. 1c–f and 2c–f that all four adaptive scale selection methods give satisfactory results: the noise in flat areas is suppressed and the edges are preserved quite well. By comparing the PSNR values in Table 1 (a), we find that the SK-LPR-RICI method has the highest PSNR. Some specific areas are enlarged in Figs. 1j–l and 2j–l for better visualization. They show that the SK-LPR-RICI method also has better visual performance.

Table 1 PSNR comparisons of variable basis-kernel and scale selection methods for image processing (σ 2 = 25).

Next, we evaluate the performances of various scale selection methods under different noise variances. The PSNR values are summarized in Table 2. It can be seen from Table 2 that:

  1. Generally, the PSNR values of the four adaptive scale selection methods in ascending order are: LPR-ICI, LPR-RICI, ISKR, SK-LPR-RICI. Note that the best performance of the ISKR method cannot in general be realized because h 0 is chosen to minimize the MSE, which is unavailable in practice.

  2. The best global smoothing parameter h 0 for the ISKR method varies with the noise variance. For small noise levels, h 0 is relatively small, while for large noise levels, h 0 increases accordingly.

  3. More iterations generally cause more smoothing of the images. Therefore, the number of iterations should increase with the noise level of the images, if it is known a priori. More iterations should only be applied to rather noisy images in order to avoid over-smoothing.

Table 2 PSNR comparisons of variable basis-kernel and scale selection methods for image de-noising (against different noise variance).

Finally, the computational complexities of the methods under test are compared. The detailed analysis of one LPR operation at a pixel can be found in Section 4.2. The LPR-ICI method has the lowest complexity because it does not need to refine the ICI-based bandwidths or to calculate the steering kernels. The SK-LPR-RICI and ISKR methods have higher complexities than the LPR-ICI and LPR-RICI methods because the parameters needed to construct the steering kernels have to be estimated. The SK-LPR-RICI method has a higher complexity than ISKR because it includes the additional ICI bandwidth selection and refinement. In summary, the computational complexities of the four methods in ascending order are: LPR-ICI, LPR-RICI, ISKR, SK-LPR-RICI.

5.2 Coding Artifacts Removal

LPR can also be used to suppress the quantization noise in image and video compression. This is because the artifacts or noise produced during the compression and quantization can be roughly modeled as Gaussian distributed. This is supported by observations in [31], which reported that the JPEG compression noise is Gaussian distributed with zero mean and a covariance matrix determined by the quantization table and DCT transform matrix.

To evaluate the effectiveness of the LPR algorithms, we compressed the image Lena by JPEG with a compression ratio of 50%, as shown in Fig. 3b. Only one iteration is employed in the ISKR and iterative SK-LPR-RICI methods because the quantization noise is not very severe. Other experimental settings are the same as in the previous de-noising experiment. The compression artifacts mainly occur near sharp edges and have block-like shapes, as illustrated in Fig. 3. It can be seen that the blocking artifacts are suppressed satisfactorily by the LPR-based methods. The proposed SK-LPR-RICI method achieves the best results among these LPR-based methods, and the PSNR values are summarized in Table 1 (b).

Figure 3

Comparison of different adaptive scale selection methods for smoothing the JPEG coded image Lena: a original noise-free image, b JPEG compressed image, c ISKR, d LPR-ICI, e LPR-RICI, f SK-LPR-RICI, g-i enlarged images of the original image, j-l enlarged images of SK-LPR-RICI.

5.3 Super-Resolution Imaging

In this section, the applications of the proposed SK-LPR-RICI method to super-resolution of test gray-scale images and real color photos are presented.

  1) Gray-scale images: A set of four LR (128 × 128) images is generated from the original HR (256 × 256) Lena by decimating it by a factor of 2 both horizontally and vertically, with shifts of zero pixels, one pixel to the right, one pixel to the bottom, and one pixel to the right and bottom, respectively. Gaussian noise of zero mean and variance σ2 = 25 is added to the LR images. The scale set \( {\mathbf{\tilde{H}}} \), the threshold parameter κ, and the kernel size N K used in this experiment follow their values in the previous de-noising experiment.

Experiments on the test image Lena are carried out to compare the performance of the SK-LPR-RICI method with other traditional reconstruction methods based on bicubic spline interpolation, the iterative back-projection (IBP) method [28, 29], and ISKR. The IBP method minimizes the LS error between the HR image to be estimated and the set of known LR images using LS estimation with regularization. In LPR, we model each pixel in the true image as a local polynomial and solve a series of local LS estimation problems using the samples from the known LR images, instead of the whole image. As a result, compared with the IBP method, the LPR-based reconstruction methods can better adapt to local orientations in the image and achieve a better performance at the expense of a much higher complexity.

As shown in the results of Fig. 4, the IBP method in (c) gives a better result than the bicubic interpolation method in (b), but the edges are blurred compared with the LPR-based methods in (d)–(f). The main difficulty of the LS regularization method is to choose a global regularization parameter that suppresses the additive noise while preserving image edges. The PSNR values of Fig. 4b–f are listed in Table 1 (c). The proposed SK-LPR-RICI method has a higher PSNR value than the other methods tested.

Figure 4

Comparison of different super-resolution methods for the image Lena: a one low-resolution noisy image, b bicubic spline interpolation, c IBP, d ISKR, e LPR-RICI, f SK-LPR-RICI, g enlarged images of the original HR image, h enlarged images of the bicubic spline interpolation, i enlarged images of IBP, j enlarged images of SK-LPR-RICI.

  2) Color images: We now consider the reconstruction of real color photos. In color images, each pixel consists of the red (R), green (G), and blue (B) components in the RGB color space. Therefore, it is possible to extend the gray-scale super-resolution methods to color images by treating each channel in the color space as an individual image and employing the gray-scale super-resolution methods to each channel independently. However, as pointed out in [27], this method will produce undesirable color artifacts because it ignores the correlation between different channels. In this paper, we first convert the image to the HSI (Hue, Saturation, and Intensity) color space. The adaptive scale selection methods will be applied to the intensity channel to obtain the local basis-kernel and scale parameters. A LPR using the resulting basis-kernel and scale parameters is then applied to the R, G, and B channels of the image, respectively. An iterative maximum a posteriori (MAP) color super-resolution method proposed in [26, 27] is also included in the experiment for comparison. The MAP method also solves a global LS problem with regularization to minimize the estimation error between the HR and LR images.

Six LR (456 × 289) color photos were captured by a commercial digital camera (Canon IXUS 600). The SK-LPR-RICI super-resolution method is applied to increase the spatial resolution by a factor of 3 to yield a 1368 × 867 HR image. A frequency-domain based motion estimation and image registration technique developed in [30] is employed to determine the relative shifts of these LR images.

For comparison, we display the LR photo at the same size as the HR image by resizing each pixel to a 3 × 3 block. Some areas of the LR and resultant HR images are enlarged so that the details can be seen clearly. From visual inspection of the results in Fig. 5, it can be observed that the SK-LPR-RICI method is able to preserve image edges while removing the image noise effectively.

Figure 5

Comparison of different super-resolution methods for real color photos: a one LR image and its enlarged images, b iterative MAP method and its enlarged images, c SK-LPR-RICI and its enlarged images.

Overall, based on the experimental results in this section, it can be concluded that the proposed SK-LPR-RICI method is a useful tool in smoothing and reconstruction of noisy images. It can preserve local characteristics well and has a systematic algorithm to automatically calculate the parameters involved in the method.

6 Conclusion

A study on multivariate LPR for the analysis of multidimensional signals was presented. A new refined ICI adaptive bandwidth selection method was first developed for multivariate LPR, and it was shown to have a better bias-variance tradeoff than the traditional ICI rule. This method was further extended to the case of a steering kernel with local orientation for better adaptation to the local characteristics of multidimensional signals. The data-adaptive scale selection method was then applied to smoothing and reconstruction of noisy images. Simulation results on test and real images showed that the proposed SK-LPR-RICI method has a better PSNR and visual performance than conventional LPR-based methods in image processing.