Abstract
In this article we introduce a smoothing spline estimate for fixed design regression based on real and artificial data, where the artificial data stems from similar experiments undertaken previously. The smoothing spline estimate assigns different weights to the real and the artificial data. We investigate under which conditions the rate of convergence of this estimate is better than that of the ordinary smoothing spline estimate applied to the real data alone. The finite sample size performance of the estimate is analyzed using simulated data, and its usefulness is illustrated by applying it in the context of experimental fatigue tests.
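The idea of the estimate can be sketched numerically. The following is a minimal illustration only, not the paper's spline estimate: it replaces the spline roughness penalty by a discrete second-difference (Whittaker-type) penalty, and the design points, the weights `w_real` and `w_art`, and the bias of the artificial sample are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 101
x = np.linspace(0.0, 1.0, n)
m = np.sin(2.0 * np.pi * x)               # true regression function

# Real observations and artificial ones from an earlier, slightly biased
# experiment; the bias is what motivates down-weighting the artificial data.
z_real = m + 0.15 * rng.standard_normal(n)
z_art = m + 0.05 + 0.15 * rng.standard_normal(n)

# Per-sample weights: the real data counts more than the artificial data.
# For a quadratic criterion, fitting both samples at shared design points
# is equivalent to fitting their weighted average with the summed weights.
w_real, w_art = 1.0, 0.3
z = (w_real * z_real + w_art * z_art) / (w_real + w_art)
w = np.full(n, w_real + w_art)

# Weighted penalized least squares with a discrete roughness penalty:
# minimize sum_i w_i (z_i - f_i)^2 + lam * ||D2 f||^2, solved in closed form.
D2 = np.diff(np.eye(n), n=2, axis=0)      # second-difference operator
lam = 10.0
f_hat = np.linalg.solve(np.diag(w) + lam * D2.T @ D2, w * z)
```

The weighted-average step is exact only because both samples share the same design points here; with different designs one would evaluate the weighted criterion over the union of all design points, as in Lemma 5 of the appendix.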
References
Adams RA (1975) Sobolev spaces. Academic Press, New York
Bartle RG, Sherbert DR (2011) Introduction to real analysis, 4th edn. Wiley, New York
Beirlant J, Györfi L (1998) On the asymptotic \({L}_2\)-error in partitioning regression estimation. J Stat Plan Inference 71:93–107
Birman MS, Solomjak MZ (1967) Piecewise polynomial approximations of functions of the classes \(W^{\alpha }_{p}\). Mat Sbornik 73:331–355. English translation in Mathematics of the USSR-Sbornik 2:295–316
Boller C, Seeger T, Vormwald M (2008) Materials Database for Cyclic Loading. Fachgebiet Werkstoffmechanik, TU Darmstadt
Chernoff H (1952) A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Ann Math Stat 23:493–507
Devroye L (1982) Necessary and sufficient conditions for the almost everywhere convergence of nearest neighbor regression function estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 61:467–481
Devroye L, Györfi L, Krzyżak A, Lugosi G (1994) On the strong universal consistency of nearest neighbor regression function estimates. Ann Stat 22:1371–1385
Devroye L, Krzyżak A (1989) An equivalence theorem for \({L}_1\) convergence of the kernel regression estimate. J Stat Plan Inference 23:71–82
Devroye L, Wagner TJ (1980) Distribution-free consistency results in nonparametric discrimination and regression function estimation. Ann Stat 8:231–239
El Dsoki C (2010) Reduzierung des experimentellen Versuchsaufwandes durch künstliche neuronale Netze. Shaker-Verlag, Aachen
Eubank RL (1999) Nonparametric regression and spline smoothing, 2nd edn. Marcel Dekker, New York
Furer D, Kohler M, Krzyżak A (2013) Fixed design regression estimation based on real and artificial data. J Nonparametr Stat 25:223–241
Gasser T, Müller H-G (1979) Kernel estimation of regression functions. In: Gasser T, Rosenblatt M (eds) Smoothing techniques for curve estimation. Lecture notes in mathematics, vol 757. Springer, Heidelberg, pp 23–68
Györfi L (1981) Recent results on nonparametric regression estimate and multiple classification. Probl Control Inf Theory 10:43–52
Györfi L, Kohler M, Krzyżak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer series in statistics. Springer, New York
Kohler M, Krzyżak A (2001) Nonparametric regression estimation using penalized least squares. IEEE Trans Inf Theory 47:3054–3058
Kohler M, Krzyżak A (2012) Pricing of American options in discrete time using least squares estimates with complexity penalties. J Stat Plan Inference 142:2289–2307
Lugosi G, Zeger K (1995) Nonparametric estimation via empirical risk minimization. IEEE Trans Inf Theory 41:677–687
Mack YP (1981) Local properties of \(k\)-nearest neighbor regression estimates. SIAM J Algebraic Discret Methods 2:311–323
Manson SS (1965) Fatigue: a complex subject—some simple approximations. Exp Mech 5:193–226
Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142
Nadaraya EA (1970) Remarks on nonparametric estimates for density functions and regression curves. Theory Probab Appl 15:134–137
Nussbaum M (1985) Spline smoothing in regression models and asymptotic efficiency in \(L_2\). Ann Stat 13:984–997
Oden JT, Reddy JN (1976) An introduction to the mathematical theory of finite elements. Wiley, New York
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Stone CJ (1977) Consistent nonparametric regression. Ann Stat 5:595–645
Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053
Tomasella A, El-Dsoki C, Hanselka H, Kaufmann H (2011) A computational estimation of cyclic material properties using artificial neural networks. Procedia Eng 10:439–445
van de Geer S (1990) Estimating a regression function. Ann Stat 18:907–924
van de Geer S (2000) Empirical processes in M-estimation. Cambridge University Press, Cambridge
Wahba G (1990) Spline models for observational data. Society for Industrial and Applied Mathematics, Philadelphia
Watson GS (1964) Smooth regression analysis. Sankhya Ser A 26:359–372
Zhao LC (1987) Exponential bounds of mean error for the nearest neighbor estimates of regression functions. J Multivar Anal 21:168–178
Acknowledgments
The authors would like to thank two anonymous referees for various helpful comments and the German Research Foundation (DFG) for funding this project within the Collaborative Research Center 666.
Appendix
1.1 A deterministic lemma
Lemma 5
Let \(d\ge 1\), \(t > 0\), \(w_1, \ldots , w_N \in \mathbb{R}_+\), \(x_1, \ldots , x_N \in \mathbb{R}^d\) and \(z_1, \bar{z}_1, \ldots , z_N, \bar{z}_N \in \mathbb{R}\). Let \(m: \mathbb{R}^d \rightarrow \mathbb{R}\) be a function. Let \(\mathcal{F}_n\) be a set of functions \(f:\mathbb{R}^d \rightarrow \mathbb{R}\) and for \(f \in \mathcal{F}_n\) let
be a penalty term. Define
and
and assume that both minima exist. Then
implies
Proof
The result can be proven by modifying a proof in Kohler and Krzyżak (2012). Nevertheless, for the sake of completeness we give a complete proof in the sequel.
By definition of the estimate we have
hence
which implies
We show next that \(T_1 \le T_2\). Assume to the contrary that this is not true. Then
Using (30) we see that
which implies
i.e.,
But this is a contradiction to (30), so we have indeed proved \(T_1 \le T_2\). As a consequence we can conclude from (30)
In the next to last inequality we have used that \(a^2/2-b^2 \le (a-b)^2\) for all \(a,b \in \mathbb{R}\), with \(a=\bar{m}_n(x_i)- m^*_{n}(x_i)\) and \(b=m(x_i)- m^*_{n}(x_i)\).\(\square \)
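For completeness, the elementary inequality invoked here can be verified directly by expanding the square:

```latex
(a-b)^2 - \left( \frac{a^2}{2} - b^2 \right)
  = \frac{a^2}{2} - 2ab + 2b^2
  = 2\left( b - \frac{a}{2} \right)^2 \ge 0 .
```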
Proof (of Lemma 1)
Set \(N=n+N_n\), and for \(i \in \{1, \ldots , N\}\) choose
and
in Lemma 5. Then we immediately get the assertion of Lemma 1.\(\square \)
1.2 Auxiliary results
Lemma 6
Assume that the sub-Gaussian condition (9) is satisfied. Then there exist constants \(c_{28},t_0 \in \mathbb{R}_{+}\) depending only on \(K\) and \(\sigma _0\) such that for all \(k \in \mathbb{N}\) and \(t,\sigma >0\) with \(t>2^{k}\sigma ^{\frac{1}{2k}}t_0\) and \(t\ge 2^{k}t_0\)
Proof
The proof is a modification of the proof of Lemma 6.1 in van de Geer (1990). \(\square \)
Lemma 7
Let \(k \in \mathbb{N}\) and \(c_{29}>0\). Then there exists a constant \(c_{30} \in \mathbb{R}_{+}\) (independent of \(k\) and \(c_{29}\)) such that
Proof
See Birman and Solomjak (1967), Theorem 5.2.\(\square \)
Lemma 8
Let \(H_n\) be a set of functions \(h:[0,1] \rightarrow \mathbb{R}\) and let \(R>0\). Suppose that \(\sup _{h\in H_n} \Vert h\Vert _n \le R\) and that the sub-Gaussian condition (9) is satisfied. Then for some constant \(c_{31}\) depending only on \(K\) and \(\sigma _0\), and for \(\delta >0\) and \(\sigma >0\) satisfying \(R>\delta /\sigma \) and
we have
Proof
See van de Geer (2000), Corollary 8.3. \(\square \)
Furer, D., Kohler, M. Smoothing spline regression estimation based on real and artificial data. Metrika 78, 711–746 (2015). https://doi.org/10.1007/s00184-014-0524-6