
2022 | OriginalPaper | Chapter

A Statistical Learning Theory Approach for the Analysis of the Trade-off Between Sample Size and Precision in Truncated Ordinary Least Squares

Authors : Giorgio Gnecco, Fabio Raciti, Daniela Selvi

Published in: High-Dimensional Optimization and Probability

Publisher: Springer International Publishing

Abstract

This chapter deals with linear regression problems for which one has the possibility of varying the supervision cost per example, by controlling the conditional variance of the output given the feature vector. For a fixed upper bound on the total available supervision cost, the trade-off between the number of training examples and their precision of supervision is investigated, using a nonasymptotic data-independent bound from the literature in statistical learning theory. This bound is related to the truncated output of the ordinary least squares regression algorithm. The results of the analysis are also compared theoretically with the ones obtained in a previous work, based on a large-sample approximation of the untruncated output of ordinary least squares. Advantages and disadvantages of the investigated approach are discussed.

Footnotes
1
For the two respective problems considered in [3] and in the extended framework of [4], OLS and WLS provide the best linear unbiased estimates of the parameter vector of the linear regression model, according to the Gauss–Markov theorem [13, Section 9.4]. This is because the measurement noise is homoskedastic in the framework considered in [3], and heteroskedastic in its extension considered in [4].
 
2
The statement of [9, Theorem 11.3] actually includes a universal positive constant, whose value is not explicitly reported therein. However, it can be computed by inspecting the proof of that theorem, as summarized in the following. In more detail, the proof shows that
$$\displaystyle \begin{aligned} R^{exp}_c \leq \mathbb{E} \left\{T_{1,N_c}\right\} + \mathbb{E} \left\{T_{2,N_c}\right\} \end{aligned} $$
(13)
(see [9, Section 7.1] for the precise definitions of the random variables \(T_{1,N_c}\) and \(T_{2,N_c}\)), and that
$$\displaystyle \begin{aligned} \mathbb{E} \left\{T_{1,N_c}\right\} \leq v + 9 \left(12 e N_c \right)^{2(p+1)} \cdot \frac{2304 L^2}{N_c} \cdot \exp \left(-\frac{N_c v}{2304 L^2} \right) \end{aligned} $$
(14)
for
$$\displaystyle \begin{aligned} v=\frac{2304 L^2}{N_c} \cdot \ln \left(9 \left(12 e N_c \right)^{2(p+1)}\right)\,, \end{aligned} $$
(15)
whereas
$$\displaystyle \begin{aligned} \mathbb{E} \left\{T_{2,N_c}\right\} \leq 8 \sigma_c^2 \frac{p}{N_c} + \inf_{\hat{\underline{\beta}} \in \mathbb{R}^p} \mathbb{E} \left\{\left(\hat{\underline{\beta}}'\underline{x}^{test}-y^{test}\right)^2 \right\}\,. \end{aligned} $$
(16)
In the specific framework considered in this chapter, one has
$$\displaystyle \begin{aligned} y^{test}=\underline{\beta}'\underline{x}^{test}\,, \end{aligned} $$
(17)
hence the upper bound (16) reduces to
$$\displaystyle \begin{aligned} \mathbb{E} \left\{T_{2,N_c}\right\} \leq 8 \sigma_c^2 \frac{p}{N_c}\,. \end{aligned} $$
(18)
Finally, the upper bound (19) presented in the text follows by combining (13)–(18).
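For the reader's convenience, the substitution of (15) into (14) can be made explicit (this intermediate step is reconstructed here from (13)–(18), not quoted from [9]). The choice of $v$ in (15) makes the exponential in (14) cancel the polynomial factor, since $\exp \left(-\frac{N_c v}{2304 L^2}\right) = \left(9 \left(12 e N_c \right)^{2(p+1)}\right)^{-1}$, so that
$$\displaystyle \begin{aligned} \mathbb{E} \left\{T_{1,N_c}\right\} \leq \frac{2304 L^2}{N_c} \left( \ln \left(9 \left(12 e N_c \right)^{2(p+1)}\right) + 1 \right)\,, \end{aligned} $$
and, combining this with (13) and (18),
$$\displaystyle \begin{aligned} R^{exp}_c \leq \frac{2304 L^2}{N_c} \left( \ln \left(9 \left(12 e N_c \right)^{2(p+1)}\right) + 1 \right) + 8 \sigma_c^2 \frac{p}{N_c}\,. \end{aligned} $$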
 
3
This is a linear regression model that can represent unobserved heterogeneity in the data via possibly different constants associated with distinct observational units. Depending on the setting, such units may have the same or different numbers of associated observations: the first case is called a balanced panel, the second an unbalanced panel [17].
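As a concrete illustration (the notation here is a standard one, assumed for exposition rather than quoted from [17]), the fixed effects panel data model can be written as
$$\displaystyle \begin{aligned} y_{it} = \alpha_i + \underline{\beta}'\underline{x}_{it} + \varepsilon_{it}\,, \quad i=1,\ldots,n\,, \ t=1,\ldots,T_i\,, \end{aligned} $$
where $\alpha_i$ is the constant (fixed effect) associated with the $i$-th observational unit. The panel is balanced when all the $T_i$ are equal, and unbalanced otherwise.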
 
4
This case was investigated in [5] and [6], for balanced and unbalanced panels, respectively, relying in both analyses on large-sample approximations of the outputs of suitable algorithms used to estimate the parameters of the fixed effects panel data model.
 
Literature
3.
G. Gnecco, F. Nutarelli, On the trade-off between number of examples and precision of supervision in regression problems, in Proceedings of the Fourth International Conference of the International Neural Network Society on Big Data and Deep Learning (INNS BDDL 2019), Sestri Levante, Italy (2019), pp. 1–6
5.
G. Gnecco, F. Nutarelli, Optimal trade-off between sample size and precision of supervision for the fixed effects panel data model, in Proceedings of the Fifth International Conference on Machine Learning, Optimization, and Data Science (LOD 2019), Certosa di Pontignano (Siena), Italy, vol. 11943 of Lecture Notes in Computer Science (2020), pp. 531–542
8.
R.M. Groves, F.J. Fowler, Jr., M.P. Couper, J.M. Lepkowski, E. Singer, R. Tourangeau, Survey Methodology (Wiley-Interscience, 2004)
9.
L. Györfi, A. Krzyzak, M. Kohler, H. Walk, A Distribution-Free Theory of Nonparametric Regression (Springer, Berlin, 2002)
10.
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd edn. (Springer, Berlin, 2009)
11.
H.T. Nguyen, O. Kosheleva, V. Kreinovich, S. Ferson, Trade-off between sample size and accuracy: case of measurements under interval uncertainty. Int. J. Approx. Reason. 50, 1164–1176 (2009)
12.
J.S. Rustagi, Optimization Techniques in Statistics (Academic Press, London, 1994)
13.
P.A. Ruud, An Introduction to Classical Econometric Theory (Oxford University Press, Oxford, 2000)
14.
S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge, 2014)
15.
S. Shalev-Shwartz, O. Shamir, N. Srebro, K. Sridharan, Learnability, stability and uniform convergence. J. Mach. Learn. Res. 11, 2635–2670 (2010)
16.
V.N. Vapnik, Statistical Learning Theory (Wiley-Interscience, 1998)
17.
J.M. Wooldridge, Econometric Analysis of Cross Section and Panel Data (MIT Press, Cambridge, 2002)
Metadata
Title
A Statistical Learning Theory Approach for the Analysis of the Trade-off Between Sample Size and Precision in Truncated Ordinary Least Squares
Authors
Giorgio Gnecco
Fabio Raciti
Daniela Selvi
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-00832-0_7