
2019 | Original Paper | Book Chapter

Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model

Authors: Giorgio Gnecco, Federico Nutarelli

Published in: Machine Learning, Optimization, and Data Science

Publisher: Springer International Publishing


Abstract

We investigate a modification of the classical fixed effects panel data model (a linear regression model able to represent unobserved heterogeneity in the data), in which one can additionally control the conditional variance of the output given the input by varying the cost associated with the supervision of each training example. Assuming an upper bound on the total supervision cost, we analyze and optimize the trade-off between the sample size and the precision of supervision (the reciprocal of the conditional variance of the output). To this end, we formulate and solve a suitable optimization problem, based on a large-sample approximation of the output of the classical algorithm used to estimate the parameters of the fixed effects panel data model. Considering a specific functional form for that precision, we prove that, depending on the “returns to scale” of the precision with respect to the supervision cost per example, in some cases “many but bad” examples provide a smaller generalization error than “few but good” ones, whereas in other cases the opposite occurs. These results extend to the fixed effects panel data model those we obtained in recent works for a simpler linear regression model. We conclude by discussing possible applications of our results and extensions of the proposed optimization framework to other panel data models.
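The trade-off described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, because this page does not state the chapter's formulas: the conditional variance is taken to be `k * c**(-alpha)` for a supervision cost `c` per example, the budget is assumed to be spent as `budget = n_units * t_obs * c`, and the large-sample error is assumed to be proportional to the variance divided by the number of observations per unit. The function names and default values are hypothetical.

```python
def approx_error(c, alpha, k=1.0, budget=1000.0, n_units=10):
    """Hypothetical error proxy: assumed variance divided by observations per unit."""
    variance = k * c ** (-alpha)       # assumed model: precision = c**alpha / k
    t_obs = budget / (n_units * c)     # observations per unit affordable at cost c
    return variance / t_obs


def best_cost(alpha, c_min=1.0, c_max=4.0, grid=301):
    """Grid search for the cost per example that minimises the proxy on [c_min, c_max]."""
    costs = [c_min + i * (c_max - c_min) / (grid - 1) for i in range(grid)]
    return min(costs, key=lambda c: approx_error(c, alpha))


for alpha in (0.5, 2.0):
    print(alpha, best_cost(alpha))
```

Under these assumptions the proxy is proportional to `c**(1 - alpha)`, so an exponent below 1 pushes the optimum to the cheapest supervision ("many but bad" examples) and an exponent above 1 pushes it to the most expensive supervision ("few but good" examples).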

Footnotes
1
For simplicity of exposition, here the model is not presented in its most general form (e.g., the disturbances \(\varepsilon _{n,t}\)’s are simply assumed to be mutually independent).
 
2
The case of finite T and large N is of more interest for microeconometrics, and will be investigated in future research.
 
3
E.g., if the \(\varvec{x}_{n,t}\)’s are independent, identically distributed, and have finite moments up to the order 4.
 
4
We recall that a sequence of random real matrices \(\varvec{M}_{T}\), \(T=1,\ldots ,+\infty \) converges in probability to the real matrix \(\varvec{M}\) if, for every \(\varepsilon >0\), \(\mathrm{Prob} \left( \left\| \varvec{M}_{T} - \varvec{M}\right\| > \varepsilon \right) \) (where \(\Vert \cdot \Vert \) is an arbitrary matrix norm) tends to 0 as T tends to \(+\infty \). In this case, one writes \(\mathrm{plim}_{T \rightarrow +\infty } \varvec{M}_T=\varvec{M}\).
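The definition in this footnote can be illustrated numerically in the scalar case (a 1×1 matrix). The sketch below is only an illustration under assumptions not taken from the chapter: the random quantity is the sample mean of T independent uniform draws on [0, 1], its limit is 0.5, and the probability of a deviation larger than `eps` is estimated by Monte Carlo. The function name and parameter values are hypothetical.

```python
import random


def exceed_probability(t, eps=0.05, reps=500, seed=0):
    """Monte Carlo estimate of Prob(|M_T - M| > eps) for a sample mean of t uniforms."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(t)) / t
        if abs(mean - 0.5) > eps:      # 0.5 is the mean of a uniform draw on [0, 1]
            hits += 1
    return hits / reps


p_small = exceed_probability(10)
p_large = exceed_probability(1000)
print(p_small, p_large)
```

The estimated probability shrinks as T grows, which is the behaviour that convergence in probability requires for every fixed `eps`.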
 
5
The existence of the probability limit (20) and the assumed positive definiteness of the matrix \(\varvec{A}_N\) guarantee that the invertibility of the matrix \(\sum _{n=1}^N \varvec{X}_n' \varvec{Q} \varvec{X}_n=\sum _{n=1}^N \varvec{X}_n' \varvec{Q}' \varvec{Q} \varvec{X}_n\) (see Sect. 2) holds with probability near 1 for large T.
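The identity between the two sums in this footnote relies on the matrix Q being symmetric and idempotent. The sketch below checks this numerically under the assumption that Q is the standard within (time-demeaning) matrix, the identity minus the T×T matrix with all entries 1/T; this page does not define Q (it refers to Sect. 2), so that form is an assumption. The helper names and the sample regressor are hypothetical.

```python
def demeaning_matrix(t):
    """Assumed form of Q: identity minus the t x t matrix with all entries 1/t."""
    return [[(1.0 if i == j else 0.0) - 1.0 / t for j in range(t)] for i in range(t)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


t = 4
q = demeaning_matrix(t)
qtq = matmul(transpose(q), q)
# Largest entry-wise gap between Q'Q and Q; zero up to rounding if Q is symmetric idempotent.
max_gap = max(abs(qtq[i][j] - q[i][j]) for i in range(t) for j in range(t))

# Scalar regressor observed over t periods, written as a t x 1 column.
x = [[1.0], [2.0], [4.0], [5.0]]
xqx = matmul(matmul(transpose(x), q), x)[0][0]
print(max_gap, xqx)
```

For a scalar regressor, the quadratic form equals the sum of squared deviations from the time average, so it is positive whenever the regressor varies over time. This is the scalar counterpart of the invertibility discussed in the footnote.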
 
6
This is obtained by also taking into account that, as a consequence of the Continuous Mapping Theorem [4, Theorem 7.33], the probability limit of the product of two random variables equals the product of their probability limits, when the latter two exist.
 
7
By an argument similar to that used in [6], one can show that the approximation is exact, at optimality, when C is a multiple of both \(N c_\mathrm{min}\) and \(N c_\mathrm{max}\).
 
References
2. Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Nat. Acad. Sci. 113, 7353–7360 (2016)
3. Chen, C.-H., Lee, L.H.: Stochastic Simulation Optimization: An Optimal Computing Budget Allocation. World Scientific, Singapore (2010)
4. Florescu, I.: Probability and Stochastic Processes. Wiley, Hoboken (2015)
5. Frees, E.W.: Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. Cambridge University Press, Cambridge (2004)
7. Greene, W.H.: Econometric Analysis. Pearson Education, London (2003)
8. Groves, R.M., Fowler Jr., F.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R.: Survey Methodology. Wiley, Hoboken (2004)
9. Nguyen, H.T., Kosheleva, O., Kreinovich, V., Ferson, S.: Trade-off between sample size and accuracy: case of measurements under interval uncertainty. Int. J. Approx. Reason. 50, 1164–1176 (2009)
10. Ruud, P.A.: An Introduction to Classical Econometric Theory. Oxford University Press, Oxford (2000)
11. Vapnik, V.N.: Statistical Learning Theory. Wiley, Hoboken (1998)
12. Varian, H.R.: Big Data: new tricks for econometrics. J. Econ. Perspect. 28, 3–38 (2014)
13. Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge (2002)
Metadata
Title
Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model
Authors
Giorgio Gnecco
Federico Nutarelli
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-37599-7_44