Using large data sets to forecast sectoral employment

Gupta, Rangan; Kabundi, Alain; Miller, Stephen M.; Uwilingiye, Josine

doi:10.1007/s10260-013-0243-6

Published: 01 October 2013

Volume 23, pages 229–264, (2014)

Statistical Methods & Applications


Abstract

We use several models, estimated with classical and Bayesian methods, to forecast employment for eight sectors of the US economy. In addition to standard vector autoregressive and Bayesian vector autoregressive models, we augment some models to include the information content of 143 additional monthly series. Several approaches exist for incorporating information from a large number of series. We consider two multivariate approaches: extracting common factors (principal components) and Bayesian shrinkage. After extracting the common factors, we use Bayesian factor-augmented vector autoregressive and vector error-correction models, as well as Bayesian shrinkage in a large-scale Bayesian vector autoregressive model. For an in-sample period of January 1972 to December 1989 and an out-of-sample period of January 1990 to March 2010, we compare the forecast performance of the alternative models. More specifically, we perform ex-post and ex-ante out-of-sample forecasts from January 1990 through March 2009 and from April 2009 through March 2010, respectively. We find that factor-augmented models, especially the error-correction versions, generally prove the best in out-of-sample forecast performance, implying that, in addition to macroeconomic variables, incorporating long-run relationships along with short-run dynamics plays an important role in forecasting employment. Forecast combination models based on the simple average of the forecasts from the various models, however, outperform the best-performing individual models for six of the eight sectoral employment series.

Notes

One referee suggests a third approach to select variables—a statistical procedure using general-to-specific modeling. That is, one includes variables and their lags generally based on their in-sample significance or out-of-sample performance. Since we use as many as 143 predictors, this approach is difficult to implement in practice.
Some recent work suggests that a few DSGE models can out-perform reduced-form time-series models in out-of-sample forecasting. See Christoffel et al. (2010) and Gupta et al. (2011).
See Sect. 5.2 for further details.
The discussion in this section relies heavily on LeSage (1999), Gupta and Miller (2012a, b) and Das et al. (2009).
That is, \(A(L)=A_1 L+A_2 L^{2}+\cdots +A_p L^{p}\).
See LeSage (1999) and references cited therein for further details regarding the non-stationarity of most macroeconomic time series.
See Johansen (1991) for further technical details.
For an illustration, see Dua and Ray (1995). We use \(k_{ij} = 0.5\).
In addition to using a Fit of 0.50, we also experiment with a Fit as the average relative MSE from an OLS-estimated VAR containing the eight sectoral employment variables, i.e., \(Fit=\frac{1}{8}\sum_{i=1}^{8}\frac{MSE_i^{\infty}}{MSE_i^{0}}\), as well as a Fit value of 0.25. In both cases, the forecasting performances of the alternative Bayesian models deteriorate. These results are available upon request from the authors.
We can estimate factors using the generalized principal component approach as in Forni et al. (2005) or static factors based on principal components as in Stock and Watson (2002b). See Stock and Watson (2005) for a review of the literature on factor analysis.
See these papers for more details on the model and the estimation.
When we extract the common factors (principal components) for the MBFAVAR and BFAVAR models, we transform all variables to induce stationarity. Now, we transform all variables to induce non-stationarity. That is, for stationary variables, we accumulate to make them I(1). We also extract three common factors from the non-stationary variables, excluding the stationary variables. The findings prove similar to the three factors extracted when we accumulate the I(0) variables to make them I(1).
Banerjee et al. (2010) note that to extract common factors \(F_{t}\), one must standardize the variables \(Y_{t}\) (mean zero, variance one).
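A minimal sketch of the standardization step described in the note above, followed by a static principal-components extraction via the singular value decomposition. The function name `extract_factors`, the simulated panel, and the choice of three factors in the example are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def extract_factors(Y, r):
    """Standardize the columns of Y (T x n) and return the first r principal components."""
    Y = np.asarray(Y, dtype=float)
    # Standardize each series to mean zero and variance one.
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # SVD of the standardized panel: the columns of U * S are the principal components.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    factors = U[:, :r] * S[:r]   # T x r estimated factors
    loadings = Vt[:r].T          # n x r loadings
    return Z, factors, loadings

# Simulated panel (assumption): 200 periods, 20 series driven by 3 common factors.
rng = np.random.default_rng(0)
T, n, r = 200, 20, 3
common = rng.standard_normal((T, r))
panel = common @ rng.standard_normal((r, n)) + 0.5 * rng.standard_normal((T, n))
panel = 5.0 + 10.0 * panel       # arbitrary location and scale before standardizing

Z, F, Lam = extract_factors(panel, r)
```

Without the standardization, series measured in large units would dominate the leading components simply because of their scale.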
Note that as \(n\rightarrow \infty \), and the number of factors \(r\) remains fixed, the number of cointegrating relations \(n-r\rightarrow \infty .\)
Bai and Ng (2004) and Bai (2004) allow for the possibility that \(u_{t}\) or some elements of \(u_{t}\) are \(I(1)\).
Ex-post forecasts use actual values of the variables used in the forecasting equation to generate the forecasts whereas the ex-ante forecasts use forecasted values. The ex-ante forecasts give an objective statistical method (approach) to choose the best performing models, which, in turn, we use to predict the turning points.
After determining the in-sample lag length for the VEC- and FAVEC-type models, we apply the trace test of cointegration to the eight employment series, and the eight employment series and the three factors for the medium and large FAVEC models. The tests suggest 5, 8, and 8 cointegrating vectors, respectively, implying 3 common trends in all the cases. Note that, at each recursion, we choose the number of cointegrating vectors for the BVEC, MBFAVEC, and BFAVEC models by using the trace test. Hence, we update the number of cointegrating relations over the ex-post out-of-sample period. Interestingly, at the end of the out-of-sample period, we find that the number of cointegrating vectors falls to 3 in the BVEC model, while the number stays at 8 and 8, respectively, for the MBFAVEC and BFAVEC models. Note that the results for the MBFAVEC and BFAVEC are consistent with theory, since the number of factors (3) equals the number of common trends (= number of variables in the (M)BFAVEC less the number of cointegrating vectors). These results are available upon request from the authors.
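As a worked check of the counts in the note above, write \(m\) for the number of variables in a system and \(h\) for the number of cointegrating vectors (both symbols are introduced here for illustration only and are not taken from the paper). The number of common trends is \(m-h\), so the in-sample trace-test results give:

\[
\begin{aligned}
\text{employment series only:}\quad & m-h = 8-5 = 3,\\
\text{medium and large FAVEC:}\quad & m-h = (8+3)-8 = 3.
\end{aligned}
\]

In both factor-augmented systems the three common trends match the three extracted factors.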
In prior versions of this paper, we also estimated AR, VEC, FAAR, FAVAR, and FAVEC models. These models exhibited much worse performance than those reported in the text. Results are available from the authors.
Note that if \(A_{t+n}\) denotes the actual value of a specific variable in period \(t+n\) and \({}_tF_{t+n}\) equals the forecast made in period \(t\) for \(t+n\), the RMSE statistic equals \(\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left({}_tF_{t+n}-A_{t+n}\right)^{2}}\), where \(N\) equals the number of forecasts.
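The RMSE definition in the note above can be sketched as a short function. The name `rmse` and the example numbers are illustrative assumptions, not values from the paper.

```python
import math

def rmse(forecasts, actuals):
    """Root mean squared error over N paired forecasts and actual values."""
    if len(forecasts) != len(actuals) or len(forecasts) == 0:
        raise ValueError("forecasts and actuals must be non-empty and of equal length")
    n_forecasts = len(forecasts)
    # Sum of squared forecast errors, divided by N, then the square root.
    squared_errors = sum((f - a) ** 2 for f, a in zip(forecasts, actuals))
    return math.sqrt(squared_errors / n_forecasts)

# Made-up example: only the third forecast misses, by 2 units.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the errors are squared before averaging, a single large miss raises the statistic more than several small misses of the same total size.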
As a robustness check, we estimate the BFAAR, BFAVAR, and LBVAR models using the first-differenced employment series (which, in our case, amounts to forecasting growth rates of employment, since the employment series are in logarithms). We then recover the (log-)level forecasts of the data using the actual observation of the period before the starting point of the recursive out-of-sample forecast period. We observe that the forecast performance of the BFAAR model (for the log-level of employment) improves in seven out of the eight cases (the manufacturing forecasts worsen). For the BFAVAR model, forecast performances improve for construction; trade, transportation, and utilities; and professional and business services. For the LBVAR model, the improvements only occur for professional and business services; and leisure and hospitality. As with the Bayesian models for forecasting the levels of employment, we forecast the first-differences (growth rates) of employment with the tightness of the prior based on an in-sample fit of 50 %. Importantly, however, our general conclusions do not change. In other words, the improved performances of these models do not make them the preferred models for cases where they were non-optimal. One exception does occur. To wit, now the BFAAR model does the best at the margin for construction employment instead of the LBVAR model. This result highlights the importance of modeling the long-run relationships over and above differencing the data (which is also done for the BFAVECM before recovering the log-level forecasts) to provide more robust results in the presence of structural breaks (Carriero et al. 2011). The details of these results are available upon request from the authors.
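A minimal sketch of how log-level forecasts can be recovered from forecasts of first differences, as the robustness check above describes: the forecasted growth rates are accumulated on top of the last actual log-level before the forecast period. The function name `levels_from_differences` and the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def levels_from_differences(last_log_level, diff_forecasts):
    """Cumulate forecasted first differences onto the last observed log-level."""
    diffs = np.asarray(diff_forecasts, dtype=float)
    # The h-step-ahead log-level is the starting level plus the sum of the first h differences.
    return last_log_level + np.cumsum(diffs)

# Hypothetical example: last observed employment of 100 (in levels) and
# three forecasted monthly growth rates (log differences).
log_levels = levels_from_differences(np.log(100.0), [0.01, 0.02, -0.005])
levels = np.exp(log_levels)  # back to the original units
```

Since the differences are cumulated, an error in an early growth-rate forecast carries through to every later level forecast.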
In addition to the ex-ante out-of-sample forecasting exercise over 2009:4 to 2010:3, we also analyze the in-sample (1972:1–1989:12) and ex-post out-of-sample (1990:1–2009:3) forecasts obtained from the best models for each of the eight employment series. The in-sample predictions are virtually indistinguishable from the actual data, while the ex-post out-of-sample forecasts from the best models tend to predict the turning points quite well. We suppress these results to save space, but they are available upon request from the authors.

References

Bai J (2004) Estimating cross-section common stochastic trends in nonstationary panel data. J Econ 122:137–183
Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70:191–221
Bai J, Ng S (2004) A PANIC attack on unit roots and cointegration. Econometrica 72:1127–1177
Bańbura M, Giannone D, Reichlin L (2010) Large Bayesian vector auto regressions. J Appl Econ 25:71–92
Banerjee A, Marcellino MG (2009) Factor-augmented error correction models. In: Castle JL, Shephard N (eds) The methodology and practice of econometrics: a festschrift for David Hendry. Oxford University Press, Oxford, pp 227–254
Banerjee A, Marcellino MG, Masten I (2010) Forecasting with factor-augmented error correction models. CEPR Discussion Paper No. DP7677
Bernanke BS, Boivin J, Eliasz P (2005) Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. Q J Econ 120:387–422
Boivin J, Ng S (2005) Understanding and comparing factor based forecasts. Int J Cent Banking 1:117–152
Carriero A, Kapetanios G, Marcellino M (2009) Forecasting exchange rates with a large Bayesian VAR. Int J Forecast 25:400–417
Carriero A, Kapetanios G, Marcellino M (2011) Forecasting large datasets with Bayesian reduced rank multivariate models. J Appl Econ 26:735–761
Christoffel K, Coenen G, Warne A (2010) Forecasting with DSGE models. European Central Bank, Working Paper No 1185
Das S, Gupta R, Kabundi A (2009) Could we have forecasted the recent downturn in the South African housing market? J Hous Econ 18:325–335
Doan TA, Litterman RB, Sims CA (1984) Forecasting and conditional projections using realistic prior distributions. Econ Rev 3:1–100
Dua P, Ray SC (1995) A BVAR model for the Connecticut economy. J Forecast 14:167–180
Enders W (2004) Applied econometric time series, 2nd edn. Wiley, New York
Forni M, Hallin M, Lippi M, Reichlin L (2005) The generalized dynamic factor model, one sided estimation and forecasting. J Am Stat Assoc 100:830–840
Giacomini R, White H (2006) Tests of conditional predictive ability. Econometrica 74:1545–1578
Glennon D, Lane J, Johnson S (1987) Regional econometric models that reflect labor market relations. Int J Forecast 3:299–312
Gupta R, Kabundi A, Miller SM (2011) Forecasting the US real house price index: structural and non-structural models with and without fundamentals. Econ Model 28:2013–2021
Gupta R, Miller SM (2012a) “Ripple effects” and forecasting home prices in Los Angeles, Las Vegas, and Phoenix. Ann Reg Sci 48:763–782
Gupta R, Miller SM (2012b) The time-series properties on housing prices: a case study of the Southern California market. J Real Estate Financ Econ 44:339–361
Johansen S (1991) Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59:1551–1580
Lane T (1966) The urban base multiplier: an evaluation of the state of the art. Land Econ 42:339–347
LeSage JP (1990) A comparison of the forecasting ability of ECM and VAR models. Rev Econ Stat 72:664–671
LeSage JP (1999) Applied econometrics using MATLAB. http://www.spatial-econometrics.com
Litterman RB (1981) A Bayesian procedure for forecasting with vector autoregressions. Working Paper, Federal Reserve Bank of Minneapolis
Litterman RB (1986) Forecasting with Bayesian vector autoregressions: five years of experience. J Bus Econ Stat 4:25–38
Rapach DE, Strauss JK (2005) Forecasting employment growth in Missouri with many potentially relevant predictors: an analysis of forecast combining methods. Federal Reserve Bank of Saint Louis. Reg Econ Dev 1:97–112
Rapach DE, Strauss JK (2008) Forecasting US employment growth using forecast combining methods. J Forecast 27:75–93
Rapach DE, Strauss JK (2010) Bagging or combining (or both)? An analysis based on forecasting US employment growth. Econ Rev 29:511–533
Rapach DE, Strauss JK (2012) Forecasting US state-level employment growth: an amalgamation approach. Int J Forecast 28:315–327
Sims CA (1980) Macroeconomics and reality. Econometrica 48:1–48
Sims CA, Stock JH, Watson MW (1990) Inference in linear time series models with some unit roots. Econometrica 58:113–144
Spencer DE (1993) Developing a Bayesian vector autoregression model. Int J Forecast 9:407–421
Stevens BH, Moore CL (1980) A critical review of the literature on shift-share as a forecasting technique. J Reg Sci 20:419–437
Stock JH, Watson MW (1999) Forecasting inflation. J Monet Econ 44:293–335
Stock JH, Watson MW (2002a) Forecasting using principal components from a large number of predictors. J Am Stat Assoc 97:1167–1179
Stock JH, Watson MW (2002b) Macroeconomic forecasting using diffusion indexes. J Bus Econ Stat 20:147–162
Stock JH, Watson MW (2003) Forecasting output and inflation: the role of asset prices. J Econ Lit 41:788–829
Stock JH, Watson MW (2004) Combination forecasts of output growth in a seven-country data set. J Forecast 23:405–430
Stock JH, Watson MW (2005) Implications of dynamic factor models for VAR analysis. NBER Working Paper No. 11467
Taylor CA (1982) Econometric modeling of urban and other substate areas: an analysis of alternative methodologies. Reg Sci Urban Econ 12:425–448
Theil H (1971) Principles of econometrics. Wiley, New York
Todd RM (1984) Improving economic forecasting with Bayesian vector autoregression. Quarterly Review, Federal Reserve Bank of Minneapolis, pp 18–29
Williamson RB (1975) Predictive power of the export base theory. Growth Change 6:3–10

Acknowledgments

We thank two anonymous referees for many helpful comments. Any remaining errors, however, are solely ours.

Author information

Authors and Affiliations

Department of Economics, University of Pretoria, Pretoria, 0002, South Africa
Rangan Gupta
Department of Economics and Econometrics, University of Johannesburg, Johannesburg, 2006, South Africa
Alain Kabundi & Josine Uwilingiye
Department of Economics, University of Nevada-Las Vegas, Las Vegas, NV, 89154-6005, USA
Stephen M. Miller

Corresponding author

Correspondence to Rangan Gupta.

Appendix

See Table 10.

Table 10 Variables

About this article

Cite this article

Gupta, R., Kabundi, A., Miller, S.M. et al. Using large data sets to forecast sectoral employment. Stat Methods Appl 23, 229–264 (2014). https://doi.org/10.1007/s10260-013-0243-6

Accepted: 07 September 2013
Published: 01 October 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10260-013-0243-6
