Using large data sets to forecast sectoral employment

Statistical Methods & Applications

Abstract

We apply several models, estimated with classical and Bayesian methods, to forecast employment for eight sectors of the US economy. In addition to standard vector autoregressive and Bayesian vector autoregressive models, we augment some models with the information content of 143 additional monthly series. Several approaches exist for incorporating information from a large number of series. We consider two multivariate approaches: extracting common factors (principal components) and Bayesian shrinkage. After extracting the common factors, we use Bayesian factor-augmented vector autoregressive and vector error-correction models, as well as Bayesian shrinkage in a large-scale Bayesian vector autoregressive model. For an in-sample period of January 1972 to December 1989 and an out-of-sample period of January 1990 to March 2010, we compare the forecast performance of the alternative models. More specifically, we perform ex-post and ex-ante out-of-sample forecasts from January 1990 through March 2009 and from April 2009 through March 2010, respectively. We find that factor-augmented models, especially error-correction versions, generally perform best out of sample, implying that, in addition to macroeconomic variables, incorporating long-run relationships along with short-run dynamics plays an important role in forecasting employment. Forecast combination models based on the simple average of the forecasts from the various models, however, outperform the best-performing individual models for six of the eight sectoral employment series.
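
The simple-average forecast combination described in the abstract can be illustrated with a minimal sketch. The function name `combine_forecasts` and the forecast values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def combine_forecasts(forecasts):
    """Equal-weight combination: average the model forecasts at each date.

    forecasts: array-like of shape (n_models, n_periods).
    """
    forecasts = np.asarray(forecasts, dtype=float)
    return forecasts.mean(axis=0)

# Hypothetical forecasts from three models over four months (illustrative only)
model_forecasts = np.array([
    [10.0, 10.2, 10.4, 10.6],
    [10.4, 10.4, 10.8, 10.6],
    [9.6, 10.0, 10.6, 11.2],
])
combined = combine_forecasts(model_forecasts)
print(combined)
```

Equal weights require no estimation, which is one reason this kind of combination is a common benchmark.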

Notes

  1. One referee suggests a third approach to select variables—a statistical procedure using general-to-specific modeling. That is, one includes variables and their lags generally based on their in-sample significance or out-of-sample performance. Since we use as many as 143 predictors, this approach is difficult to implement in practice.

  2. Some recent work suggests that a few DSGE models can out-perform reduced-form time-series models in out-of-sample forecasting. See Christoffel et al. (2010) and Gupta et al. (2011).

  3. See Sect. 5.2 for further details.

  4. The discussion in this section relies heavily on LeSage (1999), Gupta and Miller (2012a, b), and Das et al. (2009).

  5. That is, \(A(L)=A_1 L+A_2 L^{2}+\cdots +A_p L^{p}\).

  6. See LeSage (1999) and references cited therein for further details regarding the non-stationarity of most macroeconomic time series.

  7. See Johansen (1991) for further technical details.

  8. For an illustration, see Dua and Ray (1995). We use \(k_{ij} = 0.5\).

  9. In addition to using a Fit of 0.50, we also experiment with a Fit as the average relative MSE from an OLS-estimated VAR containing the eight sectoral employment variables, i.e., \(Fit=\frac{1}{8}{\sum \nolimits _{i=1}^8} {\frac{MSE_i^{\infty }}{MSE_{i}^{0}}}\), as well as a Fit value of 0.25. In both cases, the forecasting performances of the alternative Bayesian models deteriorate. These results are available upon request from the authors.

  10. We can estimate factors using the generalized principal component approach as in Forni et al. (2005) or the static factor approach based on principal components as in Stock and Watson (2002b). See Stock and Watson (2005) for a review of the literature on factor analysis.

  11. See these papers for more details on the model and the estimation.

  12. When we extract the common factors (principal components) for the MBFAVAR and BFAVAR models, we transform all variables to induce stationarity. Now, we transform all variables to induce non-stationarity. That is, for stationary variables, we accumulate to make them I(1). We also extract three common factors from the non-stationary variables, excluding the stationary variables. The findings prove similar to the three factors extracted when we accumulate the I(0) variables to make them I(1).

  13. Banerjee et al. (2010) note that to extract common factors \(F_{t}\), one must standardize the variables \(Y_{t}\) (mean zero, variance one).

  14. Note that as \(n\rightarrow \infty \), and the number of factors \(r\) remains fixed, the number of cointegrating relations \(n-r\rightarrow \infty .\)

  15. Bai and Ng (2004) and Bai (2004) allow for the possibility that \(u_{t}\) or some elements of \(u_{t}\) are \(I(1)\).

  16. Ex-post forecasts use actual values of the variables in the forecasting equation to generate the forecasts, whereas ex-ante forecasts use forecasted values. The ex-ante forecasts provide an objective statistical approach for choosing the best-performing models, which, in turn, we use to predict the turning points.

  17. After determining the in-sample lag length for the VEC- and FAVEC-type models, we apply the trace test of cointegration to the eight employment series, and the eight employment series and the three factors for the medium and large FAVEC models. The tests suggest 5, 8, and 8 cointegrating vectors, respectively, implying 3 common trends in all the cases. Note that, at each recursion, we choose the number of cointegrating vectors for the BVEC, MBFAVEC, and BFAVEC models by using the trace test. Hence, we update the number of cointegrating relations over the ex-post out-of-sample period. Interestingly, at the end of the out-of-sample period, we find that the number of cointegrating vectors falls to 3 in the BVEC model, while the number stays at 8 and 8, respectively, for the MBFAVEC and BFAVEC models. Note that the results for the MBFAVEC and BFAVEC are consistent with theory, since the number of factors (3) equals the number of common trends (= number of variables in the (M)BFAVEC less the number of cointegrating vectors). These results are available upon request from the authors.

  18. In prior versions of this paper, we also estimated AR, VEC, FAAR, FAVAR, and FAVEC models. These models exhibited much worse performance than those reported in the text. Results are available from the authors.

  19. Note that if \(A_{t+n} \) denotes the actual value of a specific variable in period \(t + n\) and \(_t F_{t+n} \) equals the forecast made in period \(t\) for \(t + n\), the RMSE statistic equals the following: \(\sqrt{\left[ {{{\sum \nolimits _{t=1}^{N}} {\left( {{ }_tF_{t+n} -A_{t+n} } \right) ^{2}} }/N} \right] }\) where \(N\) equals the number of forecasts.

  20. As a robustness check, we estimate the BFAAR, BFAVAR, and LBVAR models using the first-differenced employment series (which, in our case, amounts to forecasting growth rates of employment, since the employment series are in logarithms). We then recover the (log-)level forecasts of the data using the actual observation of the period before the starting point of the recursive out-of-sample forecast period. We observe that the forecast performance of the BFAAR model (for the log-level of employment) improves in seven out of the eight cases (the manufacturing forecasts worsen). For the BFAVAR model, forecast performances improve for construction; trade, transportation, and utilities; and professional and business services. For the LBVAR model, the improvements only occur for professional and business services; and leisure and hospitality. As with the Bayesian models for forecasting the levels of employment, we forecast the first-differences (growth rates) of employment with the tightness of the prior based on an in-sample fit of 50 %. Importantly, however, our general conclusions do not change. In other words, the improved performances of these models do not make them the preferred models for cases where they were non-optimal. One exception does occur. To wit, now the BFAAR model does the best at the margin for construction employment instead of the LBVAR model. This result highlights the importance of modeling the long-run relationships over and above differencing the data (which is also done for the BFAVECM before recovering the log-level forecasts) to provide more robust results in the presence of structural breaks (Carriero et al. 2011). The details of these results are available upon request from the authors.

  21. In addition to the ex-ante out-of-sample forecasting exercise over 2009:4 to 2010:3, we also analyze the in-sample (1972:1–1989:12) and ex-post out-of-sample (1990:1–2009:3) forecasts obtained from the best models for each of the eight employment series. The actual and predicted data for the in-sample period are virtually indistinguishable, while the ex-post out-of-sample forecasts from the best models tend to predict the turning points quite well. We suppress these results to save space, but they are available upon request from the authors.
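
Note 13 states that the variables must be standardized before the common factors are extracted. The following is a minimal sketch of that step, assuming static principal components computed by a singular value decomposition. The function name `extract_factors`, the synthetic panel, and the normalization of the factors are assumptions made for illustration and do not reproduce the authors' estimation code.

```python
import numpy as np

def extract_factors(Y, n_factors=3):
    """Standardize each column of Y (T x n) and return the leading principal components."""
    Y = np.asarray(Y, dtype=float)
    # Standardize: mean zero, variance one for every series
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # Principal components from the SVD of the standardized panel
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    factors = U[:, :n_factors] * s[:n_factors]   # T x n_factors
    loadings = Vt[:n_factors].T                  # n x n_factors
    return Z, factors, loadings

# Synthetic panel with unequal scales (illustrative only)
rng = np.random.default_rng(0)
panel = rng.normal(size=(200, 10)) * np.arange(1, 11)
Z, F, L = extract_factors(panel, n_factors=3)
print(F.shape, L.shape)
```

Without standardization, the series with the largest variance would dominate the first components simply because of their units.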
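
The count of common trends reported in note 17 follows from subtracting the number of cointegrating vectors from the number of variables in each system. Assuming, as the note indicates, eight employment series in the VEC-type model and eight series plus three factors in the medium and large FAVEC models, the arithmetic is:

```latex
\begin{align*}
\text{VEC-type (8 variables, 5 cointegrating vectors):} \quad & 8 - 5 = 3,\\
\text{Medium FAVEC (11 variables, 8 cointegrating vectors):} \quad & (8 + 3) - 8 = 3,\\
\text{Large FAVEC (11 variables, 8 cointegrating vectors):} \quad & (8 + 3) - 8 = 3.
\end{align*}
```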
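
The RMSE statistic defined in note 19 can be computed directly from paired forecasts and outcomes. The sketch below uses made-up numbers, and the function name `rmse` is chosen here for illustration.

```python
import numpy as np

def rmse(forecasts, actuals):
    """Root mean squared error over N forecast/actual pairs."""
    forecasts = np.asarray(forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    return np.sqrt(np.mean((forecasts - actuals) ** 2))

# Hypothetical n-step-ahead forecasts and the corresponding actual values
forecast_values = [2.0, 4.0, 6.0, 8.0]
actual_values = [1.0, 4.0, 8.0, 8.0]
score = rmse(forecast_values, actual_values)
print(score)
```

Here the squared errors are 1, 0, 4, and 0, so the statistic equals the square root of 5/4.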
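
Note 20 describes recovering log-level forecasts from forecasts of first differences by anchoring on the last actual observation before the forecast period. A minimal sketch of that accumulation follows. The function name `log_levels_from_growth` and the numbers are hypothetical, and the sketch covers only the accumulation step, not the recursive estimation.

```python
import numpy as np

def log_levels_from_growth(last_log_level, growth_forecasts):
    """Cumulate forecasted log differences onto the last observed log level."""
    growth_forecasts = np.asarray(growth_forecasts, dtype=float)
    return last_log_level + np.cumsum(growth_forecasts)

# Last observed employment level of 100 (arbitrary units) and three forecasted
# monthly log differences (illustrative only)
last_log_level = np.log(100.0)
growth = [0.01, 0.02, -0.01]
log_levels = log_levels_from_growth(last_log_level, growth)
levels = np.exp(log_levels)
print(levels)
```

Because the series are in logarithms, the forecasted differences are growth rates, and exponentiating the accumulated log levels returns forecasts in the original units.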

References

  • Bai J (2004) Estimating cross-section common stochastic trends in nonstationary panel data. J Econ 122:137–183

  • Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70:191–221

  • Bai J, Ng S (2004) A PANIC attack on unit roots and cointegration. Econometrica 72:1127–1177

  • Bańbura M, Giannone D, Reichlin L (2010) Large Bayesian vector auto regressions. J Appl Econ 25:71–92

  • Banerjee A, Marcellino MG (2009) Factor-augmented error correction models. In: Castle JL, Shephard N (eds) The methodology and practice of econometrics: a festschrift for David Hendry. Oxford University Press, Oxford, pp 227–254

  • Banerjee A, Marcellino MG, Masten I (2010) Forecasting with factor-augmented error correction models. CEPR Discussion Paper No. DP7677

  • Bernanke BS, Boivin J, Eliasz P (2005) Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. Q J Econ 120:387–422

  • Boivin J, Ng S (2005) Understanding and comparing factor based forecasts. Int J Cent Banking 1:117–152

  • Carriero A, Kapetanios G, Marcellino M (2009) Forecasting exchange rates with a large Bayesian VAR. Int J Forecast 25:400–417

  • Carriero A, Kapetanios G, Marcellino M (2011) Forecasting large datasets with Bayesian reduced rank multivariate models. J Appl Econ 26:735–761

  • Christoffel K, Coenen G, Warne A (2010) Forecasting with DSGE models. European Central Bank, Working Paper No 1185

  • Das S, Gupta R, Kabundi A (2009) Could we have forecasted the recent downturn in the South African housing market? J Hous Econ 18:325–335

  • Doan TA, Litterman RB, Sims CA (1984) Forecasting and conditional projections using realistic prior distributions. Econ Rev 3:1–100

  • Dua P, Ray SC (1995) A BVAR model for the Connecticut economy. J Forecast 14:167–180

  • Enders W (2004) Applied econometric time series, 2nd edn. Wiley, New York

  • Forni M, Hallin M, Lippi M, Reichlin L (2005) The generalized dynamic factor model, one sided estimation and forecasting. J Am Stat Assoc 100:830–840

  • Giacomini R, White H (2006) Tests of conditional predictive ability. Econometrica 74:1545–1578

  • Glennon D, Lane J, Johnson S (1987) Regional econometric models that reflect labor market relations. Int J Forecast 3:299–312

  • Gupta R, Kabundi A, Miller SM (2011) Forecasting the US real house price index: structural and non-structural models with and without fundamentals. Econ Model 28:2013–2021

  • Gupta R, Miller SM (2012a) “Ripple effects” and forecasting home prices in Los Angeles, Las Vegas, and Phoenix. Ann Reg Sci 48:763–782

  • Gupta R, Miller SM (2012b) The time-series properties on housing prices: a case study of the Southern California market. J Real Estate Financ Econ 44:339–361

  • Johansen S (1991) Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59:1551–1580

  • Lane T (1966) The urban base multiplier: an evaluation of the state of the art. Land Econ 42:339–347

  • LeSage JP (1990) A comparison of the forecasting ability of ECM and VAR models. Rev Econ Stat 72:664–671

  • LeSage JP (1999) Applied econometrics using MATLAB. http://www.spatial-econometrics.com

  • Litterman RB (1981) A Bayesian procedure for forecasting with vector autoregressions. Working Paper, Federal Reserve Bank of Minneapolis

  • Litterman RB (1986) Forecasting with Bayesian vector autoregressions: five years of experience. J Bus Econ Stat 4:25–38

  • Rapach DE, Strauss JK (2005) Forecasting employment growth in Missouri with many potentially relevant predictors: an analysis of forecast combining methods. Federal Reserve Bank of Saint Louis. Reg Econ Dev 1:97–112

  • Rapach DE, Strauss JK (2008) Forecasting US employment growth using forecast combining methods. J Forecast 27:75–93

  • Rapach DE, Strauss JK (2010) Bagging or combining (or both)? An analysis based on forecasting US employment growth. Econ Rev 29:511–533

  • Rapach DE, Strauss JK (2012) Forecasting US state-level employment growth: an amalgamation approach. Int J Forecast 28:315–327

  • Sims CA (1980) Macroeconomics and reality. Econometrica 48:1–48

  • Sims CA, Stock JH, Watson MW (1990) Inference in linear time series models with some unit roots. Econometrica 58:113–144

  • Spencer DE (1993) Developing a Bayesian vector autoregression model. Int J Forecast 9:407–421

  • Stevens BH, Moore CL (1980) A critical review of the literature on shift-share as a forecasting technique. J Reg Sci 20:419–437

  • Stock JH, Watson MW (1999) Forecasting inflation. J Monet Econ 44:293–335

  • Stock JH, Watson MW (2002a) Forecasting using principal components from a large number of predictors. J Am Stat Assoc 97:1167–1179

  • Stock JH, Watson MW (2002b) Macroeconomic forecasting using diffusion indexes. J Bus Econ Stat 20:147–162

  • Stock JH, Watson MW (2003) Forecasting output and inflation: the role of asset prices. J Econ Lit 41:788–829

  • Stock JH, Watson MW (2004) Combination forecasts of output growth in a seven-country data set. J Forecast 23:405–430

  • Stock JH, Watson MW (2005) Implications of dynamic factor models for VAR analysis. NBER Working Paper No. 11467

  • Taylor CA (1982) Econometric modeling of urban and other substate areas: an analysis of alternative methodologies. Reg Sci Urban Econ 12:425–448

  • Theil H (1971) Principles of econometrics. Wiley, New York

  • Todd RM (1984) Improving economic forecasting with Bayesian vector autoregression. Quarterly Review, Federal Reserve Bank of Minneapolis, pp 18–29

  • Williamson RB (1975) Predictive power of the export base theory. Growth Change 6:3–10

Acknowledgments

We thank two anonymous referees for many helpful comments. Any remaining errors, however, are solely ours.

Author information

Corresponding author

Correspondence to Rangan Gupta.

Appendix

See Table 10.

Table 10 Variables

About this article

Cite this article

Gupta, R., Kabundi, A., Miller, S.M. et al. Using large data sets to forecast sectoral employment. Stat Methods Appl 23, 229–264 (2014). https://doi.org/10.1007/s10260-013-0243-6

  • DOI: https://doi.org/10.1007/s10260-013-0243-6
