Abstract
We forecast employment for eight sectors of the US economy using several classical and Bayesian models. In addition to standard vector autoregressive and Bayesian vector autoregressive models, we augment some models with the information content of 143 additional monthly series. Several approaches exist for incorporating information from a large number of series. We consider two multivariate approaches: extracting common factors (principal components) and Bayesian shrinkage. After extracting the common factors, we use them in Bayesian factor-augmented vector autoregressive and vector error-correction models; we apply Bayesian shrinkage in a large-scale Bayesian vector autoregressive model. For an in-sample period of January 1972 to December 1989 and an out-of-sample period of January 1990 to March 2010, we compare the forecast performance of the alternative models. More specifically, we perform ex-post and ex-ante out-of-sample forecasts from January 1990 through March 2009 and from April 2009 through March 2010, respectively. We find that factor-augmented models, especially the error-correction versions, generally perform best out of sample, implying that, in addition to macroeconomic variables, incorporating long-run relationships along with short-run dynamics plays an important role in forecasting employment. Forecast combination models based on the simple average of the forecasts from the various models, however, outperform the best-performing individual models for six of the eight sectoral employment series.
Notes
One referee suggests a third approach to select variables—a statistical procedure using general-to-specific modeling. That is, one includes variables and their lags generally based on their in-sample significance or out-of-sample performance. Since we use as many as 143 predictors, this approach is difficult to implement in practice.
See Sect. 5.2 for further details.
That is, \(A(L)=A_1 L+A_2 L^{2}+\cdots +A_p L^{p}\).
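As an illustration of how such a lag polynomial acts on a vector of variables, the following minimal sketch computes \(A(L)y_t = A_1 y_{t-1}+\cdots +A_p y_{t-p}\). It assumes that \(A(L)\) multiplies the vector of variables as in a standard VAR; the function name, the coefficient matrices, and the data are hypothetical and chosen only for illustration.

```python
def apply_lag_polynomial(coefficients, history):
    """Compute A(L) y_t = A_1 y_{t-1} + ... + A_p y_{t-p}.

    coefficients: list of p square matrices [A_1, ..., A_p], each a list of rows.
    history: list of vectors ordered oldest to newest, ending with y_{t-1}.
    """
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("history must contain at least p lagged vectors")
    n = len(history[-1])
    result = [0.0] * n
    for lag, matrix in enumerate(coefficients, start=1):
        lagged = history[-lag]  # y_{t-lag}
        for i in range(n):
            result[i] += sum(matrix[i][j] * lagged[j] for j in range(n))
    return result


# Hypothetical two-variable system with p = 2 (illustrative values only).
A1 = [[0.5, 0.0], [0.0, 0.5]]
A2 = [[0.25, 0.0], [0.0, 0.25]]
history = [[4.0, 8.0], [2.0, 4.0]]  # y_{t-2}, then y_{t-1}
lagged_part = apply_lag_polynomial([A1, A2], history)
```

Here `lagged_part` holds only the contribution of the lagged terms; any constant and error term of the model would be added separately.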
See LeSage (1999) and references cited therein for further details regarding the non-stationarity of most macroeconomic time series.
See Johansen (1991) for further technical details.
For an illustration, see Dua and Ray (1995). We use \(k_{ij} = 0.5\).
In addition to using a Fit of 0.50, we also experiment with a Fit as the average relative MSE from an OLS-estimated VAR containing the eight sectoral employment variables, i.e., \(Fit=\frac{1}{8}{\sum \nolimits _{i=1}^8} {\frac{MSE_i^{\infty }}{MSE_{i}^{0}}}\), as well as a Fit value of 0.25. In both cases, the forecasting performances of the alternative Bayesian models deteriorate. These results are available upon request from the authors.
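The average relative MSE in this note can be sketched as follows. The note does not define the two MSE terms in the ratio here, so the sketch simply takes them as two input lists, one value per sectoral equation; the function name and the numbers are hypothetical.

```python
def average_relative_mse(mse_numerator, mse_denominator):
    """Average of the equation-by-equation MSE ratios.

    mse_numerator[i] and mse_denominator[i] stand for the two MSE terms of
    equation i in the Fit formula; their exact definitions are not given in
    this note, so they are treated as plain inputs (an assumption).
    """
    if len(mse_numerator) != len(mse_denominator) or not mse_numerator:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [num / den for num, den in zip(mse_numerator, mse_denominator)]
    return sum(ratios) / len(ratios)


# Hypothetical MSE values for eight equations (illustrative only).
numerators = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
denominators = [2.0, 4.0, 6.0, 8.0, 4.0, 8.0, 12.0, 16.0]
fit = average_relative_mse(numerators, denominators)
```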
See these papers for more details on the model and the estimation.
When we extract the common factors (principal components) for the MBFAVAR and BFAVAR models, we transform all variables to induce stationarity. Here, by contrast, we transform all variables to make them non-stationary. That is, we accumulate the stationary variables to make them I(1). We also extract three common factors from the non-stationary variables only, excluding the stationary variables. The findings prove similar to those based on the three factors extracted when we accumulate the I(0) variables to make them I(1).
Banerjee et al. (2010) note that to extract common factors \(F_{t}\), one must standardize the variables \(Y_{t}\) (mean zero, variance one).
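The standardization step can be sketched as below, followed by extraction of a single principal component. This is a minimal illustration, not the estimation procedure of the paper: it extracts only the first component (the paper uses three factors), it computes that component by power iteration on the correlation matrix purely to stay within the standard library, and all names and data are hypothetical.

```python
import math


def standardize(columns):
    """Rescale each series (a list of floats) to mean zero and variance one."""
    out = []
    for series in columns:
        n = len(series)
        mean = sum(series) / n
        var = sum((x - mean) ** 2 for x in series) / n
        if var == 0:
            raise ValueError("a constant series cannot be standardized")
        sd = math.sqrt(var)
        out.append([(x - mean) / sd for x in series])
    return out


def first_principal_component(columns, iterations=200):
    """Return (loadings, factor) for the first principal component.

    Uses power iteration on the correlation matrix of the standardized
    series; this is a simplification chosen for the sketch.
    """
    z = standardize(columns)
    k, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / n for j in range(k)]
            for i in range(k)]
    v = [1.0] + [0.0] * (k - 1)  # arbitrary starting vector (assumption)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("starting vector is orthogonal to the data")
        v = [x / norm for x in w]
    factor = [sum(v[i] * z[i][t] for i in range(k)) for t in range(n)]
    return v, factor


# Hypothetical data: three short series (illustrative only).
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
z_data = standardize(data)
loadings, factor = first_principal_component(data)
```

Without standardization, series measured in large units would dominate the extracted component, which is why the rescaling comes first.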
Note that as \(n\rightarrow \infty \), and the number of factors \(r\) remains fixed, the number of cointegrating relations \(n-r\rightarrow \infty .\)
Ex-post forecasts use actual values of the variables used in the forecasting equation to generate the forecasts whereas the ex-ante forecasts use forecasted values. The ex-ante forecasts give an objective statistical method (approach) to choose the best performing models, which, in turn, we use to predict the turning points.
After determining the in-sample lag length for the VEC- and FAVEC-type models, we apply the trace test of cointegration to the eight employment series alone, and to the eight employment series together with the three factors for the medium and large FAVEC models. The tests suggest 5, 8, and 8 cointegrating vectors, respectively, implying 3 common trends in all cases. Note that, at each recursion, we choose the number of cointegrating vectors for the BVEC, MBFAVEC, and BFAVEC models by using the trace test. Hence, we update the number of cointegrating relations over the ex-post out-of-sample period. Interestingly, at the end of the out-of-sample period, we find that the number of cointegrating vectors falls to 3 in the BVEC model, while the number stays at 8 for both the MBFAVEC and BFAVEC models. Note that the results for the MBFAVEC and BFAVEC are consistent with theory, since the number of factors (3) equals the number of common trends (= number of variables in the (M)BFAVEC less the number of cointegrating vectors). These results are available upon request from the authors.
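The counting rule in this note (common trends = number of variables less number of cointegrating vectors) can be checked with a short sketch. The function name is hypothetical; the counts are those reported in the note, with the FAVEC systems taken to contain the eight employment series plus the three factors.

```python
def common_trends(n_variables, n_cointegrating_vectors):
    """Number of common stochastic trends in a cointegrated system."""
    if not 0 <= n_cointegrating_vectors <= n_variables:
        raise ValueError("cointegrating rank must lie between 0 and n_variables")
    return n_variables - n_cointegrating_vectors


# (variables in the system, cointegrating vectors) as reported in the note.
reported = {
    "BVEC": (8, 5),
    "MBFAVEC": (11, 8),  # eight employment series plus three factors
    "BFAVEC": (11, 8),
}
trends = {model: common_trends(*counts) for model, counts in reported.items()}
```

By the same rule, the fall to 3 cointegrating vectors in the BVEC model at the end of the out-of-sample period corresponds to `common_trends(8, 3)`, that is, 5 common trends.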
In prior versions of this paper, we also estimated AR, VEC, FAAR, FAVAR, and FAVEC models. These models exhibited much worse performance than those reported in the text. Results are available from the authors.
Note that if \(A_{t+n} \) denotes the actual value of a specific variable in period \(t + n\) and \(_t F_{t+n} \) equals the forecast made in period \(t\) for \(t + n\), the RMSE statistic equals the following: \(\sqrt{\left[ {{{\sum \nolimits _{t=1}^{N}} {\left( {{ }_tF_{t+n} -A_{t+n} } \right) ^{2}} }/N} \right] }\) where \(N\) equals the number of forecasts.
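The RMSE statistic defined in this note translates directly into code. The sketch below is a minimal illustration; the function name and the numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(forecasts, actuals):
    """Root mean squared error over N paired forecasts and actual values."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical forecasts and outcomes (illustrative only).
example_forecasts = [4.0, 5.0, 7.0]
example_actuals = [4.0, 3.0, 5.0]
example_rmse = rmse(example_forecasts, example_actuals)
```

In this example the squared errors are 0, 4, and 4, so the statistic equals the square root of 8/3.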
As a robustness check, we estimate the BFAAR, BFAVAR, and LBVAR models using the first-differenced employment series (which, in our case, amounts to forecasting growth rates of employment, since the employment series are in logarithms). We then recover the (log-)level forecasts of the data using the actual observation of the period before the starting point of the recursive out-of-sample forecast period. We observe that the forecast performance of the BFAAR model (for the log-level of employment) improves in seven out of the eight cases (the manufacturing forecasts worsen). For the BFAVAR model, forecast performances improve for construction; trade, transportation, and utilities; and professional and business services. For the LBVAR model, the improvements only occur for professional and business services; and leisure and hospitality. As with the Bayesian models for forecasting the levels of employment, we forecast the first-differences (growth rates) of employment with the tightness of the prior based on an in-sample fit of 50 %. Importantly, however, our general conclusions do not change. In other words, the improved performances of these models do not make them the preferred models for cases where they were non-optimal. One exception does occur. To wit, now the BFAAR model does the best at the margin for construction employment instead of the LBVAR model. This result highlights the importance of modeling the long-run relationships over and above differencing the data (which is also done for the BFAVECM before recovering the log-level forecasts) to provide more robust results in the presence of structural breaks (Carriero et al. 2011). The details of these results are available upon request from the authors.
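Recovering log-level forecasts from forecasts of first differences amounts to cumulating the forecasted growth rates from the last actual log level. The sketch below shows one way to do this under that assumption; the function name and the numbers are hypothetical, and the code does not reproduce the recursive estimation scheme itself.

```python
from itertools import accumulate


def log_levels_from_growth(last_actual_log_level, growth_forecasts):
    """Cumulate forecasted log differences onto the last observed log level.

    Returns the forecast path of log levels, one value per forecast horizon.
    """
    path = accumulate(growth_forecasts, initial=last_actual_log_level)
    return list(path)[1:]  # drop the starting value, keep the forecasts


# Hypothetical last observed log level and growth forecasts (illustrative only).
last_log_level = 2.0
growth_forecasts = [0.5, 0.25, -0.25]
level_forecasts = log_levels_from_growth(last_log_level, growth_forecasts)
```

Because the series are in logarithms, each first difference is a growth rate, so the running sum of forecasted growth rates gives the forecasted log level at each horizon.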
In addition to the ex-ante out-of-sample forecasting exercise over 2009:4 to 2010:3, we also analyze the in-sample (1972:1–1989:12) and ex-post out-of-sample (1990:1–2009:3) forecasts obtained from the best models for each of the eight employment series. The actual and predicted data for the in-sample period are virtually indistinguishable, while the ex-post out-of-sample forecasts from the best models tend to predict the turning points quite well. We suppress these results to save space, but they are available upon request from the authors.
References
Bai J (2004) Estimating cross-section common stochastic trends in nonstationary panel data. J Econ 122:137–183
Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70:191–221
Bai J, Ng S (2004) A PANIC attack on unit roots and cointegration. Econometrica 72:1127–1177
Bańbura M, Giannone D, Reichlin L (2010) Large Bayesian vector auto regressions. J Appl Econ 25:71–92
Banerjee A, Marcellino MG (2009) Factor-augmented error correction models. In: Castle JL, Shephard N (eds) The methodology and practice of econometrics: a festschrift for David Hendry. Oxford University Press, Oxford, pp 227–254
Banerjee A, Marcellino MG, Masten I (2010) Forecasting with factor-augmented error correction models. CEPR Discussion Paper No. DP7677
Bernanke BS, Boivin J, Eliasz P (2005) Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. Q J Econ 120:387–422
Boivin J, Ng S (2005) Understanding and comparing factor based forecasts. Int J Cent Banking 1:117–152
Carriero A, Kapetanios G, Marcellino M (2009) Forecasting exchange rates with a large Bayesian VAR. Int J Forecast 25:400–417
Carriero A, Kapetanios G, Marcellino M (2011) Forecasting large datasets with Bayesian reduced rank multivariate models. J Appl Econ 26:735–761
Christoffel K, Coenen G, Warne A (2010) Forecasting with DSGE models. European Central Bank, Working Paper No 1185
Das S, Gupta R, Kabundi A (2009) Could we have forecasted the recent downturn in the South African housing market? J Hous Econ 18:325–335
Doan TA, Litterman RB, Sims CA (1984) Forecasting and conditional projections using realistic prior distributions. Econ Rev 3:1–100
Dua P, Ray SC (1995) A BVAR model for the Connecticut economy. J Forecast 14:167–180
Enders W (2004) Applied econometric time series, 2nd edn. Wiley, New York
Forni M, Hallin M, Lippi M, Reichlin L (2005) The generalized dynamic factor model, one sided estimation and forecasting. J Am Stat Assoc 100:830–840
Giacomini R, White H (2006) Tests of conditional predictive ability. Econometrica 74:1545–1578
Glennon D, Lane J, Johnson S (1987) Regional econometric models that reflect labor market relations. Int J Forecast 3:299–312
Gupta R, Kabundi A, Miller SM (2011) Forecasting the US real house price index: structural and non-structural models with and without fundamentals. Econ Model 26:2013–2021
Gupta R, Miller SM (2012a) “Ripple effects” and forecasting home prices in Los Angeles, Las Vegas, and Phoenix. Ann Reg Sci 48:763–782
Gupta R, Miller SM (2012b) The time-series properties on housing prices: a case study of the Southern California market. J Real Estate Financ Econ 44:339–361
Johansen S (1991) Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59:1551–1580
Lane T (1966) The urban base multiplier: an evaluation of the state of the art. Land Econ 42:339–347
LeSage JP (1990) A comparison of the forecasting ability of ECM and VAR models. Rev Econ Stat 72:664–671
LeSage JP (1999) Applied econometrics using MATLAB. http://www.spatial-econometrics.com
Litterman RB (1981) A Bayesian procedure for forecasting with vector autoregressions. Working Paper, Federal Reserve Bank of Minneapolis
Litterman RB (1986) Forecasting with Bayesian vector autoregressions: five years of experience. J Bus Econ Stat 4:25–38
Rapach DE, Strauss JK (2005) Forecasting employment growth in Missouri with many potentially relevant predictors: an analysis of forecast combining methods. Federal Reserve Bank of Saint Louis. Reg Econ Dev 1:97–112
Rapach DE, Strauss JK (2008) Forecasting US employment growth using forecast combining methods. J Forecast 27:75–93
Rapach DE, Strauss JK (2010) Bagging or combining (or both)? An analysis based on forecasting US employment growth. Econ Rev 29:511–533
Rapach DE, Strauss JK (2012) Forecasting US state-level employment growth: an amalgamation approach. Int J Forecast 28:315–327
Sims CA (1980) Macroeconomics and reality. Econometrica 48:1–48
Sims CA, Stock JH, Watson MW (1990) Inference in linear time series models with some unit roots. Econometrica 58:113–144
Spencer DE (1993) Developing a Bayesian vector autoregression model. Int J Forecast 9:407–421
Stevens BH, Moore CL (1980) A critical review of the literature on shift-share as a forecasting technique. J Reg Sci 20:419–437
Stock JH, Watson MW (1999) Forecasting inflation. J Monet Econ 44:293–335
Stock JH, Watson MW (2002a) Forecasting using principal components from a large number of predictors. J Am Stat Assoc 97:147–162
Stock JH, Watson MW (2002b) Macroeconomic forecasting using diffusion indexes. J Bus Econ Stat 20:147–162
Stock JH, Watson MW (2003) Forecasting output and inflation: the role of asset prices. J Econ Lit 41:788–829
Stock JH, Watson MW (2004) Combination forecasts of output growth in a seven-country data set. J Forecast 23:405–430
Stock JH, Watson MW (2005) Implications of dynamic factor models for VAR analysis. NBER Working Paper No. 11467
Taylor CA (1982) Econometric modeling of urban and other substate areas: an analysis of alternative methodologies. Reg Sci Urban Econ 12:425–448
Theil H (1971) Principles of econometrics. Wiley, New York
Todd RM (1984) Improving economic forecasting with Bayesian vector autoregression. Quarterly Review, Federal Reserve Bank of Minneapolis, pp 18–29
Williamson RB (1975) Predictive power of the export base theory. Growth Change 6:3–10
Acknowledgments
We thank two anonymous referees for many helpful comments. Any remaining errors, however, are solely ours.
Appendix
See Table 10.
About this article
Cite this article
Gupta, R., Kabundi, A., Miller, S.M. et al. Using large data sets to forecast sectoral employment. Stat Methods Appl 23, 229–264 (2014). https://doi.org/10.1007/s10260-013-0243-6
DOI: https://doi.org/10.1007/s10260-013-0243-6