Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components?
Introduction
Many problems in economics require the exploitation of large panels of time series. Recent literature has shown the “value” of large information for signal extraction and forecasting, and new methods have been proposed to handle the large-dimensionality problem (Forni et al., 2005, Giannone et al., 2004, Stock and Watson, 2002a, Stock and Watson, 2002b).
A related literature has explored the performance of Bayesian model averaging for forecasting (Koop and Potter, 2003, Stock and Watson, 2006, Stock and Watson, 2005a, Wright, 2003) but, surprisingly, few papers explore the performance of Bayesian regression in forecasting with high-dimensional data. Exceptions are Stock and Watson (2005a), who consider normal Bayes estimators for orthonormal regressors, and Giacomini and White (2006), who provide an empirical example in which a Bayesian regression with a large number of predictors is compared with principal component regression (PCR).
Bayesian methods are part of the traditional econometrician's toolbox and offer a natural solution to the curse of dimensionality by shrinking the parameters via the imposition of priors. In particular, the Bayesian VAR has been advocated as a device for forecasting macroeconomic data (Doan et al., 1984, Litterman, 1986). It is then surprising that, in most applications, these methods have been applied to relatively small systems and that their empirical and theoretical properties for large panels have not been given more attention in the literature.
This paper is a first step towards filling this gap. We analyze Bayesian regression methods under Gaussian and double-exponential priors and study their forecasting performance on the standard "large" macroeconomic dataset that has been used to establish properties of principal-component-based forecasts (Stock and Watson, 2002a, Stock and Watson, 2002b). Moreover, we analyze the asymptotic properties of Gaussian Bayesian regression for n, the size of the cross-section, and T, the sample size, going to infinity. The aim is to establish a connection between Bayesian regression and the classical literature on forecasting with large panels based on principal components.
Our two choices for the prior correspond to two interesting cases: variable aggregation and variable selection. Under a Gaussian prior, the posterior-mode solution is such that all variables in the panel are given non-zero coefficients. Regressors, as in PCR, are linear combinations of all variables in the panel, but while the Gaussian prior gives decreasing weight to the ordered eigenvalues of the covariance matrix of the data, principal components impose unit weight on the dominant ones and zero on the others. The double-exponential prior, on the other hand, favors sparse models since it puts more mass near zero and in the tails, which induces a tendency of the coefficients maximizing the posterior density to be either large or zero. As a result, it favors the recovery of a few large coefficients instead of many small ones, and of truly zero rather than small values. This case is interesting because it results in variable selection rather than variable aggregation and, in principle, should give results that are more interpretable from the economic point of view.
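To make the contrast explicit, write the singular value decomposition of the T x n matrix of predictors as X = sum_j d_j u_j v_j' (the notation here is ours, not the paper's). The posterior-mode forecast under the Gaussian prior (a Ridge regression, as explained below) and the PCR forecast then differ only in how they weight the singular directions:

$$
\hat{\beta}_{\mathrm{ridge}} \;=\; \sum_{j} \frac{d_j^{2}}{d_j^{2}+\nu}\,\frac{u_j' y}{d_j}\, v_j ,
\qquad
\hat{\beta}_{\mathrm{pcr}} \;=\; \sum_{j \le r} 1 \cdot \frac{u_j' y}{d_j}\, v_j ,
$$

where nu > 0 is the Ridge penalty implied by the prior and r is the number of retained principal components: Ridge downweights all directions smoothly according to the eigenvalues d_j^2, while PCR puts unit weight on the first r directions and zero on the rest.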
Under a Gaussian prior, it is easy to compute the maximizer of the posterior density. Under such a prior with independent and identically distributed (i.i.d.) regression coefficients, the solution amounts to solving a penalized least-squares problem with a penalty proportional to the sum of the squares of the coefficients, i.e. to a so-called Ridge regression problem. Under a double-exponential prior, however, there is no analytical form for the maximizer of the posterior density, but we can exploit the fact that, under such a prior with i.i.d. coefficients, the solution amounts to a Lasso regression problem, i.e. to penalized least-squares with a penalty proportional to the sum of the absolute values of the coefficients. Several algorithms have been proposed for Lasso regression. In our empirical study, we have used two recently proposed algorithms which work without limitations of dimensionality: LARS (Least Angle Regression), developed by Efron et al. (2004), and the iterative Landweber scheme with soft-thresholding at each iteration, developed by De Mol and Defrise (2002) and Daubechies et al. (2004).
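As an illustration of the second algorithm, here is a minimal sketch (not the authors' implementation; the step size, stopping rule and variable names are our own choices) of the iterative Landweber scheme with soft-thresholding for the Lasso problem min_b ||y - Xb||^2 + 2*lam*||b||_1:

```python
import numpy as np

def soft_threshold(b, thresh):
    """Component-wise soft-thresholding: shrink each entry towards zero by `thresh`."""
    return np.sign(b) * np.maximum(np.abs(b) - thresh, 0.0)

def lasso_landweber(X, y, lam, n_iter=5000, tol=1e-8):
    """Iterative Landweber scheme with soft-thresholding at each step,
    solving min_b ||y - X b||^2 + 2*lam*||b||_1 (the Lasso problem)."""
    # The step size mu must satisfy mu * ||X||_2^2 < 1 for convergence.
    mu = 0.99 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Landweber (gradient) step on the least-squares term ...
        grad_step = beta + mu * X.T @ (y - X @ beta)
        # ... followed by soft-thresholding, which sets small coefficients exactly to zero
        beta_new = soft_threshold(grad_step, mu * lam)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Each iteration requires only matrix-vector products, so the scheme remains feasible when the number of predictors is large; the penalty lam corresponds to the scale of the double-exponential prior.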
An interesting feature of Lasso regression is that it combines variable selection and parameter estimation. The estimator depends in a nonlinear way on the variable to be predicted, and this may have advantages in some empirical situations. The availability of the computationally feasible algorithms mentioned above makes the double-exponential prior an attractive alternative to other variable-selection priors that require computationally demanding algorithms, such as the one proposed by Fernandez et al. (2001) in the context of Bayesian model averaging and applied by Stock and Watson (2005a) to macroeconomic forecasting with large cross-sections.
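For the first algorithm, off-the-shelf implementations exist; a minimal sketch using scikit-learn's LassoLars (our choice of library and tuning, not the one used in the paper) illustrates how selection and estimation happen jointly:

```python
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))    # T = 200 observations, n = 100 predictors
beta_true = np.zeros(100)
beta_true[:5] = 1.0                    # only 5 predictors are truly informative
y = X @ beta_true + rng.standard_normal(200)

model = LassoLars(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)  # indices of the variables given non-zero weight
print(f"{selected.size} variables selected:", selected)
```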
Although Gaussian and double-exponential Bayesian regressions rely on different estimation strategies, an out-of-sample evaluation based on the Stock and Watson dataset shows that, for an appropriate range of prior parameters, the two methods produce forecasts which are highly correlated and are characterized by similar mean-square errors. Moreover, these forecasts are highly correlated with those produced by principal components, with similar mean-square errors: they do well when PCR does well. Hence, although the Lasso prior leads to the selection of few variables, the forecasts obtained from these informative targeted predictors do not outperform PCR based on a few principal components.
In order to understand these results, we study the asymptotic properties of the forecast based on Bayesian regression as the cross-section and the sample size become large. This double-asymptotic analysis has been applied by the recent literature to the case of PCR (Bai, 2003, Bai and Ng, 2002, Forni et al., forthcoming, Forni et al., 2004, Stock and Watson, 2002a, Stock and Watson, 2002b) but never to Bayesian regression. This analysis is, however, important for understanding the performance of this method for large panels and also as a guide to setting shrinkage parameters as the dimension of the panel changes. Here we will limit the analysis to the Bayesian regression based on a Gaussian prior and show that, under very general conditions, consistency is achieved provided that the degree of shrinkage increases with the cross-sectional dimension. The conditions under which we show consistency require that most of the regressors are informative about the future of the variable to forecast. This condition is satisfied in the particular case in which the data follow an approximate factor structure, the case for which the literature has shown consistency for PCR. The approximate factor structure imposes a high degree of collinearity in the data that persists as we add series to the panel. Intuitively, under those assumptions, if the prior is chosen appropriately in relation to n, Bayesian regression under normality will give larger weight to the principal components associated with the dominant eigenvalues and will therefore produce results similar to PCR.
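A heuristic way to see why shrinkage must increase with n (a sketch in our notation, not the formal argument developed in Section 4): with the Gaussian prior, the forecast is the Ridge predictor

$$
\hat{y}_{T+h|T} \;=\; X_T'\,\big(X'X + \nu I_n\big)^{-1} X' y ,
$$

and, in the eigenbasis of X'X, the weight attached to the j-th principal component is d_j^2 / (d_j^2 + nu). Under an approximate factor structure, the first few eigenvalues grow with n while the remaining ones stay small relative to them, so letting nu grow with n at a suitable rate drives the weights on the non-pervasive directions to zero while leaving those on the dominant, factor-related directions non-negligible; the resulting forecast then behaves like PCR.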
Our empirical work shows, moreover, that Lasso forecasts, although based on regressions involving only a few variables, are as accurate and as highly correlated with PCR forecasts as those obtained under normality. This result may seem puzzling, but it can be explained by the fact that our panel is highly collinear. Under collinearity, few variables, if selected appropriately, should capture the essence of the covariation in the data. In this case, we expect them to be strongly correlated with principal components and, like the latter, to span the space of the pervasive common factors. Under collinearity, however, we expect the selection to be unstable and very sensitive to minor perturbations of the data. In this sense, we do not expect variable selection to provide results which lead to clearer economic interpretation than principal components or Ridge regression.
The paper is organized as follows. Section 2 introduces the problem of forecasting using large cross-sections. Section 3 reports the results of the out-of-sample exercise for the three methods considered: principal components, Bayesian regression with normal and with double-exponential priors. Section 4 reports asymptotic results for the Gaussian prior case. Section 5 concludes and outlines problems for future research.
Section snippets
Three solutions to the “curse of dimensionality” problem
Consider the n-dimensional vector of covariance-stationary processes X_t = (x_1t, ..., x_nt)'. We will assume that they all have mean zero and unit variance.
We are interested in forecasting linear transformations of some elements of X_t using all the variables as predictors. Precisely, we are interested in estimating the linear projection y_{t+h|t} = proj{y_{t+h} | Ω_t}, where Ω_t is a potentially large information set at time t and y_t is a filtered version of x_it, for a specific i.
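To fix ideas, here is a minimal simulation sketch (our own data-generating process and tuning values, not the paper's empirical setup) producing one-step-ahead forecasts with the three solutions compared in this paper: PCR, Ridge (Gaussian prior) and Lasso (double-exponential prior):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
T, n, r = 200, 100, 3
F = rng.standard_normal((T, r))                    # r pervasive common factors
X = F @ rng.standard_normal((r, n)) + rng.standard_normal((T, n))
X = (X - X.mean(0)) / X.std(0)                     # mean zero, unit variance
y = F[:, 0] + 0.1 * rng.standard_normal(T)         # target driven by the first factor

X_in, y_in, x_out = X[:-1], y[1:], X[-1]           # one-step-ahead setup

# (i) principal component regression with r components
pcs = PCA(n_components=r).fit(X_in)
f_in, f_out = pcs.transform(X_in), pcs.transform(x_out[None, :])
b = np.linalg.lstsq(f_in, y_in, rcond=None)[0]
y_pcr = f_out @ b

# (ii) Bayesian regression under a Gaussian prior (posterior mode = Ridge)
y_ridge = Ridge(alpha=50.0).fit(X_in, y_in).predict(x_out[None, :])

# (iii) Bayesian regression under a double-exponential prior (posterior mode = Lasso)
y_lasso = Lasso(alpha=0.05).fit(X_in, y_in).predict(x_out[None, :])
```

In line with the paper's findings, on factor-structured data of this kind the three forecasts tend to be highly correlated.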
Empirics
The dataset employed for the out-of-sample forecasting analysis is the same as the one used in Stock and Watson (2005b). The panel includes real variables (sectoral industrial production, employment and hours worked), nominal variables (consumer and producer price indices, wages, money aggregates), asset prices (stock prices and exchange rates), the yield curve and surveys, for a total of n variables.
Theory
We have seen that Bayesian regression and PCR are methods that help us solve the curse of dimensionality problem which typically arises when trying to extract relevant information from a large number of predictors.
For PCR, the literature has analyzed the asymptotic properties for n, the size of the cross-section, and T, the sample size, going to infinity under assumptions that essentially impose that, as we increase the number of time series, the sources of common dynamics remain limited (Bai, 2003
Conclusions and open questions
This paper has analyzed the properties of Bayesian shrinkage in large panels of time series and compared them to PCR.
We have considered the Gaussian and the double-exponential prior and have shown that they offer a valid alternative to principal components. For the macroeconomic panel considered, the forecasts they provide are highly correlated with those of PCR and imply similar mean-square forecast errors.
This exercise should be understood as rather stylized. For the Bayesian case there is room for
Acknowledgments
The paper has been prepared for the conference to honor the 25th anniversary of Beveridge and Nelson's JME paper, in Atlanta, March 31st-April 1st, 2006. We would like to thank an anonymous referee, Marta Banbura, Michel Defrise, James Hamilton, James Nason, Christian Schumacher, Farshid Vahid, Peter Vlaar and Mark Watson for useful comments, and also seminar participants at the Atlanta Federal Reserve, the 8th Bundesbank spring conference, the 5th IAP workshop, Louvain-la-Neuve, the conference on
References (33)
- Bai, J., Ng, S. Forecasting economic time series using targeted predictors. Journal of Econometrics (2008)
- Fernandez, C., Ley, E., Steel, M.F.J. Benchmark priors for Bayesian model averaging. Journal of Econometrics (2001)
- Forni, M., Hallin, M., Lippi, M., Reichlin, L. The generalized dynamic factor model: consistency and rates. Journal of Econometrics (2004)
- Giannone, D., Reichlin, L., Small, D. Nowcasting: the real-time informational content of macroeconomic data. Journal of Monetary Economics (2008)
- Bai, J. Inferential theory for factor models of large dimensions. Econometrica (2003)
- Bai, J., Ng, S. Determining the number of factors in approximate factor models. Econometrica (2002)
- Bai, J., Ng, S. Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica (2006)
- Banbura, M., Giannone, D., Reichlin, L. Bayesian VARs with large panels. Discussion Paper 6326, Centre for Economic Policy Research (2007)
- Chamberlain, G., Rothschild, M. Arbitrage, factor structure and mean-variance analysis in large asset markets. Econometrica (1983)
- Chen, S.S., Donoho, D.L., Saunders, M.A. Atomic decomposition by basis pursuit. SIAM Review (2001)
- Daubechies, I., Defrise, M., De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics (2004)
- De Mol, C., Defrise, M. A note on wavelet-based inversion methods (2002)
- Doan, T., Litterman, R., Sims, C. Forecasting and conditional projection using realistic prior distributions. Econometric Reviews (1984)