Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components?
Introduction
Many problems in economics require the exploitation of large panels of time series. Recent literature has shown the “value” of large information for signal extraction and forecasting, and new methods have been proposed to handle the large-dimensionality problem (Forni et al., 2005, Giannone et al., 2004, Stock and Watson, 2002a, Stock and Watson, 2002b).
A related literature has explored the performance of Bayesian model averaging for forecasting (Koop and Potter, 2003, Stock and Watson, 2006, Stock and Watson, 2005a, Wright, 2003) but, surprisingly, few papers explore the performance of Bayesian regression in forecasting with high-dimensional data. Exceptions are Stock and Watson (2005a), who consider normal Bayes estimators for orthonormal regressors, and Giacomini and White (2006), who provide an empirical example in which a Bayesian regression with a large number of predictors is compared with principal component regression (PCR).
Bayesian methods are part of the traditional econometrician's toolbox and offer a natural solution to the curse of dimensionality by shrinking the parameters via the imposition of priors. In particular, the Bayesian VAR has been advocated as a device for forecasting macroeconomic data (Doan et al., 1984, Litterman, 1986). It is then surprising that, in most applications, these methods have been applied to relatively small systems and that their empirical and theoretical properties for large panels have not been given more attention in the literature.
This paper is a first step towards filling this gap. We analyze Bayesian regression methods under Gaussian and double-exponential priors and study their forecasting performance on the standard "large" macroeconomic dataset that has been used to establish properties of principal-component-based forecasts (Stock and Watson, 2002a, Stock and Watson, 2002b). Moreover, we analyze the asymptotic properties of Gaussian Bayesian regression for n, the size of the cross-section, and T, the sample size, going to infinity. The aim is to establish a connection between Bayesian regression and the classical literature on forecasting with large panels based on principal components.
Our two choices for the prior correspond to two interesting cases: variable aggregation and variable selection. Under a Gaussian prior, the posterior-mode solution is such that all variables in the panel are given non-zero coefficients. Regressors, as in PCR, are linear combinations of all variables in the panel, but while the Gaussian prior gives decreasing weight to the ordered eigenvalues of the covariance matrix of the data, principal components impose unit weight on the dominant ones and zero on the others. The double-exponential prior, on the other hand, favors sparse models since it puts more mass near zero and in the tails, which induces a tendency of the coefficients maximizing the posterior density to be either large or zero. As a result, it favors the recovery of a few large coefficients instead of many small ones, and of truly zero rather than small values. This case is interesting because it results in variable selection rather than variable aggregation and, in principle, should give results that are more interpretable from the economic point of view.
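To make the contrast explicit, write the singular value decomposition of the T x n matrix of predictors as X = sum_j d_j u_j v_j' (the notation here is ours, not the paper's). The posterior-mode forecast under the Gaussian prior (a Ridge regression, as explained below) and the PCR forecast then differ only in how they weight the singular directions:

$$
\hat{\beta}_{\mathrm{ridge}} \;=\; \sum_{j} \frac{d_j^{2}}{d_j^{2}+\nu}\,\frac{u_j' y}{d_j}\, v_j ,
\qquad
\hat{\beta}_{\mathrm{pcr}} \;=\; \sum_{j \le r} 1 \cdot \frac{u_j' y}{d_j}\, v_j ,
$$

where nu > 0 is the Ridge penalty implied by the prior and r is the number of retained principal components: Ridge downweights all directions smoothly according to the eigenvalues d_j^2, while PCR puts unit weight on the first r directions and zero on the rest.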
Under a Gaussian prior, it is easy to compute the maximizer of the posterior density. Under such a prior with independent and identically distributed (i.i.d.) regression coefficients, the solution amounts to solving a penalized least-squares problem with a penalty proportional to the sum of the squares of the coefficients, i.e. to a so-called Ridge regression problem. Under a double-exponential prior, however, there is no analytical form for the maximizer of the posterior density, but we can exploit the fact that, under such a prior with i.i.d. coefficients, the solution amounts to a Lasso regression problem, i.e. to penalized least-squares with a penalty proportional to the sum of the absolute values of the coefficients. Several algorithms have been proposed for Lasso regression. In our empirical study, we have used two recently proposed algorithms which work without limitations of dimensionality: LARS (Least Angle Regression), developed by Efron et al. (2004), and the iterative Landweber scheme with soft-thresholding at each iteration, developed by De Mol and Defrise (2002) and Daubechies et al. (2004).
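As an illustration of the second algorithm, here is a minimal sketch (not the authors' implementation; the step size, stopping rule and variable names are our own choices) of the iterative Landweber scheme with soft-thresholding for the Lasso problem min_b ||y - Xb||^2 + 2*lam*||b||_1:

```python
import numpy as np

def soft_threshold(b, thresh):
    """Component-wise soft-thresholding: shrink each entry towards zero by `thresh`."""
    return np.sign(b) * np.maximum(np.abs(b) - thresh, 0.0)

def lasso_landweber(X, y, lam, n_iter=5000, tol=1e-8):
    """Iterative Landweber scheme with soft-thresholding at each step,
    solving min_b ||y - X b||^2 + 2*lam*||b||_1 (the Lasso problem)."""
    # The step size mu must satisfy mu * ||X||_2^2 < 1 for convergence.
    mu = 0.99 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Landweber (gradient) step on the least-squares term ...
        grad_step = beta + mu * X.T @ (y - X @ beta)
        # ... followed by soft-thresholding, which sets small coefficients exactly to zero
        beta_new = soft_threshold(grad_step, mu * lam)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Each iteration requires only matrix-vector products, so the scheme remains feasible when the number of predictors is large; the penalty lam corresponds to the scale of the double-exponential prior.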
An interesting feature of Lasso regression is that it combines variable selection and parameter estimation. The estimator depends in a nonlinear way on the variable to be predicted, and this may have advantages in some empirical situations. The availability of the computationally feasible algorithms mentioned above makes the double-exponential prior an attractive alternative to other variable-selection priors that require computationally demanding algorithms, such as the one proposed by Fernandez et al. (2001) in the context of Bayesian model averaging and applied by Stock and Watson (2005a) to macroeconomic forecasting with large cross-sections.
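For the first algorithm, off-the-shelf implementations exist; a minimal sketch using scikit-learn's LassoLars (our choice of library and tuning, not the one used in the paper) illustrates how selection and estimation happen jointly:

```python
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))    # T = 200 observations, n = 100 predictors
beta_true = np.zeros(100)
beta_true[:5] = 1.0                    # only 5 predictors are truly informative
y = X @ beta_true + rng.standard_normal(200)

model = LassoLars(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)  # indices of the variables given non-zero weight
print(f"{selected.size} variables selected:", selected)
```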
Although Gaussian and double-exponential Bayesian regressions rely on different estimation strategies, an out-of-sample evaluation based on the Stock and Watson dataset shows that, for an appropriate range of prior parameters, the two methods produce forecasts which are highly correlated and are characterized by similar mean-square errors. Moreover, these forecasts are highly correlated with those produced by principal components, with similar mean-square errors: they do well when PCR does well. Hence, although the Lasso prior leads to the selection of few variables, the forecasts obtained from these informative targeted predictors do not outperform PCR based on a few principal components.
In order to understand these results, we study the asymptotic properties of the forecast based on Bayesian regression as the cross-section and the sample size become large. This double-asymptotic analysis has been applied by the recent literature to the case of PCR (Bai, 2003, Bai and Ng, 2002, Forni et al., forthcoming, Forni et al., 2004, Stock and Watson, 2002a, Stock and Watson, 2002b) but never to Bayesian regression. This analysis is, however, important for understanding the performance of this method for large panels and also as a guide to setting shrinkage parameters as the dimension of the panel changes. Here we will limit the analysis to the Bayesian regression based on a Gaussian prior and show that, under very general conditions, consistency is achieved provided that the degree of shrinkage increases with the cross-sectional dimension. The conditions under which we show consistency require that most of the regressors are informative about the future of the variable to forecast. This condition is satisfied in the particular case in which the data follow an approximate factor structure, the case for which the literature has shown consistency for PCR. The approximate factor structure imposes a high degree of collinearity in the data that persists as we add series to the panel. Intuitively, under those assumptions, if the prior is chosen appropriately in relation to n, Bayesian regression under normality will give larger weight to the principal components associated with the dominant eigenvalues and will therefore produce results similar to PCR.
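A heuristic way to see why shrinkage must increase with n (a sketch in our notation, not the formal argument developed in Section 4): with the Gaussian prior, the forecast is the Ridge predictor

$$
\hat{y}_{T+h|T} \;=\; X_T'\,\big(X'X + \nu I_n\big)^{-1} X' y ,
$$

and, in the eigenbasis of X'X, the weight attached to the j-th principal component is d_j^2 / (d_j^2 + nu). Under an approximate factor structure, the first few eigenvalues grow with n while the remaining ones stay small relative to them, so letting nu grow with n at a suitable rate drives the weights on the non-pervasive directions to zero while leaving those on the dominant, factor-related directions non-negligible; the resulting forecast then behaves like PCR.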
Our empirical work shows, moreover, that Lasso forecasts, although based on regressions involving only a few variables, are as accurate and as highly correlated with PCR forecasts as those obtained under normality. This result may seem puzzling, but it can be explained by the fact that our panel is highly collinear. Under collinearity, few variables, if selected appropriately, should capture the essence of the covariation in the data. In this case, we expect them to be strongly correlated with principal components and, like the latter, to span the space of the pervasive common factors. Under collinearity, however, we expect the selection to be unstable and very sensitive to minor perturbations of the data. In this sense, we do not expect variable selection to provide results which lead to clearer economic interpretation than principal components or Ridge regression.
The paper is organized as follows. Section 2 introduces the problem of forecasting using large cross-sections. Section 3 reports the results of the out-of-sample exercise for the three methods considered: principal components, Bayesian regression with normal and with double-exponential priors. Section 4 reports asymptotic results for the Gaussian prior case. Section 5 concludes and outlines problems for future research.
Section snippets
Three solutions to the “curse of dimensionality” problem
Consider the n-dimensional vector of covariance-stationary processes X_t = (x_1t, ..., x_nt)'. We will assume that they all have mean zero and unit variance.
We are interested in forecasting linear transformations of some elements of X_t using all the variables as predictors. Precisely, we are interested in estimating the linear projection y_{t+h|t} = proj{y_{t+h} | Ω_t}, where Ω_t is a potentially large information set at time t and y_t is a filtered version of x_it, for a specific i.
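To fix ideas, here is a minimal simulation sketch (our own data-generating process and tuning values, not the paper's empirical setup) producing one-step-ahead forecasts with the three solutions compared in this paper: PCR, Ridge (Gaussian prior) and Lasso (double-exponential prior):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
T, n, r = 200, 100, 3
F = rng.standard_normal((T, r))                    # r pervasive common factors
X = F @ rng.standard_normal((r, n)) + rng.standard_normal((T, n))
X = (X - X.mean(0)) / X.std(0)                     # mean zero, unit variance
y = F[:, 0] + 0.1 * rng.standard_normal(T)         # target driven by the first factor

X_in, y_in, x_out = X[:-1], y[1:], X[-1]           # one-step-ahead setup

# (i) principal component regression with r components
pcs = PCA(n_components=r).fit(X_in)
f_in, f_out = pcs.transform(X_in), pcs.transform(x_out[None, :])
b = np.linalg.lstsq(f_in, y_in, rcond=None)[0]
y_pcr = f_out @ b

# (ii) Bayesian regression under a Gaussian prior (posterior mode = Ridge)
y_ridge = Ridge(alpha=50.0).fit(X_in, y_in).predict(x_out[None, :])

# (iii) Bayesian regression under a double-exponential prior (posterior mode = Lasso)
y_lasso = Lasso(alpha=0.05).fit(X_in, y_in).predict(x_out[None, :])
```

In line with the paper's findings, on factor-structured data of this kind the three forecasts tend to be highly correlated.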
Empirics
The dataset employed for the out-of-sample forecasting analysis is the same as the one used in Stock and Watson (2005b). The panel includes real variables (sectoral industrial production, employment and hours worked), nominal variables (consumer and producer price indices, wages, money aggregates), asset prices (stock prices and exchange rates), the yield curve and surveys, for a total of n variables.
Theory
We have seen that Bayesian regression and PCR are methods that help us solve the curse of dimensionality problem which typically arises when trying to extract relevant information from a large number of predictors.
For PCR, the literature has analyzed the asymptotic properties for n, the size of the cross-section, and T, the sample size, going to infinity under assumptions that essentially impose that, as we increase the number of time series, the sources of common dynamics remain limited (Bai, 2003
Conclusions and open questions
This paper has analyzed the properties of Bayesian shrinkage in large panels of time series and compared them to PCR.
We have considered the Gaussian and the double-exponential prior and have shown that they offer a valid alternative to principal components. For the macroeconomic panel considered, the forecasts they provide are highly correlated with those of PCR and imply similar mean-square forecast errors.
This exercise should be understood as rather stylized. For the Bayesian case there is room for
Acknowledgments
The paper has been prepared for the conference to honor the 25th anniversary of Beveridge and Nelson's JME paper, in Atlanta, March 31st-April 1st, 2006. We would like to thank an anonymous referee, Marta Banbura, Michel Defrise, James Hamilton, James Nason, Christian Schumacher, Farshid Vahid, Peter Vlaar and Mark Watson for useful comments, and also seminar participants at the Atlanta Federal Reserve, the 8th Bundesbank spring conference, the 5th IAP workshop, Louvain-la-Neuve, the conference on
References (33)
- Bai, J., Ng, S. Forecasting economic time series using targeted predictors. Journal of Econometrics (2008)
- Fernandez, C., Ley, E., Steel, M.F.J. Benchmark priors for Bayesian model averaging. Journal of Econometrics (2001)
- Forni, M., Hallin, M., Lippi, M., Reichlin, L. The generalized dynamic factor model: consistency and rates. Journal of Econometrics (2004)
- Giannone, D., Reichlin, L., Small, D. Nowcasting: the real-time informational content of macroeconomic data. Journal of Monetary Economics (2008)
- Bai, J. Inferential theory for factor models of large dimensions. Econometrica (2003)
- Bai, J., Ng, S. Determining the number of factors in approximate factor models. Econometrica (2002)
- Bai, J., Ng, S. Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica (2006)
- Banbura, M., Giannone, D., Reichlin, L. Bayesian VARs with large panels. Discussion Paper 6326, Centre for Economic Policy Research (2007)
- Chamberlain, G., Rothschild, M. Arbitrage, factor structure and mean-variance analysis in large asset markets. Econometrica (1983)
- Chen, S.S., Donoho, D.L., Saunders, M.A. Atomic decomposition by basis pursuit. SIAM Review (2001)
- Daubechies, I., Defrise, M., De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics (2004)
- De Mol, C., Defrise, M. A note on wavelet-based inversion methods (2002)
- Doan, T., Litterman, R., Sims, C. Forecasting and conditional projection using realistic prior distributions. Econometric Reviews (1984)