2021 | OriginalPaper | Chapter Open Access

# Massive Data Analytics for Macroeconomic Nowcasting

Authors: Peng Cheng, Laurent Ferrara, Alice Froidevaux, Thanh-Long Huynh

Publisher: Springer International Publishing

## 1 Introduction

## 2 Review of the Recent Literature

### 2.1 Various Types of Massive Data

Forecasting prices with Google data has also been considered, for example, by Seabold and Coppola [48], who focus on a set of Latin American countries for which publication delays are quite large. Besides Google data, crowd-sourced data from online platforms, such as Yelp, provide accurate real-time geographical information. Glaeser et al. [37] present evidence that Yelp data can complement government surveys by measuring economic activity in real time, at a granular level, and at almost any geographic scale in the USA.

### 2.2 Econometric Methods to Deal with Massive Datasets

Assume that we aim at explaining a target variable \(y_t\) using a set of variables \(x_t = \left( x_{1t},\ldots ,x_{nt}\right)^{\prime}\) through the following linear regression:

$$ y_t = \beta_0 + \sum_{j=1}^{n} \beta_j x_{jt} + \varepsilon_t, \qquad (1) $$

where \(\varepsilon_t \sim N(0, \sigma^2)\). To account for dynamics, \(x_{jt}\) can also be a lagged value of the target variable or of other explanatory variables. In such a situation, usual least-squares estimates are not necessarily a good idea, as there are too many parameters to estimate, leading to a high degree of uncertainty in the estimates as well as a strong risk of in-sample over-fitting associated with poor out-of-sample performance. Several econometric approaches can address this curse of dimensionality. Borrowing from Giannone et al. [36], we can classify these approaches into two categories: sparse and dense models. Sparse methods assume that some coefficients \(\beta_j\) in Eq. (1) are equal to zero, meaning that only a few variables have an impact on the target variable. Zeros can be imposed ex ante by practitioners based on specific a priori information. Alternatively, zeros can be estimated using an appropriate estimation method such as the LASSO (least absolute shrinkage and selection operator) regularization approach [51] or Bayesian techniques that force some coefficients to take null values during the estimation step (see, e.g., Smith et al. [49], who develop a Bayesian approach that can shrink some coefficients to zero and allows coefficients that are shrunk to zero to vary through regimes).
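To make the sparse approach concrete, the sketch below fits a many-predictor regression with a minimal LASSO implemented by cyclic coordinate descent in plain numpy. The data-generating process, dimensions, and penalty level are illustrative choices, not taken from the chapter.

```python
# Minimal LASSO via cyclic coordinate descent (soft-thresholding),
# illustrating how an L1 penalty sets most coefficients exactly to zero.
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    T, n = X.shape
    beta = np.zeros(n)
    col_ss = (X ** 2).sum(axis=0)                  # per-column sum of squares
    for _ in range(n_iter):
        for j in range(n):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual excluding j
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return beta

rng = np.random.default_rng(0)
T, n = 120, 50                         # many candidate predictors, few relevant
X = rng.standard_normal((T, n))
true_beta = np.zeros(n)
true_beta[:3] = [1.5, -2.0, 1.0]       # sparse truth: only 3 nonzero coefficients
y = X @ true_beta + 0.5 * rng.standard_normal(T)

beta_hat = lasso_cd(X, y, lam=20.0)
print("selected variables:", np.flatnonzero(beta_hat))
```

Raising `lam` shrinks more coefficients to exactly zero; in practice the penalty level is typically tuned by cross-validation.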

Dense methods rely on dynamic factor models, in which the vector of explanatory variables \(x_t\) is decomposed into a common component \(\varLambda f_{t}\), where \(f_{t}=\left ( f_{1t},\ldots ,f_{rt}\right ) ^{\prime }\) and \(\varLambda\) is the loading matrix such that \(\varLambda =\left ( \lambda _{1},\ldots ,\lambda _{n}\right )^{\prime }\), and an idiosyncratic component \(\xi _{t}=\left ( \xi _{1t},\ldots ,\xi _{nt}\right ) ^{\prime }\), a vector of n mutually uncorrelated components:

$$ x_t = \varLambda f_t + \xi_t. \qquad (2) $$

A VAR(p) dynamics is sometimes allowed for the vector \(f_t\). Estimation is carried out using the diffusion index approach of Stock and Watson [50] or the generalized DFM of Forni et al. [30]. As the number r of estimated factors \(\hat {f}_{t}\) is generally small, they can be directly put in a second step into the regression equation to explain \(y_t\) in the following way:

$$ y_t = \beta_0 + \beta^{\prime} \hat{f}_t + \varepsilon_t. \qquad (3) $$
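The two-step diffusion-index idea can be sketched in a few lines of numpy: extract r principal components from a simulated panel obeying the factor decomposition above, then regress the target on the estimated factors. Dimensions and the data-generating process are illustrative assumptions.

```python
# Two-step diffusion-index sketch: (1) factors by principal components,
# (2) OLS of the target on the estimated factors.
import numpy as np

rng = np.random.default_rng(1)
T, n, r = 200, 80, 2
f = rng.standard_normal((T, r))                # latent factors f_t
Lam = rng.standard_normal((n, r))              # loading matrix Lambda (n x r)
X = f @ Lam.T + rng.standard_normal((T, n))    # x_t = Lambda f_t + xi_t
y = f @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(T)

# Step 1: first r principal components of the standardized panel
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
f_hat = Z @ Vt[:r].T                           # estimated factors (up to rotation)

# Step 2: OLS of y_t on a constant and the r estimated factors
G = np.column_stack([np.ones(T), f_hat])
coef, *_ = np.linalg.lstsq(G, y, rcond=None)
resid = y - G @ coef
print("in-sample R^2:", round(1 - resid.var() / y.var(), 3))
```

Because principal components recover the factor space only up to rotation, the fit of the second-step regression, rather than the individual factor coefficients, is the meaningful object here.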

In nowcasting applications, the target variable \(y_t\) is generally a low-frequency variable (e.g., quarterly), while explanatory variables \(x_t\) are generally high frequency (e.g., daily). A standard approach is to first aggregate the high-frequency variables to the low frequency by averaging and then to estimate Eq. (1) at the lowest frequency. Alternatively, mixed-data sampling (MIDAS hereafter) models have been put forward by Ghysels et al. [34] in order to avoid systematically aggregating high-frequency variables. As an example, let's consider the following MIDAS bivariate equation:

$$ y_t = \beta_0 + \beta_1 B\left(\theta\right) x_t^{(m)} + \varepsilon_t, \qquad (4) $$

where m is the difference in frequency between \(y_t\) and \((x_{t}^{(m)})\), such that we observe m times \((x_{t}^{(m)})\) over the period [t − 1, t]. The term \(B\left (\theta \right )\) controls the polynomial weights that allow the frequency mixing. Indeed, the MIDAS specification consists in smoothing the past values of \((x_{t}^{(m)})\) by using the polynomial \(B\left (\theta \right )\) of the form:

$$ B\left(\theta\right) = \sum_{k=1}^{K} b_k\left(\theta\right) L^{(k-1)/m}, \qquad (5) $$

where L is the lag operator such that \(L^{1/m} x_{t}^{(m)} = x_{t-1/m}^{(m)}\) and \(b_k(.)\) is the weight function, which can take various shapes. For example, as in [34], a two-parameter exponential Almon lag polynomial can be implemented such that, for \(\theta = (\theta_1, \theta_2)\),

$$ b_k\left(\theta_1, \theta_2\right) = \frac{\exp\left(\theta_1 k + \theta_2 k^2\right)}{\sum_{j=1}^{K} \exp\left(\theta_1 j + \theta_2 j^2\right)}. \qquad (6) $$
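A small sketch of the two-parameter exponential Almon weights: the function below normalizes the weights so that they sum to one over K high-frequency lags; the θ values are illustrative choices.

```python
# Exponential Almon lag weights b_k(theta1, theta2) over K lags.
import numpy as np

def almon_weights(theta1, theta2, K):
    k = np.arange(1, K + 1)
    w = np.exp(theta1 * k + theta2 * k ** 2)
    return w / w.sum()                    # normalized: weights sum to one

w = almon_weights(0.3, -0.05, K=20)       # hump-shaped, then decaying profile
print(w.round(3))
```

With θ₂ < 0 the weights eventually decay, so distant high-frequency lags receive negligible weight, which is exactly the smoothing role of \(B(\theta)\) in the MIDAS regression.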

Other specifications do not impose any particular functional form on the weights \(b_k(.)\) but assume a linear relationship of the following form:

$$ y_t = \beta_0 + \sum_{j=0}^{K} \beta_j x_{t-j/m}^{(m)} + \varepsilon_t. \qquad (7) $$

The curse of dimensionality can then be addressed by imposing some parameters \(\beta_j\) in Eq. (7) to be equal to zero. We will use this strategy in our applications (see details in Sect. 4.1).
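A minimal sketch of this unrestricted mixed-frequency design, often called unrestricted MIDAS: each of the m within-period high-frequency observations enters as its own regressor, estimated here by OLS on simulated data. The frequency mix (m = 12) and the data-generating process are illustrative assumptions.

```python
# Unrestricted mixed-frequency design: stack the m high-frequency readings
# observed within each low-frequency period as separate regressors.
import numpy as np

rng = np.random.default_rng(2)
T, m = 80, 12                          # e.g., 12 high-frequency readings per period
x_hf = rng.standard_normal(T * m)      # high-frequency indicator
X = x_hf.reshape(T, m)                 # row t: the m readings over [t-1, t]

true_beta = np.zeros(m)
true_beta[-3:] = [0.2, 0.3, 0.5]       # only the most recent readings matter
y = X @ true_beta + 0.1 * rng.standard_normal(T)

G = np.column_stack([np.ones(T), X])   # constant + m unrestricted lag coefficients
beta_hat, *_ = np.linalg.lstsq(G, y, rcond=None)
print(beta_hat[1:].round(2))           # many estimates are close to zero
```

In this unrestricted form the number of coefficients grows with m, which is why imposing zeros, either ex ante or through an L1 penalty, becomes attractive.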

## 3 Example of Macroeconomic Applications Using Massive Alternative Data

### 3.1 A Real-Time Proxy for Exports and Imports

#### 3.1.1 International Trade

#### 3.1.2 Localization Data

#### 3.1.3 QuantCube International Trade Index: The Case of China

Let \(N_{i,j}\) denote the number of ship arrivals of type i [container cargo, tanker, bulk cargo] in a given Chinese port j.
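Purely as a hypothetical illustration of how such counts could be combined, and not the QuantCube methodology, the sketch below weights arrivals by ship type and normalizes the total to a base period; all weights and counts are invented.

```python
# Hypothetical aggregation of ship-arrival counts (type x port) into an
# index normalized to 100 at a base period. Weights and data are invented.
import numpy as np

# rows: ship type (container cargo, tanker, bulk cargo); columns: ports
N_base = np.array([[120, 80, 60],
                   [ 90, 70, 40],
                   [ 60, 50, 30]])
N_now = np.array([[132, 85, 66],
                  [ 95, 70, 42],
                  [ 63, 55, 33]])

w = np.array([0.5, 0.3, 0.2])       # hypothetical weights per ship type
base = w @ N_base.sum(axis=1)       # weighted total arrivals, base period
now = w @ N_now.sum(axis=1)         # weighted total arrivals, current period
index = 100 * now / base
print(round(index, 1))
```

A value above 100 signals more weighted ship traffic than in the base period, which is the kind of real-time movement a trade index aims to capture.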