1 Introduction
2 Methodology: sentiment analysis
2.1 Dictionary-based method
- First, there is the word list prepared by Bannier et al. (2018), the German equivalent of the original English dictionary provided by Loughran and McDonald (2016). The latter is well established for textual analysis in finance- and accounting-specific contexts. The word list by Bannier et al. (2018) includes over 2,200 positive and 10,000 negative word forms and is binary coded for polarity, i.e., each entry is either positive or negative.
- Second, there is a forecast-specific German dictionary based on Sharpe et al. (2020). According to Di Fatta et al. (2015), words have different connotations and meanings in different contexts, and sentiment indices have to be adapted to the content to which they are applied. To this end, Sharpe et al. (2020) developed a forecast-specific word list which excludes words that have special meanings in an economic forecasting context. The word list contains 205 positive and 103 negative words (see Tables 8, 9) and is binary coded like the previous one.
- Finally, there is the SentimentWortschatz (SentiWS) dictionary (Remus et al. 2010), a German-specific word list for sentiment analysis. The current version (v2.0) contains about 16,000 positive and 18,000 negative word forms. Unlike the other two dictionaries, it includes polarity weights within the interval \([-1, 1]\).
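The difference between binary and weighted dictionaries can be illustrated with a minimal Python sketch. The aggregation rules used here (net share of positive hits for binary lists, mean weight of matched tokens for weighted lists) are assumptions chosen for illustration and are not necessarily the index definitions used in this paper. The word lists and weights are toy entries, not taken from any of the three dictionaries.

```python
import re


def tokenize(text):
    """Lowercase the text and extract word tokens (German umlauts included)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def binary_sentiment(tokens, positive, negative):
    """Net share of positive hits among all dictionary hits (assumed rule)."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    hits = pos + neg
    return (pos - neg) / hits if hits else 0.0


def weighted_sentiment(tokens, weights):
    """Mean polarity weight of all matched tokens (assumed rule)."""
    matched = [weights[t] for t in tokens if t in weights]
    return sum(matched) / len(matched) if matched else 0.0


# Toy entries for illustration only (hypothetical, not real dictionary content).
positive = {"aufschwung", "wachstum"}
negative = {"rezession", "risiko"}
weights = {"aufschwung": 0.5, "wachstum": 0.25, "rezession": -0.75}

tokens = tokenize("Der Aufschwung hält an, das Risiko einer Rezession sinkt.")
binary_score = binary_sentiment(tokens, positive, negative)
weighted_score = weighted_sentiment(tokens, weights)
```

The example sentence also shows a known limitation of pure word counting: "Risiko" and "Rezession" are counted as negative hits although the sentence describes a declining risk.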
2.2 Automatic variable selection approach
2.3 Recursive estimation
3 Corpus and data
3.1 The text corpus
- Business cycle forecast (sub-)section: Business cycle forecast reports are heterogeneous in size and content. Some reports are structured into different subsections such as recent national or international economic development, business cycle forecasts, economic policy advice, or methodological explanations. Other reports are miscellaneous texts on various themes and cannot be split in a meaningful way. Therefore, a report must contain a clearly defined forecast (sub-)section.
- Time range: The corpus covers business cycle forecast reports for Germany from 1993 to 2017. Starting in 1993 avoids the German reunification period and a possible misspecification for East and West Germany.
- Forecasters' experience: Continuity and regularity of publication within the examined period indicate forecasters' experience in the field of economic forecasting and help ensure a sufficient level of homogeneity in language across institutes.
- Language homogeneity: The (relatively short) period of 25 years and the forecasters' experience ensure a sufficient degree of homogeneity in language over time.
- Quantitative forecast availability: To obtain a comparable sample for the growth and inflation forecast analysis, only business cycle forecast reports with a calculable fixed horizon forecast for growth and inflation are used. The availability of numerical point forecasts of growth and inflation for the current and next year restricts the number of incorporated forecast reports (see Sect. 3.2).
- Forecasting date: The forecasting date is distributed over the whole year, depending on the respective institutional practice and the frequency of publication. In most cases, the frequency of publication is bi-annual or higher.
- Text availability: A further criterion is the public availability of business cycle forecast reports, which is why private institutions such as banks are not included.
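The criterion on quantitative forecast availability refers to a fixed horizon forecast calculated from the point forecasts for the current and the next year. A common approximation in the forecasting literature weights the two calendar-year forecasts by the share of the horizon that falls into each year. The sketch below shows this approximation only. Whether this paper uses exactly this weighting is an assumption, and the function name and the `months_remaining` parameter are hypothetical.

```python
def fixed_horizon_forecast(current_year, next_year, months_remaining):
    """Approximate a 12-month-ahead forecast from two calendar-year forecasts.

    months_remaining is the number of months of the current year that still
    lie within the 12-month horizon (an assumed convention).
    """
    if not 0 <= months_remaining <= 12:
        raise ValueError("months_remaining must be between 0 and 12")
    weight_current = months_remaining / 12
    return weight_current * current_year + (1 - weight_current) * next_year


# Illustrative numbers: 1.5% for the current year, 2.0% for the next year,
# with three months of the current year left in the horizon.
example = fixed_horizon_forecast(1.5, 2.0, months_remaining=3)
```

A report published late in the year therefore puts most of the weight on the next-year forecast, which makes reports with different forecasting dates comparable.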
3.2 The sample
| | Growth forecasts | Inflation forecasts |
---|---|---|
Number of observations | 534 | 534 |
Mean error | \(-0.051\) | \(-0.135\) |
Mean absolute error | 1.715 | 0.685 |
Root mean squared error | 2.578 | 0.862 |
Theil’s inequality coefficient | 1.000 | 0.546 |
Number of overestimations | 274 | 292 |
Number of underestimations | 260 | 242 |
Information content | 1.398 | 1.217 |
\(\chi ^2\)-test | 0.000 | 0.000 |
AUROC | 0.746 | 0.763 |
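Several of the accuracy measures in the table follow directly from the forecast errors. The following sketch computes the mean error, mean absolute error, root mean squared error and the counts of over- and underestimations. It assumes the sign convention error = realization minus forecast, so that a negative error marks an overestimation. This convention is not stated in the text, although it matches the negative mean errors and the larger number of overestimations in the table. The input numbers are made up.

```python
import math


def accuracy_summary(forecasts, realizations):
    """Basic accuracy measures for paired forecasts and realizations."""
    # Assumed sign convention: error = realization - forecast.
    errors = [r - f for f, r in zip(forecasts, realizations)]
    n = len(errors)
    return {
        "n": n,
        "mean_error": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "overestimations": sum(1 for e in errors if e < 0),
        "underestimations": sum(1 for e in errors if e > 0),
    }


# Made-up forecasts and realizations, for illustration only.
summary = accuracy_summary([2.0, 1.0, 0.5, 1.5], [1.0, 1.5, 0.5, 2.5])
```

Theil's inequality coefficient, the information content, the \(\chi ^2\)-test and the AUROC are left out because their exact definitions are not given in this section.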
4 Empirical results
4.1 Sentiments’ characteristics
Dictionary | Bannier (1,2) | Sharpe (1,2) | SentiWS | LASSO (GDP) | LASSO (inflation) | Ridge (GDP) | Ridge (inflation) |
---|---|---|---|---|---|---|---|
Dictionary type | Binary | Binary | Weighted | Weighted | Weighted | Weighted | Weighted |
Total entries | 7619 | 292 | 22972 | 71 | 69 | 2359 | 2359 |
Positive entries | 1363 | 196 | 10863 | 42 | 38 | 1257 | 1161 |
Positive entries in % | 17.9 | 67.1 | 47.3 | 59.2 | 55.1 | 53.3 | 49.2 |
Negative entries | 6256 | 96 | 12109 | 29 | 31 | 1102 | 1198 |
Negative entries in % | 82.1 | 32.9 | 52.7 | 40.8 | 44.9 | 46.7 | 50.8 |
Average score | – | – | −0.0515 | −0.0032 | 0.0002 | 0.0000 | 0.0000 |
Standard deviation | – | – | 0.2153 | 0.0302 | 0.0159 | 0.0021 | 0.0017 |
4.2 Forecast efficiency
Dependent variable: growth forecast error\(^{\mathrm{a}}\) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Constant | –\(^{\mathrm{b}}\) | 0.079 | 0.078 | 0.052 | 0.052 | 0.077 | 0.086 | 0.056 | 0.083 | −0.057 |
| | – | 0.132 | 0.132 | 0.131 | 0.130 | 0.132 | 0.128 | 0.127 | 0.125 | 0.124 |
lGDP_FE | −0.203\(^{***}\) | −0.212\(^{***}\) | −0.206\(^{***}\) | −0.182\(^{***}\) | −0.167\(^{***}\) | −0.196\(^{***}\) | −0.099\(^{*}\) | −0.221\(^{***}\) | 0.002 | −0.188\(^{***}\) |
| | (0.057) | (0.058) | (0.058) | (0.057) | (0.057) | (0.057) | (0.057) | (0.054) | (0.058) | (0.052) |
Bannier1 | | 0.118 | | | | | | | | |
| | | (0.135) | | | | | | | | |
Bannier2 | | | 0.032 | | | | | | | |
| | | | (0.126) | | | | | | | |
Sharpe1 | | | | −0.324\(^{**}\) | | | | | | |
| | | | | (0.151) | | | | | | |
Sharpe2 | | | | | −0.402\(^{***}\) | | | | | |
| | | | | | (0.136) | | | | | |
SentiWS | | | | | | −0.152 | | | | |
| | | | | | | (0.155) | | | | |
Lasso_GDP_P | | | | | | | −0.736\(^{***}\) | | | |
| | | | | | | | (0.145) | | | |
Lasso_INF_P | | | | | | | | −0.761\(^{***}\) | | |
| | | | | | | | | (0.124) | | |
Ridge_GDP_P | | | | | | | | | −1.093\(^{***}\) | |
| | | | | | | | | | (0.166) | |
Ridge_INF_P | | | | | | | | | | −1.341\(^{***}\) |
| | | | | | | | | | | (0.159) |
Observations | 387 | 387 | 387 | 387 | 387 | 387 | 387 | 387 | 387 | 387 |
\(R^{2}\) | 0.043 | 0.045 | 0.043 | 0.057 | 0.063 | 0.045 | 0.097 | 0.122 | 0.142 | 0.198 |
Efficiency test [p value] | [< 0.001] | [0.001] | [0.002] | [< 0.001] | [< 0.001] | [0.001] | [< 0.001] | [< 0.001] | [< 0.001] | [< 0.001] |
Dependent variable: inflation forecast error\(^{\mathrm{a}}\) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Constant | –\(^{\mathrm{b}}\) | −0.062 | −0.062 | −0.058 | −0.058 | −0.063 | −0.063 | −0.067 | −0.065 | −0.106 |
| | – | 0.042 | 0.042 | 0.042 | 0.041 | 0.042 | 0.042 | 0.039 | 0.042 | 0.037 |
lINF_FE | −0.109\(^{**}\) | −0.108\(^{**}\) | −0.108\(^{**}\) | −0.121\(^{**}\) | −0.132\(^{***}\) | −0.109\(^{**}\) | −0.109\(^{**}\) | −0.045 | −0.128\(^{**}\) | 0.067 |
| | (0.050) | (0.050) | (0.050) | (0.050) | (0.051) | (0.050) | (0.052) | (0.047) | (0.054) | (0.047) |
Bannier1 | | 0.023 | | | | | | | | |
| | | (0.045) | | | | | | | | |
Bannier2 | | | 0.019 | | | | | | | |
| | | | (0.040) | | | | | | | |
Sharpe1 | | | | 0.073 | | | | | | |
| | | | | (0.049) | | | | | | |
Sharpe2 | | | | | 0.113\(^{***}\) | | | | | |
| | | | | | (0.043) | | | | | |
SentiWS | | | | | | −0.011 | | | | |
| | | | | | | (0.047) | | | | |
Lasso_GDP_P | | | | | | | −0.0005 | | | |
| | | | | | | | (0.049) | | | |
Lasso_INF_P | | | | | | | | −0.323\(^{***}\) | | |
| | | | | | | | | (0.043) | | |
Ridge_GDP_P | | | | | | | | | 0.046 | |
| | | | | | | | | | (0.049) | |
Ridge_INF_P | | | | | | | | | | −0.568\(^{***}\) |
| | | | | | | | | | | (0.054) |
Observations | 387 | 387 | 387 | 387 | 387 | 387 | 387 | 387 | 387 | 387 |
\(R^{2}\) | 0.013 | 0.013 | 0.013 | 0.020 | 0.030 | 0.013 | 0.013 | 0.157 | 0.015 | 0.269 |
Efficiency test [p value] | [0.028] | [0.085] | [0.085] | [0.033] | [0.004] | [0.088] | [0.091] | [< 0.001] | [0.062] | [< 0.001] |
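The efficiency regressions reported in the two tables above relate the forecast error to its own lag and to one sentiment index at a time. If forecasts were efficient, neither regressor should help to explain the error. The sketch below estimates such a regression by ordinary least squares for a single time series in plain Python. It is a simplified illustration under stated assumptions: the pooling across institutes, the standard errors and the efficiency test itself are not reproduced, the helper names are hypothetical, and the data are synthetic.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def efficiency_regression(errors, sentiment):
    """Regress e_t on a constant, e_{t-1} and the sentiment index s_t."""
    y = errors[1:]
    X = [[1.0, errors[t - 1], sentiment[t]] for t in range(1, len(errors))]
    return ols(X, y)


# Synthetic series generated from known coefficients (not the paper's data).
true_coefs = (0.1, -0.2, -0.5)
sentiment = [0.0, 0.3, -0.1, 0.4, -0.6, 0.2, 0.5, -0.3, 0.1]
errors = [0.5]
for s in sentiment[1:]:
    errors.append(true_coefs[0] + true_coefs[1] * errors[-1] + true_coefs[2] * s)

constant, lag_coef, sentiment_coef = efficiency_regression(errors, sentiment)
```

Because the synthetic errors are generated without noise, the estimates recover the coefficients used to build the series, which serves as a basic check of the estimation routine.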
4.3 Predictive power
4.3.1 In-sample forecasting regressions
- First, the term ‘spread’ (long-term interest rate minus short-term interest rate) serves as a monetary control variable. The long-term interest rate is the yield on debt securities outstanding issued by residents with a mean residual maturity of more than nine and up to 10 years (monthly average; source: Deutsche Bundesbank 2020). The short-term interest rate is the 3-month EURIBOR money market rate (monthly average; source: Deutsche Bundesbank 2020).
- Second, total orders received by German industry serve as the industry control variable. We use the change over the previous month in calendar- and seasonally adjusted orders at constant prices (source: Deutsche Bundesbank 2020).
- Third, the Ifo business climate index serves as a leading business cycle indicator (monthly data; source: Ifo Institute 2020).
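Two of the variables entering the forecasting regressions require a transformation of the raw series: the interest rate spread described above and the dependent variable, the average growth rate over the next four quarters. A minimal sketch of both transformations is given below. The function names are hypothetical, the numbers are made up, and the alignment of monthly controls with quarterly targets is not addressed.

```python
def interest_rate_spread(long_rates, short_rates):
    """Spread = long-term interest rate minus short-term interest rate."""
    return [long - short for long, short in zip(long_rates, short_rates)]


def forward_average(series, horizon=4):
    """Average of the next `horizon` values following each period t.

    The last `horizon` periods are dropped because their future values
    are not fully observed.
    """
    averages = []
    for t in range(len(series) - horizon):
        window = series[t + 1 : t + 1 + horizon]
        averages.append(sum(window) / horizon)
    return averages


# Made-up series, for illustration only.
spread = interest_rate_spread([4.0, 3.5], [3.0, 3.0])
target = forward_average([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], horizon=4)
```

Dropping the final periods of the sample is what shortens the estimation sample relative to the number of available quarters.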
Dependent variable: average growth rate of GDP over the next four quarters | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Lagged endog. var. | 0.098 | 0.092 | 0.100 | 0.149 | 0.113 | 0.092 | 0.101 | 0.120 | 0.054 | 0.040 |
| | (0.211) | (0.206) | (0.207) | (0.199) | (0.201) | (0.193) | (0.192) | (0.196) | (0.205) | (0.178) |
Order inflow | 0.838\(^{***}\) | 0.733\(^{***}\) | 0.757\(^{***}\) | 0.904\(^{***}\) | 0.860\(^{***}\) | 0.822\(^{***}\) | 0.837\(^{***}\) | 0.745\(^{***}\) | 0.831\(^{***}\) | 0.646\(^{***}\) |
| | (0.172) | (0.158) | (0.162) | (0.173) | (0.172) | (0.170) | (0.183) | (0.163) | (0.167) | (0.167) |
Interest rate spread | 0.962\(^{**}\) | 1.037\(^{**}\) | 1.044\(^{**}\) | 0.913\(^{*}\) | 0.937\(^{*}\) | 0.977\(^{*}\) | 0.962\(^{**}\) | 0.785\(^{*}\) | 0.986\(^{**}\) | 0.629\(^{*}\) |
| | (0.463) | (0.467) | (0.477) | (0.476) | (0.485) | (0.526) | (0.454) | (0.401) | (0.495) | (0.356) |
Ifo business climate | 0.077 | −0.075 | −0.070 | 0.105 | 0.107 | 0.068 | 0.079 | 0.023 | 0.044 | 0.271 |
| | (0.435) | (0.472) | (0.478) | (0.434) | (0.458) | (0.473) | (0.473) | (0.397) | (0.479) | (0.321) |
Bannier1 | | 0.628\(^{*}\) | | | | | | | | |
| | | (0.356) | | | | | | | | |
Bannier2 | | | 0.515 | | | | | | | |
| | | | (0.342) | | | | | | | |
Sharpe1 | | | | −0.498 | | | | | | |
| | | | | (0.363) | | | | | | |
Sharpe2 | | | | | −0.185 | | | | | |
| | | | | | (0.303) | | | | | |
SentiWS | | | | | | 0.127 | | | | |
| | | | | | | (0.766) | | | | |
Lasso_GDP_P | | | | | | | −0.015 | | | |
| | | | | | | | (0.556) | | | |
Lasso_INF_P | | | | | | | | −1.018\(^{***}\) | | |
| | | | | | | | | (0.275) | | |
Ridge_GDP_P | | | | | | | | | 0.193 | |
| | | | | | | | | | (0.577) | |
Ridge_INF_P | | | | | | | | | | −1.200\(^{***}\) |
| | | | | | | | | | | (0.287) |
Constant | 1.292\(^{**}\) | 1.332\(^{***}\) | 1.327\(^{***}\) | 1.211\(^{**}\) | 1.272\(^{**}\) | 1.305\(^{***}\) | 1.286\(^{***}\) | 1.184\(^{**}\) | 1.370\(^{***}\) | 1.271\(^{***}\) |
| | (0.515) | (0.488) | (0.489) | (0.498) | (0.506) | (0.476) | (0.441) | (0.481) | (0.439) | (0.432) |
Observations | 76 | 76 | 76 | 76 | 76 | 76 | 76 | 76 | 76 | 76 |
\(R^{2}\) | 0.409 | 0.430 | 0.424 | 0.418 | 0.411 | 0.410 | 0.409 | 0.475 | 0.411 | 0.499 |
Dependent variable: average growth rate of inflation over the next four quarters | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Lagged endog. var. | 0.116 | 0.034 | −0.002 | 0.118 | 0.136 | 0.094 | 0.116 | 0.240 | 0.117 | 0.280 |
| | (0.168) | (0.149) | (0.142) | (0.157) | (0.162) | (0.159) | (0.166) | (0.192) | (0.163) | (0.220) |
Order inflow | 0.110 | 0.149\(^{**}\) | 0.154\(^{**}\) | 0.078 | 0.089 | 0.124 | 0.108 | 0.105 | 0.111 | 0.096 |
| | (0.072) | (0.075) | (0.073) | (0.072) | (0.070) | (0.077) | (0.078) | (0.069) | (0.072) | (0.070) |
Interest rate spread | 0.052 | −0.006 | −0.037 | 0.095 | 0.096 | 0.024 | 0.051 | 0.042 | 0.055 | 0.029 |
| | (0.135) | (0.133) | (0.128) | (0.127) | (0.129) | (0.148) | (0.136) | (0.135) | (0.151) | (0.130) |
Ifo business climate | 0.255\(^{***}\) | 0.342\(^{***}\) | 0.373\(^{***}\) | 0.182\(^{**}\) | 0.180\(^{*}\) | 0.282\(^{***}\) | 0.262\(^{**}\) | 0.238\(^{***}\) | 0.246\(^{**}\) | 0.255\(^{***}\) |
| | (0.077) | (0.066) | (0.069) | (0.090) | (0.098) | (0.066) | (0.112) | (0.082) | (0.113) | (0.079) |
Bannier1 | | −0.304\(^{*}\) | | | | | | | | |
| | | (0.169) | | | | | | | | |
Bannier2 | | | −0.380\(^{**}\) | | | | | | | |
| | | | (0.162) | | | | | | | |
Sharpe1 | | | | 0.297 | | | | | | |
| | | | | (0.189) | | | | | | |
Sharpe2 | | | | | 0.234 | | | | | |
| | | | | | (0.160) | | | | | |
SentiWS | | | | | | −0.155 | | | | |
| | | | | | | (0.230) | | | | |
Lasso_GDP_P | | | | | | | −0.015 | | | |
| | | | | | | | (0.164) | | | |
Lasso_INF_P | | | | | | | | −0.221 | | |
| | | | | | | | | (0.185) | | |
Ridge_GDP_P | | | | | | | | | 0.015 | |
| | | | | | | | | | (0.133) | |
Ridge_INF_P | | | | | | | | | | −0.247 |
| | | | | | | | | | | (0.202) |
Constant | 1.299\(^{***}\) | 1.399\(^{***}\) | 1.437\(^{***}\) | 1.301\(^{***}\) | 1.270\(^{***}\) | 1.326\(^{***}\) | 1.299\(^{***}\) | 1.109\(^{***}\) | 1.299\(^{***}\) | 1.048\(^{***}\) |
| | (0.294) | (0.254) | (0.239) | (0.272) | (0.281) | (0.278) | (0.298) | (0.344) | (0.293) | (0.398) |
Observations | 76 | 76 | 76 | 76 | 76 | 76 | 76 | 76 | 76 | 76 |
\(R^{2}\) | 0.201 | 0.252 | 0.282 | 0.241 | 0.235 | 0.209 | 0.201 | 0.224 | 0.201 | 0.223 |
4.3.2 Out-of-sample forecasting performance
| | Relative MAE | Relative MSE | DM-statistic (linear) | p value (linear) | DM-statistic (quadratic) | p value (quadratic) |
---|---|---|---|---|---|---|
Dependent variable: GDP growth | | | | | | |
Bannier1 | 1.064 | 1.063 | \(-0.486\) | 0.686 | \(-0.329\) | 0.629 |
Bannier2 | 1.106 | 1.098 | \(-1.024\) | 0.847 | \(-0.540\) | 0.705 |
Sharpe1 | 1.195 | 1.482 | \(-0.760\) | 0.776 | \(-1.170\) | 0.879 |
Sharpe2 | 1.222 | 1.404 | \(-0.985\) | 0.838 | \(-1.397\) | 0.919 |
SentiWS | 1.580 | 2.611 | \(-3.441\) | 1.000 | \(-2.411\) | 0.992 |
Lasso_GDP_P | 0.949 | 0.945 | 1.303 | 0.096 | 0.888 | 0.187 |
Lasso_INF_P | 0.938 | 1.090 | 0.299 | 0.383 | \(-0.267\) | 0.605 |
Ridge_GDP_P | 1.282 | 1.408 | \(-1.963\) | 0.975 | \(-2.063\) | 0.980 |
Ridge_INF_P | 0.858 | 0.766 | 0.548 | 0.292 | 0.563 | 0.287 |
Dependent variable: Inflation rate | | | | | | |
Bannier1 | 1.080 | 1.100 | \(-0.830\) | 0.797 | \(-0.553\) | 0.710 |
Bannier2 | 1.049 | 1.082 | \(-0.470\) | 0.681 | \(-0.415\) | 0.661 |
Sharpe1 | 1.040 | 1.153 | \(-0.246\) | 0.597 | \(-0.616\) | 0.731 |
Sharpe2 | 0.950 | 0.991 | 0.655 | 0.256 | 0.107 | 0.457 |
SentiWS | 1.017 | 1.011 | \(-0.087\) | 0.535 | \(-0.030\) | 0.512 |
Lasso_GDP_P | 1.158 | 1.223 | \(-1.892\) | 0.971 | \(-1.862\) | 0.969 |
Lasso_INF_P | 1.007 | 0.931 | \(-0.137\) | 0.554 | 1.046 | 0.148 |
Ridge_GDP_P | 1.071 | 1.217 | \(-0.529\) | 0.702 | \(-0.993\) | 0.840 |
Ridge_INF_P | 0.934 | 0.889 | 2.524 | 0.006 | 1.975 | 0.024 |
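The quantities in the table above can be illustrated with a simplified sketch of the relative loss and the Diebold-Mariano (DM) statistic for linear (absolute) and quadratic (squared) loss. The sketch assumes that the loss differential is defined as benchmark loss minus model loss, so that a positive statistic favours the sentiment-augmented model, and that the p value is one-sided under a standard normal distribution. It omits any autocorrelation correction of the variance, which the paper's implementation may include. All error series are made up.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def relative_loss(model_errors, benchmark_errors, power):
    """Ratio of model loss to benchmark loss (power=1: MAE, power=2: MSE)."""
    model = sum(abs(e) ** power for e in model_errors)
    bench = sum(abs(e) ** power for e in benchmark_errors)
    return model / bench


def diebold_mariano(model_errors, benchmark_errors, power):
    """Simplified DM statistic and one-sided p value (no HAC correction)."""
    # Assumed convention: d_t = benchmark loss - model loss.
    d = [abs(b) ** power - abs(m) ** power
         for m, b in zip(model_errors, benchmark_errors)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    return stat, 1.0 - normal_cdf(stat)


# Made-up out-of-sample errors, for illustration only.
model_errors = [0.5, -0.4, 0.3, -0.6, 0.2]
benchmark_errors = [0.8, -0.5, 0.6, -0.7, 0.1]

rel_mae = relative_loss(model_errors, benchmark_errors, power=1)
dm_linear, p_linear = diebold_mariano(model_errors, benchmark_errors, power=1)
```

Under these assumptions, a relative loss below one goes together with a positive DM statistic and a p value below 0.5. The reported pairs in the table, for example a statistic of 2.524 with a p value of 0.006, are numerically consistent with a one-sided normal p value.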