Abstract
Forecasting economic indicators is an important task for analysts. However, many indicators suffer from structural breaks, which lead to forecast failure. Methods that remain robust after a structural break have been proposed in the literature, but they come at a cost: an increase in forecast error variance. We propose a method to select among a set of robust and non-robust forecasting models. Our method uses time-series clustering to identify possible structural breaks in a time series, and then switches between autoregressive forecasting models depending on the series dynamics. We perform a rigorous empirical evaluation with 400 simulated series containing an artificial structural break and with real economic series: Industrial Production and Consumer Prices for all Western European countries available from the OECD database. Our results show that the proposed method statistically outperforms the benchmarks in forecast accuracy in most scenarios, particularly at short horizons.
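The trade-off between robust and non-robust models can be illustrated with a minimal sketch. It is not the clustering-based switching procedure of the paper: the AR(1) data-generating process, the break size, the sample sizes and the use of a no-change forecast as the robust device are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random

def simulate_ar1_with_shift(n_pre, n_post, phi, shift, seed=0):
    """AR(1) around mean 0 that moves to mean `shift` after n_pre observations."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for t in range(n_pre + n_post):
        mu = 0.0 if t < n_pre else shift
        prev = mu + phi * (prev - mu) + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series

def fit_ar1(y):
    """OLS estimates (a, b) of y[t] = a + b * y[t-1] + error."""
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    b = (sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
         / sum((xi - mx) ** 2 for xi in x))
    return mz - b * mx, b

def mse(y, start, end, forecast):
    """Mean squared one-step-ahead error over t = start, ..., end - 1."""
    errors = [y[t] - forecast(y[t - 1]) for t in range(start, end)]
    return sum(e * e for e in errors) / len(errors)

n_pre, n_post = 200, 200                      # assumed sample sizes
y = simulate_ar1_with_shift(n_pre, n_post, phi=0.5, shift=10.0)
a, b = fit_ar1(y[:n_pre])                     # estimated on pre-break data only

def ar1(prev):                                # non-robust: reverts to the old mean
    return a + b * prev

def no_change(prev):                          # robust device: last observation
    return prev

mse_ar_pre = mse(y, 1, n_pre, ar1)
mse_rw_pre = mse(y, 1, n_pre, no_change)
mse_ar_post = mse(y, n_pre, n_pre + n_post, ar1)
mse_rw_post = mse(y, n_pre, n_pre + n_post, no_change)

print("pre-break  MSE: AR(1) %.2f, no-change %.2f" % (mse_ar_pre, mse_rw_pre))
print("post-break MSE: AR(1) %.2f, no-change %.2f" % (mse_ar_post, mse_rw_post))
```

Before the break the fitted AR(1) has the smaller error, which is the variance cost of the robust device; after the break the ordering reverses, which is the motivation for switching between the two.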
Data Availability
Available on GitLab upon request.
Code Availability Statement
Available on GitLab upon request.
Notes
https://data.oecd.org/.
As discussed by Klassen et al. (2020), combining the fuzzy technique with subsequence clustering can generate the clusters more efficiently and corrects some problems discussed in the literature. In this study we also tested fuzzy clustering as an alternative to the usual procedure. The results were the same, so we report only the regular procedure.
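As a rough illustration of what fuzzy clustering of subsequences involves, the sketch below runs a plain fuzzy c-means on sliding windows of a series. It is not the stability procedure of Klassen et al. (2020) nor the implementation used in this study: the Euclidean distance, the window width, the fuzzifier m = 2 and the initial centres are all assumptions, and the function names are hypothetical.

```python
import math

def sliding_windows(series, width):
    """All overlapping subsequences of length `width`."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def fuzzy_c_means(points, centers, m=2.0, iterations=50, eps=1e-12):
    """Plain fuzzy c-means with Euclidean distance.

    Returns (memberships, centers); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to one.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership step: closer centres receive a larger share.
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), eps) for c in centers]
            memberships.append(
                [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
            )
        # Centre step: weighted mean of the points, with weights u ** m.
        new_centers = []
        for j in range(len(centers)):
            weights = [u[j] ** m for u in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])
        centers = new_centers
    return memberships, centers

# Toy series with a level shift half-way through (assumed, noise-free).
series = [0.0] * 10 + [10.0] * 10
windows = sliding_windows(series, width=3)
memberships, centers = fuzzy_c_means(windows, [windows[0], windows[-1]])

# Hard labels: the cluster with the highest membership for each window.
labels = [u.index(max(u)) for u in memberships]
print(labels)
```

A change in the hard label from one window to the next marks a candidate break date; the membership degrees indicate how clear-cut that assignment is for the windows that straddle the shift.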
1. significance level for Autometrics selection. In our case, we set the significance level to 0.01;
2. pre-search lag reduction, that is, the number of lags tested for the autoregressive process. We set this number to 50;
3. the outlier treatment choice. We opted to test two outlier choices:
   (a) “none”, a model that does not treat outliers and just performs the Autometrics model selection;
   (b) “IIS”, which adds an impulse dummy for every observation and is therefore model (a) plus impulse indicator saturation.
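The idea behind impulse indicator saturation can be sketched with the textbook split-half case for a constant-only model. This is a simplification and not the Autometrics algorithm: the two-block split, the constant-only model and the normal critical value 2.576 (a two-sided 1% level) are assumptions, and the function name is hypothetical. Adding a dummy for every observation in one block is equivalent to estimating the model on the other block and testing each observation in the dummied block against it.

```python
import statistics

def split_half_iis(y, crit=2.576):
    """Split-half impulse indicator saturation for the model y[t] = mu + error.

    Returns the indices whose impulse dummies are retained.
    """
    n = len(y)
    half = n // 2
    retained = []
    for block in (range(0, half), range(half, n)):
        # Dummies saturate `block`, so mu and sigma come from the other block.
        rest = [y[t] for t in range(n) if t not in block]
        mean_rest = statistics.fmean(rest)
        sd_rest = statistics.stdev(rest)
        # Standard error of a dummy coefficient, y[t] - mean_rest.
        scale = sd_rest * (1.0 + 1.0 / len(rest)) ** 0.5
        for t in block:
            if abs(y[t] - mean_rest) / scale > crit:
                retained.append(t)
    return retained

# Toy data with one outlier at index 3 (assumed values).
y = [0.1, -0.2, 0.05, 8.0, 0.0, -0.1, 0.2, -0.05, 0.15, -0.15]
outliers = split_half_iis(y)
print(outliers)
```

In a full procedure the retained dummies from both blocks would then be included together in a final model and tested once more; that last step is omitted here.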
We show only the results for mean squared error, because the results for mean absolute error lead to the same conclusions.
This test differs from the usual Diebold-Mariano test because the authors apply a bias correction to the latter and compare the resulting statistic with a Student-t distribution instead of a Gaussian.
These data were collected in February 2020. We were not able to find Industrial Production data for Switzerland, and the series for Iceland is very short, starting in 1998.
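A minimal sketch of that correction, following the form given by Harvey et al. (1997) for squared-error loss, is shown below. The function name and the toy forecast errors are hypothetical, and the p-value lookup against the Student-t distribution with n - 1 degrees of freedom is left out so that the example needs only the standard library.

```python
import math

def hln_statistic(e1, e2, h=1):
    """Diebold-Mariano statistic and its small-sample corrected version.

    e1, e2: forecast errors of the two competing models (same length n)
    h:      forecast horizon
    Returns (dm, corrected_dm, degrees_of_freedom).
    """
    n = len(e1)
    d = [x * x - y * y for x, y in zip(e1, e2)]   # squared-error loss differential
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance with a rectangular window of h - 1 autocovariances.
    # For h > 1 this estimate can be non-positive in small samples.
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    dm = d_bar / math.sqrt(long_run_var / n)

    # Bias correction; the result is compared with a t distribution, n - 1 df.
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return dm, correction * dm, n - 1

# Toy errors (assumed values) for a one-step-ahead comparison.
dm, hln, df = hln_statistic([1.0, 2.0, 1.0, 2.0], [1.0, 1.0, 1.0, 1.0], h=1)
print(dm, hln, df)
```

For h = 1 the correction factor reduces to sqrt((n - 1) / n), so the corrected statistic is always slightly smaller in absolute value than the uncorrected one.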
We do not consider tests for unit roots in the presence of structural breaks at this stage as the algorithm is used to detect breaks. Instead, these tests should be treated as indicative only, in order to establish the appropriate transformation of the dependent variable when applying the algorithm, while recognizing their limitations if breaks are found.
We also evaluated our model under different sample designs (different sizes of training, validation and test sets), and the proposed DSB models performed satisfactorily. However, their performance is better when the candidate models for the ensemble are better fitted to the data. Hence, we use the sample design that allows the individual models to perform best.
More details about the evolution of MSQE and MAE can be found in our GitLab repository.
The detailed results are available on request from the authors.
References
Aghabozorgi, S., Shirkhorshidi, A. S., & Wah, T. Y. (2015). Time-series clustering – a decade review. Information Systems, 53, 16–38.
Aminikhanghahi, S., & Cook, D. J. (2017). A survey of methods for time series change point detection. Knowledge and Information Systems, 51(2), 339–367.
Bai, J., & Perron, P. (1998). Estimating and testing linear models with multiple structural changes. Econometrica, 66(1), 47–78.
Bates, J. M., & Granger, C. W. (1969). The combination of forecasts. Journal of the Operational Research Society, 20(4), 451–468.
Cassisi, C., Montalto, P., Aliotta, M., Cannata, A., & Pulvirenti, A. (2012). Similarity measures and dimensionality reduction techniques for time series data mining. In Advances in Data Mining Knowledge Discovery and Applications (pp. 71–96). InTech, Rijeka, Croatia.
Castle, J., Doornik, J., Hendry, D., & Pretis, F. (2015). Detecting location shifts during model selection by step-indicator saturation. Econometrics, 3(2), 240–264.
Castle, J. L., Clements, M. P., & Hendry, D. F. (2015). Robust approaches to forecasting. International Journal of Forecasting, 31(1), 99–112.
Castle, J. L., Clements, M. P., & Hendry, D. F. (2016). An overview of forecasting facing breaks. Journal of Business Cycle Research, 12(1), 3–23.
Chauvet, M., & Potter, S. (2013). Forecasting output. Handbook of Economic Forecasting, 2, 141–194.
Chiu, C.-W.J., Hayes, S., Kapetanios, G., & Theodoridis, K. (2019). A new approach for detecting shifts in forecast accuracy. International Journal of Forecasting, 35(4), 1596–1612.
Clements, M. P., & Hendry, D. F. (2001). Forecasting non-stationary economic time series. MIT Press.
Corneli, M., Latouche, P., & Rossi, F. (2018). Multiple change points detection and clustering in dynamic networks. Statistics and Computing, 28(5), 989–1007.
Diebold, F. X., & Shin, M. (2018). Machine learning for regularized survey forecast combination: Partially-egalitarian lasso and its derivatives. International Journal of Forecasting.
Doornik, J. A. (2009). Autometrics. In: In Honour of David F. Hendry. Citeseer.
Franses, P. H., & Wiemann, T. (2020). Intertemporal similarity of economic time series: an application of dynamic time warping. Computational Economics, 56(1), 59–75.
Garcia, M. G., Medeiros, M. C., & Vasconcelos, G. F. (2017). Real-time inflation forecasting with high-dimensional models: The case of Brazil. International Journal of Forecasting, 33(3), 679–693.
Gilliland, M. (2020). The value added by machine learning approaches in forecasting. International Journal of Forecasting, 36(1), 161–166.
Hailin, L., & Miao, W. (2020). Fuzzy clustering based on feature weights for multivariate time series. Knowledge-Based Systems, 105907.
Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453–497.
Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13(2), 281–291.
Hendry, D. F. (2006). Robustifying forecasts from equilibrium-correction systems. Journal of Econometrics, 135(1–2), 399–426.
Hyndman, R., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22.
Izakian, H., Pedrycz, W., & Jamal, I. (2015). Fuzzy clustering of time series data using dynamic time warping distance. Engineering Applications of Artificial Intelligence, 39, 235–244.
Jabeur, S. B., Mefteh-Wali, S., & Viviani, J.-L. (2021). Forecasting gold price with the XGBoost algorithm and SHAP interaction values. Annals of Operations Research, 1–21.
Januschowski, T., Wang, Y., Torkkola, K., Erkkilä, T., Hasson, H., & Gasthaus, J. (2021). Forecasting with trees. International Journal of Forecasting.
Keogh, E., Chu, S., Hart, D., & Pazzani, M. (2001). An online algorithm for segmenting time series. In Proceedings 2001 IEEE International Conference on Data Mining (pp. 289–296). IEEE.
Klassen, G., Tatusch, M., Himmelspach, L., & Conrad, S. (2020). Fuzzy clustering stability evaluation of time series. In International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 680–692). Springer.
Li, H. (2015). On-line and dynamic time warping for time series data mining. International Journal of Machine Learning and Cybernetics, 6(1), 145–153.
McKnight, S., Mihailov, A., & Rumler, F. (2019). Inflation forecasting using the New Keynesian Phillips curve with a time-varying trend. Economic Modelling.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
Panagiotelis, A., Athanasopoulos, G., Hyndman, R. J., Jiang, B., & Vahid, F. (2019). Macroeconomic forecasting for Australia using a large number of predictors. International Journal of Forecasting, 35(2), 616–633.
Perron, P., & Yabu, T. (2009). Estimating deterministic trends with an integrated or stationary noise component. Journal of Econometrics, 151(1), 56–69.
Rakthanmanon, T., Keogh, E. J., Lonardi, S., & Evans, S. (2011). Time series epenthesis: Clustering time series streams requires ignoring some data. In 2011 IEEE 11th International Conference on Data Mining (pp. 547–556). IEEE.
Smeekes, S., & Wijler, E. (2018). Macroeconomic forecasting using penalized regression methods. International Journal of Forecasting, 34(3), 408–430.
Song, Y. et al. (2011). Modelling regime switching and structural breaks with an infinite dimension Markov switching model. Economics Department Working Paper, 427.
Stock, J. H., & Watson, M. W. (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20(2), 147–162.
Talagala, P. D., Hyndman, R. J., Smith-Miles, K., Kandanaarachchi, S., & Muñoz, M. A. (2019). Anomaly detection in streaming nonstationary temporal data. Journal of Computational and Graphical Statistics, 0(0), 1–21.
Timmermann, A. (2006). Forecast combinations. Handbook of Economic Forecasting, 1, 135–196.
Tran, D.-H. (2019). Automated change detection and reactive clustering in multivariate streaming data. In 2019 IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF) (pp. 1–6). IEEE.
Wan, H., Guo, S., Yin, K., Liang, X., & Lin, Y. (2020). CTS-LSTM: LSTM-based neural networks for correlated time series prediction. Knowledge-Based Systems, 191, 105239.
Wang, X., Smith, K., & Hyndman, R. (2006). Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13(3), 335–364.
Wang, Z., Qu, J., Fang, X., Li, H., Zhong, T., & Ren, H. (2020). Prediction of early stabilization time of electrolytic capacitor based on ARIMA-Bi_LSTM hybrid model. Neurocomputing, 403, 63–79.
Zakaria, J., Mueen, A., & Keogh, E. (2012). Clustering time series using unsupervised-shapelets. In 2012 IEEE 12th International Conference on Data Mining (pp. 785–794). IEEE.
Zolhavarieh, S., Aghabozorgi, S., & Teh, Y. W. (2014). A review of subsequence time series clustering. The Scientific World Journal, 2014.
Acknowledgements
We are grateful to Professor Sir David F. Hendry for his helpful comments.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
About this article
Cite this article
Pinto, J.M., Castle, J.L. Machine Learning Dynamic Switching Approach to Forecasting in the Presence of Structural Breaks. J Bus Cycle Res 18, 129–157 (2022). https://doi.org/10.1007/s41549-022-00066-w
DOI: https://doi.org/10.1007/s41549-022-00066-w