Published in: International Journal of Data Science and Analytics 3/2017

16-02-2017 | Regular Paper

Resampling strategies for imbalanced time series forecasting

Authors: Nuno Moniz, Paula Branco, Luís Torgo

Abstract

Time series forecasting is a challenging task, in which the non-stationary characteristics of the data make for a hard predictive setting. A common issue is the imbalanced distribution of the target variable, where some values are very important to the user but severely under-represented. Standard prediction tools focus on the average behaviour of the data. In many forecasting tasks involving time series, however, the objective is the opposite: predicting rare values. A common solution to forecasting tasks with imbalanced data is the use of resampling strategies, which operate on the learning data by changing its distribution in favour of a given bias. The objective of this paper is to provide solutions capable of significantly improving the predictive accuracy on rare cases in forecasting tasks using imbalanced time series data. We extend the application of resampling strategies to the time series context and introduce the concepts of temporal and relevance bias in the case selection process of such strategies, presenting new proposals. We evaluate standard forecasting tools and resampling strategies, with and without bias, over 24 time series data sets from six different sources. Results show a significant increase in predictive accuracy on rare cases when resampling strategies are used, and biased strategies further increase accuracy over non-biased strategies.
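To make the idea of resampling with a temporal bias concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that rare cases can be identified by a simple threshold on the target value (a simplified stand-in for a relevance function), that the series is turned into cases from its k previous values, and that the temporal bias is a selection weight growing linearly with a case's position in time. The function names `embed` and `undersample_temporal` and the parameters `threshold` and `keep_fraction` are hypothetical and chosen only for illustration.

```python
import random


def embed(series, k):
    """Turn a series into (lags, target) cases using the k previous values."""
    return [(tuple(series[i - k:i]), series[i]) for i in range(k, len(series))]


def undersample_temporal(cases, threshold, keep_fraction, seed=0):
    """Keep every rare case and a recency-weighted sample of the normal ones.

    Assumption: a case is 'rare' when its target is >= threshold.
    The temporal order of the retained cases is preserved.
    """
    rng = random.Random(seed)
    rare = [i for i, (_, y) in enumerate(cases) if y >= threshold]
    normal = [i for i, (_, y) in enumerate(cases) if y < threshold]
    n_keep = round(keep_fraction * len(normal))
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # the weight i + 1 grows with position, so recent cases are favoured.
    keyed = sorted(
        normal,
        key=lambda i: rng.random() ** (1.0 / (i + 1)),
        reverse=True,
    )
    kept = sorted(rare + keyed[:n_keep])
    return [cases[i] for i in kept]


# Toy series with two rare high values (9 and 10); purely illustrative data.
series = [1, 2, 1, 3, 2, 9, 1, 2, 3, 1, 2, 10, 1, 2]
cases = embed(series, k=2)
resampled = undersample_temporal(cases, threshold=9, keep_fraction=0.5)
print(len(cases), len(resampled))
```

In this sketch the rare cases are always retained while half of the normal cases are discarded, with older normal cases more likely to be dropped. A relevance bias could be sketched in the same way by making the weight depend on the target value rather than on the position in time.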

Metadata
Title
Resampling strategies for imbalanced time series forecasting
Authors
Nuno Moniz
Paula Branco
Luís Torgo
Publication date
16-02-2017
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 3/2017
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-017-0044-3
