Published in: International Journal of Data Science and Analytics 3/2017

16.02.2017 | Regular Paper

Resampling strategies for imbalanced time series forecasting

Authors: Nuno Moniz, Paula Branco, Luís Torgo

Abstract

Time series forecasting is a challenging task, where the non-stationary characteristics of the data make prediction difficult. A common issue is the imbalanced distribution of the target variable, where some values are very important to the user but severely under-represented. Standard prediction tools focus on the average behaviour of the data. However, in many forecasting tasks involving time series the objective is the opposite: predicting rare values. A common solution to forecasting tasks with imbalanced data is the use of resampling strategies, which operate on the learning data by changing its distribution in favour of a given bias. The objective of this paper is to provide solutions capable of significantly improving predictive accuracy on rare cases in forecasting tasks using imbalanced time series data. We extend the application of resampling strategies to the time series context and introduce the concepts of temporal and relevance bias in the case selection process of such strategies, presenting new proposals. We evaluate standard forecasting tools, both alone and combined with resampling strategies with and without bias, on 24 time series data sets from six different sources. Results show that resampling strategies significantly increase predictive accuracy on rare cases, and that biased strategies further increase accuracy over non-biased ones.
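To make the idea of biased case selection more concrete, here is a minimal Python/NumPy sketch of random undersampling on a time-ordered training set. It is an illustration under stated assumptions, not the authors' implementation: the relevance function `toy_relevance` (scaled distance from the median), the threshold of 0.9 separating rare from normal cases, and the weighting scheme are all hypothetical choices. "Temporal bias" is assumed here to mean that more recent normal cases are more likely to be kept, and "relevance bias" that normal cases with higher relevance are preferred.

```python
import numpy as np


def toy_relevance(y):
    """Hypothetical relevance in [0, 1]: distance from the median, scaled by its maximum.

    Illustrative stand-in only; not the relevance function used in the paper."""
    dev = np.abs(y - np.median(y))
    top = dev.max()
    return dev / top if top > 0 else np.zeros_like(dev)


def undersample(y, frac=0.5, threshold=0.9, temporal=False, relevance_bias=False, seed=0):
    """Return the sorted indices of training cases kept by random undersampling.

    Rare cases (relevance >= threshold) are always kept; a fraction `frac` of the
    normal cases is drawn without replacement. With temporal=True, a normal case's
    selection weight grows linearly with its position in time (later = more likely);
    with relevance_bias=True, the weight is also multiplied by its relevance.
    """
    rng = np.random.default_rng(seed)
    phi = toy_relevance(np.asarray(y, dtype=float))
    idx = np.arange(len(phi))
    rare = idx[phi >= threshold]
    normal = idx[phi < threshold]
    n_keep = int(round(frac * len(normal)))
    if n_keep == 0:
        return rare
    weights = np.ones(len(normal))
    if temporal:
        weights = weights * (normal + 1)  # time position as weight
    if relevance_bias:
        weights = weights * (phi[normal] + 1e-9)  # small offset keeps every weight positive
    kept = rng.choice(normal, size=n_keep, replace=False, p=weights / weights.sum())
    return np.sort(np.concatenate([rare, kept]))
```

For example, `undersample(y, frac=0.3, temporal=True)` keeps every rare case and 30% (rounded) of the normal ones, skewed towards the end of the series. An oversampling variant would instead leave the normal cases untouched and replicate or synthesise rare ones.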

Metadata
Title: Resampling strategies for imbalanced time series forecasting
Authors: Nuno Moniz, Paula Branco, Luís Torgo
Publication date: 16.02.2017
Publisher: Springer International Publishing
Published in: International Journal of Data Science and Analytics, Issue 3/2017
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI: https://doi.org/10.1007/s41060-017-0044-3