Rule induction for forecasting method selection: Meta-learning the characteristics of univariate time series

doi:10.1016/j.neucom.2008.10.017

Neurocomputing

Volume 72, Issues 10–12, June 2009, Pages 2581-2594

https://doi.org/10.1016/j.neucom.2008.10.017

Abstract

For univariate forecasting, there are various statistical models and computational algorithms available. In real-world exercises, too many choices can create difficulties in selecting the most appropriate technique, especially for users lacking sufficient knowledge of forecasting. This study focuses on rule induction for forecasting method selection by understanding the nature of historical forecasting data. We present a novel approach for selecting a forecasting method for univariate time series based on measurable data characteristics, combining elements of data mining, meta-learning, clustering, classification and statistical measurement. We conducted a large-scale empirical study of over 300 time series using four of the most popular forecasting methods. To provide a rich portrait of the global characteristics of univariate time series, we extracted measures from a comprehensive set of features such as trend, seasonality, periodicity, serial correlation, skewness, kurtosis, nonlinearity, self-similarity, and chaos. Both supervised and unsupervised learning methods are used to learn the relationship between the characteristics of the time series and the forecasting method suitability, providing both recommendation rules and visualizations in the feature space. A weighting schema derived from the rule induction is also used to improve forecasting accuracy through combined forecasting models.

Introduction

Time series forecasting has been a traditional research area for decades, and various statistical models and advanced computational algorithms have been developed to improve forecasting accuracy. With the continuous emergence of more methods, forecasters have been given more choices. However, more options can also create problems in practice, especially when forecasts are based on a trial-and-error procedure with little understanding of the conditions under which certain forecasting methods perform well. Certainly the ‘no free lunch’ theorem [1] informs us that there is never likely to be a single method that fits all situations. In the forecasting context, therefore, recommendation rules on how to select a suitable forecasting method for a given type of time series have attracted attention.

For more than a decade, research on forecasting method selection has sought recommendations and rules [2], [3], [4], [5], mostly based on expert systems approaches. Systems based on human judgment have obvious limitations, the strongest concern being that expert-system rules are not dynamic and therefore require significant rework and validation before they can be updated. This is not a trivial problem, especially when the forecasting domain or situation changes. In this study, we aim to develop an automated rule induction system that couples forecasting method performance with time series data characteristics. Metrics to characterize a time series are developed to provide a rich portrait of the time series including its trend, seasonality, serial correlation, nonlinearity, skewness, kurtosis, self-similarity, chaos, and periodicity. Self-organizing maps (SOMs) and decision trees (DTs) are used to induce rules explaining the relationships between these characteristics and forecasting method performance. The induced rules from such a system are envisaged to provide recommendations to forecasters on how to select forecasting methods. The proposed system employs a data-driven approach based on a meta-learning framework and machine learning algorithms, reducing the dependence on expert knowledge. Such an automated system is more flexible, adaptive and efficient when situations change and rules need to be revised.

After outlining related work in forecasting method selection, as well as relevant cross-disciplinary work, in Section 2, we explain the components and procedures of the proposed meta-learning based system in Section 3. Section 4 provides background on the four forecasting methods used as candidates in our empirical study. Section 5 introduces each characteristic identified for univariate time series data in our study, including the algorithms used to extract a descriptive metric for each characteristic. Section 6 discusses the three machine learning techniques used to learn the relationship between time series characteristics and forecasting method performance. Section 7 presents our empirical study and experimental results, including the induced rules. Section 8 discusses future research directions and draws conclusions.

Section snippets

Related work

In the literature on forecasting method selection, there are two common approaches: (1) comparing the track record of various approaches and using expert knowledge to provide guidelines to select forecasting methods and (2) using the results of large empirical studies to estimate a relationship between data features and model performance. The first approach has been developed over many decades. To select a forecasting method, some general guidelines consisting of many factors—convenience,

Meta-learning based system for rule induction

Meta-learning was proposed to support data mining tasks and to understand the conditions under which a given learning strategy is most appropriate for a given task. Meta-learning involves a process of studying the relationships between learning strategies and tasks [15]. The central property of the meta-learning approach is to understand the nature of data, and to learn to select the method which performs best for certain types of data.
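The idea of learning which method performs best for which type of data can be illustrated with a minimal sketch. It assumes three deliberately simple stand-in forecasters (`naive`, `mean`, `drift`), a hold-out mean absolute error, and a caller-supplied `extract_features` function; all of these names are hypothetical illustrations and are not the four methods, the error measure, or the architecture used in the paper.

```python
from statistics import mean

def naive_forecast(history, horizon):
    """Repeat the last observed value (stand-in candidate method)."""
    return [history[-1]] * horizon

def mean_forecast(history, horizon):
    """Repeat the historical mean (stand-in candidate method)."""
    return [mean(history)] * horizon

def drift_forecast(history, horizon):
    """Extend the straight line joining the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

# Hypothetical candidate pool; a real study would plug in its own methods.
CANDIDATES = {
    "naive": naive_forecast,
    "mean": mean_forecast,
    "drift": drift_forecast,
}

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def best_method(series, horizon=3):
    """Label a series with the candidate that has the lowest hold-out error."""
    train, test = series[:-horizon], series[-horizon:]
    errors = {name: mae(test, f(train, horizon)) for name, f in CANDIDATES.items()}
    return min(errors, key=errors.get)

def build_meta_dataset(collection, extract_features, horizon=3):
    """Pair each series' features with the name of its best-performing method."""
    return [(extract_features(s), best_method(s, horizon)) for s in collection]
```

The resulting list of (features, best method) pairs is the kind of meta-level data set on which a classifier or clustering algorithm could then be trained.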

We adapt a meta-learning architecture from Vilalta's

Background: forecasting methods

Forecasting is designed to predict possible future alternatives and helps current planning and decision making. For example, the forecasting of annual student enrollment is critical information for a university to determine financial plans and design strategies. Time series analysis provides foundations for forecasting model construction and selection based on historical data. Modeling the time series is a complex problem, because the difference in characteristics of time series data can make

Time series characteristics extraction

In this study, we investigated various data characteristics from diverse perspectives related to univariate time series structure-based characteristic identification and feature extraction. We selected the nine most informative, representative and easily-measurable characteristics to summarize the time series structure. Based on these identified characteristics, corresponding metrics are calculated. The extracted data characteristics and corresponding metrics are mapped to forecasting
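As a rough illustration of turning characteristics into metrics, the sketch below computes four textbook summary statistics for a univariate series: lag-1 autocorrelation (serial correlation), moment-based skewness and excess kurtosis, and the R² of a straight-line fit as a crude trend indicator. These estimators and the function names are assumptions chosen for brevity; they are not necessarily the metrics defined in the paper, and the remaining characteristics (seasonality, periodicity, nonlinearity, self-similarity, chaos) are omitted.

```python
from statistics import mean

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1 (undefined for a constant series)."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

def _central_moment(x, order):
    """Population central moment of the given order."""
    m = mean(x)
    return sum((a - m) ** order for a in x) / len(x)

def skewness(x):
    """Moment-based skewness: third central moment over variance^(3/2)."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5

def excess_kurtosis(x):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2 - 3.0

def linear_trend_r2(x):
    """R^2 of an ordinary least squares line fitted against the time index."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = mean(x)
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    syy = sum((v - x_mean) ** 2 for v in x)
    return sxy ** 2 / (sxx * syy)

def extract_features(x):
    """Collect the illustrative metrics into one feature dictionary."""
    return {
        "serial_correlation": lag1_autocorrelation(x),
        "skewness": skewness(x),
        "kurtosis": excess_kurtosis(x),
        "trend": linear_trend_r2(x),
    }
```

Each series is thereby reduced to a fixed-length feature vector, regardless of its original length.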

Machine learning techniques

After global characteristics and corresponding metrics have been defined, we then can use this finite set of descriptors to characterize or analyze time series data using appropriate machine learning techniques such as clustering algorithms and DTs. The mining of time series data has attracted great attention in the data mining community in recent years and many clustering algorithms have been applied to search for the similarity between series. k-means clustering is the most commonly used
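To make the rule-induction step concrete, the following sketch induces a single-split rule (a one-level decision tree, often called a decision stump) from feature dictionaries and method labels. It is a simplified stand-in for a full decision tree learner and is not the algorithm used in the paper; the feature names and method labels in the usage example are hypothetical.

```python
from collections import Counter

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def induce_stump(rows, labels):
    """Find the (feature, threshold) split with the fewest misclassifications.

    rows   -- list of dicts mapping feature name to a numeric value
    labels -- best-method label for each row
    Returns (feature, threshold, label_if_at_or_below, label_if_above).
    """
    best = None
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = sum(l != left_label for l in left)
            errors += sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]

def recommend(rule, features):
    """Apply an induced rule to the features of a new series."""
    feature, threshold, left_label, right_label = rule
    return left_label if features[feature] <= threshold else right_label
```

For example, given four series described by hypothetical `trend` and `noise` features:

```python
rows = [
    {"trend": 0.1, "noise": 0.5},
    {"trend": 0.2, "noise": 0.9},
    {"trend": 0.8, "noise": 0.4},
    {"trend": 0.9, "noise": 0.8},
]
labels = ["mean", "mean", "drift", "drift"]
rule = induce_stump(rows, labels)
print(rule[0], rule[2], rule[3])  # feature chosen and the two recommendations
print(recommend(rule, {"trend": 0.95, "noise": 0.1}))
```

The induced rule reads as "if trend is at or below the threshold, recommend one method, otherwise the other", which is the human-readable form that makes decision trees attractive for recommendation rules.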

Data sets

In our empirical study, we included various types of data sets consisting of synthetic and real-world time series from different domains such as economics, medicine, and engineering. We included 46 data sets from the UCR time series data mining archive [54], which covers time series from diverse fields, including finance, medicine, biometrics, chemistry, astronomy, robotics, and the networking industry. These data sets have the complete spectrum of stationary, non-stationary, noisy,

Future research and conclusions

In this research, we have focused on analyzing the nature of the time series data and developing a novel approach to generate recommendation rules for selection of forecasting methods based on data characteristics of the time series. The research work presented in this paper has not only extended the study on forecasting rules generation with a wider range of forecasting methods and algorithms, but has also deepened the research into a more specific or quantitative manner rather than merely

Xiaozhe Wang is a lecturer at the School of Management, LaTrobe University. Prior to joining LaTrobe University, she obtained a Ph.D. from Monash University, and was a Research Fellow at both Monash University and the University of Melbourne, Australia. Dr. Wang also worked as a senior statistician in industry after finishing her Ph.D. Her research interests are data mining, machine learning, meta-learning and time series forecasting. Her research has been published in journals, book chapters and

References (59)

M. Adya et al., Automatic identification of time series features for rule-based forecasting, International Journal of Forecasting (2001)
B. Arinze, Selecting appropriate forecasting models using rule induction, Omega International Journal of Management Science (1994)
V. Mahajan et al., New product forecasting models: directions for research and implementation, International Journal of Forecasting (1988)
C. Shah, Model selection in univariate time series forecasting using discriminant analysis, International Journal of Forecasting (1997)
J.R. Rice, The algorithm selection problem, Advances in Computers (1976)
R.B.C. Prudêncio et al., Meta-learning approaches to selecting time series models, Neurocomputing (2004)
R.J. Hyndman et al., A state space framework for automatic forecasting using exponential smoothing methods, International Journal of Forecasting (2002)
J.S. Armstrong et al., Error measures for generalizing about forecasting methods: empirical comparisons, International Journal of Forecasting (1992)
D.H. Wolpert et al., No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation (1996)
F. Collopy et al., Rule-based forecasting: development and validation of an expert systems approach to combining time series extrapolations, Management Science (1992)
N. Meade, Evidence for the selection of forecasting methods, International Journal of Forecasting (2000)
J.S. Armstrong, Research needs in forecasting, International Journal of Forecasting (1988)
J.C. Chambers et al., How to choose the right forecasting technique, Harvard Business Review (1971)
D.M. Georgoff et al., Manager's guide to forecasting, Harvard Business Review (1986)
L. Moutinho et al., Expert systems: a new tool in marketing, Qualitative Review in Marketing (1988)
D.J. Reid, A comparison of forecasting techniques on economic time series, in: Forecasting in Action, OR Society,...
K.A. Smith-Miles, Cross-disciplinary perspectives on meta-learning for algorithm selection, ACM Computing Surveys 41...
R. Vilalta et al., Using meta-learning to support data-mining, International Journal of Computer Science Applications I (2004)
J.S. Armstrong, (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners, Kluwer Academic...
S. Makridakis et al., Forecasting Methods and Applications (1998)
A.R. Ganguly, Hybrid statistical and data mining approaches for forecasting complex systems, in: Proceedings of the...
C.C. Pegels, Exponential forecasting: some new variations, Management Science (1969)
E.S. Gardner, Exponential smoothing: the state of the art, International Journal of Forecasting (1985)
G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA,...
G.E.P. Box et al., Time Series Analysis: Forecasting and Control (1994)
S.C. Ahalt, P. Chen, C.T. Chou, Proceedings of the Second International IEEE Conference on Tools for Artificial...
Forecasting-Principles, Forecasting with Artificial Neural Networks, Special Interest Group,...
P.J. Werbos, Beyond regression: new tools for prediction and analysis in the behavioral sciences, Ph.D. Thesis, Harvard...
J.M. Zurada, An Introduction to Artificial Neural Systems (1992)

Kate Smith-Miles is a Professor and Head of the School of Mathematical Sciences at Monash University in Australia. She obtained a B.Sc.(Hons.) in Mathematics and a Ph.D. in Electrical Engineering, both from the University of Melbourne, Australia. Kate has published two books on neural networks and data mining applications, and over 175 refereed journal and international conference papers in the areas of neural networks, combinatorial optimization, intelligent systems and data mining. She has been awarded over AUD $1.75 million in competitive grants, including eight Australian Research Council grants and industry awards. She is on the editorial board of several international journals, including IEEE Transactions on Neural Networks, has been program chair for several international conferences (e.g. HIS’03, CIDM’09) and has chaired the IEEE Computational Intelligence Society's Technical Committee on Data Mining (2007–2008). She is a frequent reviewer of international research activities including grant applications in Canada, UK, Finland, Singapore and Australia, refereeing for international research journals, and Ph.D. examinations. In addition to her academic activities, she also regularly acts as a consultant to industry in the areas of optimization, data mining and intelligent systems.

Rob Hyndman is Professor of Statistics at Monash University, Australia, and holds a Ph.D. in Statistics from the University of Melbourne. He is the Editor-in-Chief of the International Journal of Forecasting and Director of the Business and Economic Forecasting Unit, Monash University, one of the leading forecasting research groups in the world. He is currently supervising seven Ph.D. students on forecasting-related projects. Rob is also an experienced consultant and has worked with over 200 clients during the last 20 years, on projects covering all areas of applied statistics from forecasting to the ecology of lemmings. He is co-author of the well-known textbook Forecasting: Methods and Applications (Wiley, 1998) with Makridakis and Wheelwright, and has had more than 50 published papers in many journals.
