A periodogram-based metric for time series classification
Introduction
The classification and clustering of time series is becoming an important area of research in several fields, such as economics, marketing, business, finance, medicine, biology, physics, psychology, and zoology, among many others. For example, in economics we may be interested in classifying the economic situation of a country by looking at time series indicators such as gross national product, investment expenditure, disposable income, the unemployment rate or the inflation rate. In medicine, a patient may be classified into different classes using the information from an electrocardiogram time series.
The problem of identifying similarities or dissimilarities in time series data has been studied in the discrimination and clustering literature (see for instance Johnson and Wichern, 1992). Some studies use non-parametric approaches for splitting a set of time series into clusters by looking at their Euclidean distances in the space of points. As pointed out by Galeano and Peña (2000), this metric has the important limitation of being invariant to transformations that modify the order of observations over time, and therefore it does not take into account the correlation structure of the time series. Piccolo (1990) introduced a metric for ARIMA models based on the autoregressive representation and applied this measure to the identification of similarities between industrial production series. Tong and Dabas (1990) investigated the affinity among some linear and non-linear fitted models by applying classical clustering techniques to the estimated residuals. Diggle and Fisher (1991) introduced a non-parametric approach to compare the spectra of two time series based on the underlying cumulative periodograms. Diggle and al Wasel (1997) developed inference methods in spectral analysis based on the likelihood ratio to compare replicated time series data. Kakizawa et al. (1998) proposed parametric models for discriminating and clustering multivariate time series, with applications to environmental data (for discriminant analysis for time series, see also Shumway and Unger, 1974, Shumway, 1982, Dargahi-Noubary and Laycock, 1981, Dargahi-Noubary, 1992, Zhang and Taniguchi, 1994). Maharaj (2000) used a hypothesis test based on the autoregressive parameters to compare two stationary time series and proposed a classification method using the p-value of this test as a measure of similarity. Maharaj (2002) compared two non-stationary time series using the evolutionary spectra approach in order to take into account structural changes over time.
Other related works on clustering of time series are by Bohte et al. (1980), Kosmelj and Batagelj (1990), Shaw and King (1992), Maharaj (1999) and Xiong and Yeung (2004).
In this paper, we propose a metric based on the normalized periodogram and use it for time series classification. We provide simulation results comparing this metric with the one by Piccolo (1990) and with the ones based on autocorrelation, partial autocorrelation and inverse autocorrelation coefficients. In particular, we discuss the classification of time series as stationary or non-stationary.
The remainder of the paper is organized as follows. In Section 2 we briefly discuss previous related methods for clustering time series and present our periodogram-based metrics. In Section 3 we discuss the methodology used for the empirical classification of ARMA and ARIMA models, and in Section 4 we present results from various approaches. In Section 5 we present an illustrative example with economic time series data, identifying similarities among industrial production index series in the United States, and in Section 6 we summarize the paper and discuss possible future research.
Time series metrics
A fundamental problem in classification analysis of time series is the choice of a relevant metric. Let $\mathbf{x}_t = (x_{1t}, \ldots, x_{mt})'$ be a vector time series with components represented by autoregressive integrated moving average, or ARIMA, models,
$$\phi_i(B)(1 - B)^{d_i} x_{it} = \theta_i(B) \varepsilon_{it},$$
where $\phi_i(B) = 1 - \phi_{i1} B - \cdots - \phi_{ip} B^{p}$ is the autoregressive operator of order $p$ and $\theta_i(B) = 1 - \theta_{i1} B - \cdots - \theta_{iq} B^{q}$ is the moving average operator of order $q$; $B$ is the back-shift operator and $(1 - B)^{d_i}$ is the differencing operator of order $d$. The autoregressive and moving average …
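The snippet truncates before the proposed metric is formally defined, so the following is only a sketch under the assumption that the distance is the Euclidean distance between the logarithms of the normalized periodograms (ordinates divided by the sample variance); the function names are illustrative:

```python
import numpy as np

def log_normalized_periodogram(x):
    """Log of the periodogram ordinates at the Fourier frequencies,
    normalized by the sample variance so that scale differences drop out."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    dft = np.fft.rfft(x)[1:]              # drop the zero frequency
    per = (np.abs(dft) ** 2) / n          # periodogram at 2*pi*j/n, j = 1, ..., [n/2]
    return np.log(per / x.var())

def d_lnp(x, y):
    """Euclidean distance between log normalized periodograms (equal-length series)."""
    px, py = log_normalized_periodogram(x), log_normalized_periodogram(y)
    return float(np.sqrt(np.sum((px - py) ** 2)))
```

Working on logs damps the large variability of raw periodogram ordinates, which is one reason a log-periodogram version of such a metric is attractive for noisy series.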
Methodology of time series classification
In this section we use the previously discussed distances for time series classification, as follows:
Step 1: Find similarities or dissimilarities between every pair of time series in the data set. For each data set we compute a distance matrix over all different pairs of series using the following metrics:
- (i)
Classical Euclidean (EUCL) distance, $d_{EUCL}(\mathbf{x}, \mathbf{y}) = \left[ \sum_{t=1}^{n} (x_t - y_t)^2 \right]^{1/2}$.
- (ii)
Piccolo's distance, $d_{PIC}(\mathbf{x}, \mathbf{y}) = \left[ \sum_{j=1}^{\infty} (\pi_{j,x} - \pi_{j,y})^2 \right]^{1/2}$, where $\pi_{j,x}$ and $\pi_{j,y}$ are the coefficients of the AR($\infty$) representations of the two series. The application of this distance requires fitting an ARIMA model to each time series.
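Piccolo's metric operates on the AR($\infty$) weights of the fitted models. A minimal sketch, assuming the ARMA polynomials are already known (in practice they are estimated) and using illustrative function names, obtains the weights by the polynomial division $\pi(B) = \phi(B)/\theta(B)$:

```python
import numpy as np

def pi_weights(phi, theta, m=50):
    """First m AR(inf) weights pi_j of an ARMA model with AR coefficients
    phi and MA coefficients theta (conventions phi(B) = 1 - phi_1 B - ...,
    theta(B) = 1 - theta_1 B - ...), via phi(B) = pi(B) * theta(B)."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    f = np.zeros(m + 1); f[0] = 1.0; f[1:len(phi) + 1] = -phi
    t = np.zeros(m + 1); t[0] = 1.0; t[1:len(theta) + 1] = -theta
    c = np.zeros(m + 1); c[0] = 1.0
    for j in range(1, m + 1):
        # long division: c_j = f_j - sum_{k<j} c_k * t_{j-k}
        c[j] = f[j] - np.dot(c[:j], t[j:0:-1])
    return -c[1:]                          # pi_j = -c_j

def d_pic(pi_x, pi_y):
    """Euclidean distance between two (truncated) pi-weight sequences."""
    return float(np.sqrt(np.sum((np.asarray(pi_x) - np.asarray(pi_y)) ** 2)))
```

For an AR(1) model the only non-zero weight is $\pi_1 = \phi_1$; for an invertible MA(1) the weights decay geometrically, so a finite truncation $m$ is adequate in practice.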
Simulation results
We simulated one thousand replications of each of the following six stationary [(a)–(f)] and six non-stationary [(g)–(l)] models. All series were generated with zero-mean, unit-variance white noise. The sample sizes were taken equal to 50, 100, 200, 500, 1000 and 10,000 observations: Model (a): AR(1), with ; Model (b): AR(2), with and ; Model (c): ARMA(1,1), with and ; Model (d): ARMA(1,1), with and ; Model (e): MA(1), with ; Model (f): …
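The snippet omits the parameter values actually used, so the sketch below generates replications with hypothetical coefficients (e.g. $\phi_1 = 0.5$), purely to illustrate the setup; it follows the $\theta(B) = 1 - \theta_1 B - \cdots$ sign convention and discards a burn-in period:

```python
import numpy as np

def simulate_arma(phi, theta, n, burn=200, rng=None):
    """Simulate a zero-mean ARMA series driven by unit-variance Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    p, q = len(phi), len(theta)
    eps = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(n + burn):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(-theta[i] * eps[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        x[t] = ar + ma + eps[t]
    return x[burn:]                  # drop the burn-in to wash out start-up effects

def simulate_arima(phi, theta, n, **kwargs):
    """A non-stationary (d = 1) series is the cumulative sum of an ARMA series."""
    return np.cumsum(simulate_arma(phi, theta, n, **kwargs))

# Hypothetical parameter values for illustration only:
stationary = simulate_arma([0.5], [], 200)        # an AR(1) series
non_stationary = simulate_arima([0.5], [], 200)   # its integrated counterpart
```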
Application
As an illustrative example we use the Industrial Production (by Market Group) indices for the United States (source: http://www.economagic.com). The 20 time series indices (seasonally adjusted), each with a sample size of n = 309 monthly observations from January 1977 to September 2002, are reported in Table 3.
Before carrying out the clustering analysis, the series were transformed into differences of logarithms, $z_t = \log x_t - \log x_{t-1}$, as shown in Fig. 2, in order to obtain the percentage increases from period to period. This gets rid of …
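The transformation can be sketched as follows (the helper name is illustrative); for small changes, $\log x_t - \log x_{t-1}$ is approximately the period-to-period percentage growth:

```python
import numpy as np

def log_growth(x):
    """Differences of logarithms: z_t = log(x_t) - log(x_{t-1})."""
    x = np.asarray(x, dtype=float)
    return np.diff(np.log(x))        # length n - 1 for an input of length n
```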
Conclusions
In this paper, we have studied metrics based on different dependence measures for classifying time series as stationary or non-stationary. Simulation results show that the metrics based on the logarithm of the normalized periodogram and the metric based on the autocorrelation coefficients can distinguish, empirically and with a high rate of success, ARMA from ARIMA models, whereas this is not the case for the classical Euclidean distance nor for the metric based on the autoregressive weights proposed by Piccolo (1990).
Acknowledgements
This research was partially supported by a grant from the Fundação para a Ciência e Tecnologia (POCTI/FCT) and by a grant from the Fundação Calouste Gulbenkian. The third author acknowledges support from grant SEJ2004-03303 and from Fundación BBVA, Spain. Part of this work was completed during the visit of Jorge Caiado to the Department of Statistics, Universidad Carlos III de Madrid, Spain. The authors gratefully acknowledge the helpful comments and suggestions of the associate editor and an anonymous referee.
References (34)
- Comparison and classification of stationary multivariate time series. Pattern Recognition (1999)
- Comparison of non-stationary time series in the frequency domain. Comput. Statist. Data Anal. (2002)
- Using cluster analysis to classify time series. Physica D (1992)
- Inverse autocovariances and a measure of linear determinism for a stationary process. J. Time Ser. Anal. (1983)
- Recursive estimation of the inverse correlation function. Statistica (1986)
- On the estimation of the inverse correlation function. J. Time Ser. Anal. (1988)
- On unified model selection for stationary and nonstationary short- and long-memory autoregressive processes. Biometrika (1998)
- Autoregressive and window estimates of the inverse autocorrelation function. Biometrika (1980)
- A simulation study of autoregressive and window estimators of the inverse correlation function. Appl. Statist. (1983)
- Clustering of time series. Proc. COMPSTAT (1980)
- Time Series: Theory and Methods
- Inverse autocorrelations. J. Roy. Statist. Soc. Ser. A
- The inverse autocorrelations of a time series and their applications. Technometrics
- Discrimination between Gaussian time series based on their spectral differences. Commun. Statist. Theory Methods
- Spectral ratio discriminant and information theory. J. Time Ser. Anal.
- Nonparametric comparison of cumulative periodograms. Appl. Statist.
- Spectral analysis of replicated biomedical time series. Appl. Statist.