Elsevier

Information Systems

Volume 51, July 2015, Pages 27-42

A distance based time series classification framework

https://doi.org/10.1016/j.is.2015.02.005

Highlights

  • A time series classification framework is proposed.

  • Some alignment techniques are implemented in it.

  • k-NN and SVM classification modules are available.

  • 40 different datasets are classified by using the framework.

  • The performance of the alignment techniques is compared.

Abstract

One of the challenging tasks in machine learning is the classification of time series. It differs little from standard classification, except that the time shifts across time series should be corrected with a suitable alignment algorithm. In this study, we propose a framework for distance based time series classification that lets users easily apply different alignment and classification methods to different time series datasets. The framework can be extended with new alignment and classification algorithms. Using the framework, we implemented the k-Nearest Neighbor and Support Vector Machines classifiers as well as the alignment methods Dynamic Time Warping, Signal Alignment via Genetic Algorithm, Parametric Time Warping and Canonical Time Warping. We also evaluated the framework on the UCR time series repository, from which we conclude that a suitable alignment method enhances time series classification performance on nearly every dataset.

Introduction

A time series is produced by recording successive measurements of some quantity over time. Many application fields work with time series [1], [2], [3]. For instance, in the search for exoplanets, a time series is created by periodically recording the brightness of a target star; the result is called a light curve. The time series is then classified with respect to a set of references in order to find unexpected dips caused by a planet passing directly between the star and the observer. More than a thousand potential exoplanets have been discovered in the Kepler project by time series analysis [4].

Time series classification plays an important role in many applications such as signature verification [5], speech and handwriting recognition, and change detection in mutation analysis. It has been a topic of great interest and has two fundamental components: (i) classification and (ii) alignment. In the classification component, well known methods such as k-Nearest Neighbor (k-NN) and Support Vector Machines (SVM) [6] can be used in their original forms. In these methods, the distance between a pair of time series is calculated with the standard Euclidean metric because it is easy and fast to implement. Because a time series is created by recording measurements over time, imperfections in the measurement device or differences between the examined subjects distort the time axis. These distortions, called time drifts or retention time differences [7], reduce classification performance. To eliminate them, one should make non-linear adjustments, often called alignment [8].

Alignment eliminates temporal variations by stretching or compressing the time axis of one or both time series [9]. Equivalently, alignment creates a mapping between a pair of time series by fitting a warping function. Although alignment methods may follow very different strategies, they all aim to produce a mapping that is then used to correct the time drifts in the time series. The resulting time-corrected time series are called aligned time series.
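As a rough illustration of what applying a warping function means, the sketch below resamples a drifted series at warped time points using linear interpolation. The name `warp_series`, the toy data and the constant one-step shift are assumptions made for this example only. They are not taken from the framework, and real alignment methods estimate the warping function from the data instead of being handed it.

```python
def warp_series(series, warp):
    """Resample `series` at the warped time points warp(0), warp(1), ...

    `warp` maps an integer time index to a (possibly fractional) position
    in `series`. Positions outside the series are clamped to its ends.
    """
    n = len(series)
    aligned = []
    for t in range(n):
        u = min(max(warp(t), 0.0), n - 1)  # clamp to the valid time range
        lo = int(u)                        # sample at or before u
        hi = min(lo + 1, n - 1)            # sample after u (or the last one)
        frac = u - lo
        # Linear interpolation between the two neighbouring samples.
        aligned.append((1 - frac) * series[lo] + frac * series[hi])
    return aligned


# Toy data (hypothetical): the same bump, recorded one time step late.
reference = [0, 0, 1, 3, 1, 0, 0, 0]
drifted = [0, 0, 0, 1, 3, 1, 0, 0]

# An assumed warping function that undoes the one-step drift.
aligned = warp_series(drifted, lambda t: t + 1.0)
```

With this warping function the aligned series coincides with the reference, so the remaining difference between the two series no longer reflects the time drift.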

Alignment methods can be used to improve the performance of a classifier by integrating the alignment method into the distance calculation. In this setup, whenever the classifier requests the distance between a pair of time series, the time series are first aligned, and then the Euclidean distance between the aligned time series is returned to the classifier. With such an approach, alignment becomes an integral part of the distance calculation, so an alignment method is usually regarded as a distance measure [10].
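The setup described above can be sketched as follows, with textbook DTW standing in for the alignment step. This is a minimal sketch under stated assumptions, not the framework's implementation: the function names are hypothetical, the local cost is the squared difference, and no warping window is applied.

```python
import math


def dtw_path(a, b):
    """Return the optimal DTW warping path as a list of index pairs (i, j)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def aligned_distance(a, b):
    """Align a and b with DTW, then measure Euclidean distance on the
    aligned (matched) samples. This is what a classifier would call."""
    path = dtw_path(a, b)
    return math.sqrt(sum((a[i] - b[j]) ** 2 for i, j in path))


# Toy data (hypothetical): the same bump, one time step apart.
a = [0, 0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 1, 3, 1, 0, 0]
plain = euclidean(a, b)           # penalizes the time drift
aligned = aligned_distance(a, b)  # the drift is removed before measuring
```

On this toy pair the plain Euclidean distance is the square root of 10, while the aligned distance is zero, because the only difference between the two series is the shift along the time axis.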

The methods proposed in studies on alignment techniques are usually presented as new alternatives to the standard Euclidean distance metric or to other distance measures based on different alignment methods. For instance, Dynamic Time Warping (DTW) [11], one of the first alignment techniques to emerge from the spoken word recognition field, is perceived as a new distance measure. DTW is indeed a generalization of the Minkowski distance that can handle time series of different lengths [12]. However, many other distance measures, such as the Euclidean distance, cannot be applied to time series of different lengths. In such cases, the time series first need to be “aligned” by re-interpolating them to equal length. This implies that a distance measure does not always behave as an alignment technique but rather works in tandem with alignment methods. Therefore, in this study, we prefer to distinguish alignment methods from distance measures, even when the alignment is a variant of some distance measure.
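To make the re-interpolation step concrete, the sketch below stretches a short series to a target length with linear interpolation and then applies a Minkowski distance of order p (p = 2 gives the Euclidean distance). The names `resample` and `minkowski` and the toy values are assumptions for illustration; the source does not specify which interpolation scheme is used.

```python
def resample(series, length):
    """Linearly re-interpolate `series` to exactly `length` samples."""
    n = len(series)
    if length == 1:
        return [float(series[0])]
    out = []
    for k in range(length):
        # Position of the k-th output sample on the original time axis.
        u = k * (n - 1) / (length - 1)
        lo = int(u)
        hi = min(lo + 1, n - 1)
        frac = u - lo
        out.append((1 - frac) * series[lo] + frac * series[hi])
    return out


def minkowski(a, b, p=2):
    """Minkowski distance of order p; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


short = [0, 2, 4]           # 3 samples
long_ = [0, 1, 2, 3, 5]     # 5 samples
stretched = resample(short, len(long_))
distance = minkowski(stretched, long_, p=2)
```

Calling `minkowski(short, long_)` directly raises a `ValueError`, which mirrors the point made above: the distance measure only becomes applicable after the series have been brought to equal length.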

Another challenge in time series classification is the application specific adjustments required in the alignment or data processing steps. For instance, multi-dimensional time series are often converted to one dimension by summing, averaging or applying other dimensionality reduction algorithms, because the majority of alignment algorithms are designed to work only with one-dimensional time series [5]. Amplitude normalization, baseline correction and measuring the quality of the alignment are other minor tasks to be handled in time series classification [13], [14]. As a result, many alignment techniques proposed in the literature are entangled with application specific tunings that make them unusable for standalone alignment purposes. For the same reason, the performance of the proposed alignment methods can be evaluated on only a very limited number of application domains. Studies on time series classification also understandably tend to focus only on their own field of study. Therefore, only a few studies test their methods on datasets from different application domains.
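Two of the preprocessing steps mentioned above can be sketched in a few lines. The example assumes that a multi-dimensional series is stored as a list of per-time-step tuples, reduces it to one dimension by averaging the channels, and uses z-normalization as one common form of amplitude normalization. Both function names are hypothetical, and the source does not state which normalization the cited studies apply.

```python
import math


def to_one_dimension(multi):
    """Average the channels at each time step.

    `multi` is a list of tuples, one tuple of channel values per time step.
    """
    return [sum(sample) / len(sample) for sample in multi]


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series carries no amplitude information.
        return [0.0] * n
    return [(x - mean) / std for x in series]


# Toy two-channel recording (hypothetical), e.g. x and y pen coordinates.
recording = [(1.0, 3.0), (2.0, 4.0), (6.0, 2.0)]
flat = to_one_dimension(recording)
normalized = z_normalize(flat)
```

Keeping such steps as separate functions, outside the alignment routine itself, is one way to avoid the entanglement between alignment and application specific tuning described above.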

To overcome the difficulties highlighted above, a public time series classification framework is proposed that makes a clear distinction between classification and alignment. The proposed framework enables one to freely change the alignment method and observe its performance without dealing with classification. Likewise, different classification techniques can be tested while keeping the alignment method fixed. New classification or alignment algorithms can also be integrated into the framework by implementing the related interfaces. The framework also benefits from parallel computing resources if available.
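The separation between the two components might look roughly like the sketch below. Every name in it (`Alignment`, `NoAlignment`, `BestShiftAlignment`, `KNNClassifier`) is hypothetical and chosen for this illustration; the framework's actual interfaces are not shown in this text. The shift-based alignment is a deliberately simple stand-in, not one of the four methods implemented by the authors.

```python
from abc import ABC, abstractmethod
from collections import Counter
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Alignment(ABC):
    """Hypothetical alignment interface: return aligned copies of a and b."""

    @abstractmethod
    def align(self, a, b):
        ...


class NoAlignment(Alignment):
    """Baseline: leave both series untouched (plain Euclidean distance)."""

    def align(self, a, b):
        return a, b


class BestShiftAlignment(Alignment):
    """Toy alignment: try integer shifts of b and keep the closest one."""

    def __init__(self, max_shift=2):
        self.max_shift = max_shift

    def align(self, a, b):
        last = len(b) - 1
        best_dist, best_series = None, None
        for s in range(-self.max_shift, self.max_shift + 1):
            # Shift b by s steps, repeating the edge values at the borders.
            shifted = [b[min(max(t + s, 0), last)] for t in range(len(b))]
            d = euclidean(a, shifted)
            if best_dist is None or d < best_dist:
                best_dist, best_series = d, shifted
        return a, best_series


class KNNClassifier:
    """k-NN that delegates every distance request to an Alignment."""

    def __init__(self, alignment, k=1):
        self.alignment = alignment
        self.k = k
        self.train = []

    def fit(self, series, labels):
        self.train = list(zip(series, labels))
        return self

    def distance(self, a, b):
        # Align first, then measure the Euclidean distance.
        return euclidean(*self.alignment.align(a, b))

    def predict(self, query):
        ranked = sorted(
            self.train, key=lambda item: self.distance(query, item[0])
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy training set (hypothetical) and a query: a peak recorded two steps late.
train_x = [[0, 0, 5, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1]]
train_y = ["peak", "plateau"]
query = [0, 0, 0, 0, 5, 0, 0, 0]

without = KNNClassifier(NoAlignment()).fit(train_x, train_y).predict(query)
with_shift = (
    KNNClassifier(BestShiftAlignment(2)).fit(train_x, train_y).predict(query)
)
```

Here the classifier code is identical in both runs and only the alignment object changes. On this toy query the unaligned classifier picks "plateau", while the shift-aligned one picks "peak".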

Using the proposed framework, two classification and four alignment methods were tested on 40 different datasets kindly provided by Eamonn Keogh of the University of California, Riverside [15]. The most significant outcome of the experiments is that using an alignment method dramatically improves classification performance on nearly every dataset. The second finding is that the performance of alignment techniques relies heavily on the characteristics of the investigated dataset; one such example is analyzed in the experiments. The parallel programming feature of the framework was also tested using the facilities of the High Performance Computing Center of Turkey. The framework has been designed as an open source project so that researchers can implement their own algorithms.

The rest of this paper is organized as follows. In Section 2, we survey the literature on time series classification algorithms. In Section 3, we present the framework and its current classification and alignment implementations. In Section 4, we run experiments with the framework on the datasets and give the experimental setup and results. In the last section, we conclude and discuss future work.

Related work

The studies in time series classification can be analyzed in three main categories with respect to the classification scheme: feature based, model based and distance based [16]. Our classification framework falls into the third category.

The classification framework

The main objective of this work is to propose a flexible and easy to use framework suitable for distance based classification of time series. The framework has classification and alignment components that are flexible enough to allow different classification or alignment methods to be implemented. The connections between the two components are handled within the framework, as are the other tasks required in time series classification such as performance evaluation of the classifier and visualization of data

Datasets

The collection of time series datasets used in this study was kindly provided by Eamonn Keogh from the University of California, Riverside (UCR) [15]. The datasets in this collection come from many diverse applications, including person tracking with computer vision [51], monitoring of fish migration [52] and classification of leaves of Swedish trees [53]. The time series in each dataset are provided with class labels and are split into training and testing sets. A summary of the datasets is shown in Table 1.

Experimental results

Using the proposed framework, we conducted the following experiments:

  • In Section 5.1, we compared the alignment methods in terms of the smoothness of the warping functions.

  • In Section 5.2, we tested the parallel processing ability of the framework.

  • In Section 5.3, we made a detailed analysis of the Gun-Point dataset in order to explain the dramatic performance difference between the alignment methods on this dataset.

  • Lastly, in Section 5.4, we presented the performance of alignment methods on all

Conclusion and future work

In this study, we proposed a new framework for time series classification composed of two main components, alignment and classification. It allows custom-designed alignment and classification techniques to be implemented.

The choice of a classifier and an alignment method is not obvious to a researcher working on the classification of time series datasets. Consequently, the common methodology adopted by researchers is to experiment on their dataset with a few combinations of well-known

References (81)

  • M. Bashir et al.

    Reduced dynamic time warping for handwriting recognition based on multidimensional time series of a novel pen device

    Int. J. Intell. Syst. Technol., WASET

    (2008)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • L.S. Ettre

    Nomenclature for chromatography

    Pure Appl. Chem.

    (1993)
  • R. Smith et al.

    LC-MS alignment in theory and practice: a comprehensive algorithmic review

    Brief. Bioinform.

    (2015)
  • K. Coakley et al.

    Alignment of noisy signals

    IEEE Trans. Instrum. Meas.

    (2001)
  • H. Ding et al.

    Querying and mining of time series data: experimental comparison of representations and distance measures

    Proc. VLDB Endow.

    (2008)
  • H. Sakoe et al.

    Dynamic-programming algorithm optimization for spoken word recognition

    IEEE Trans. Acoust. Speech Signal Process.

    (1978)
  • Z. Prekopcsak et al.

    Time series classification by class-specific Mahalanobis distance measures

    Adv. Data Anal. Classif.

    (2012)
  • E. Keogh, Q. Zhu, B. Hu, Y. Hao, X. Xi, L. Wei, C.A. Ratanamahatana, The UCR Time Series Classification/Clustering...
  • Z. Xing et al.

    A brief survey on sequence classification

    ACM SIGKDD Explor. Newslett.

    (2010)
  • C. Faloutsos et al.

    Fast subsequence matching in time-series databases

    SIGMOD Rec.

    (1994)
  • I. Popivanov, R. Miller, Similarity search over time-series data using wavelets, in: Proceedings of the 18th...
  • F. Korn, H.V. Jagadish, C. Faloutsos, Efficiently supporting ad hoc queries in large datasets of time sequences, in:...
  • J. Listgarten et al.

    Difference detection in LC-MS data for protein biomarker discovery

    Bioinformatics

    (2007)
  • A. Nanopoulos et al.

    Feature-based classification of time-series data

    Int. J. Comput. Res.

    (2001)
  • D. Pham et al.

    Control chart pattern recognition using neural networks

    J. Syst. Eng.

    (1992)
  • D.T. Pham et al.

    Control chart pattern-recognition using learning vector quantization networks

    Int. J. Prod. Res.

    (1994)
  • S. Gauri

    Control chart pattern recognition using feature-based learning vector quantization

    Int. J. Adv. Manuf. Technol.

    (2010)
  • E. Alpaydin

    Introduction to Machine Learning

    (2010)
  • H. Ney et al.

    Dynamic programming search for continuous speech recognition

    IEEE Signal Process. Mag.

    (1999)
  • B.S. Atal et al.

    Speech analysis and synthesis by linear prediction of the speech wave

    J. Acoust. Soc. Am.

    (1971)
  • T. Vintsyuk

    Speech discrimination by dynamic programming

    Cybernetics

    (1968)
  • R. Bellman

    Dynamic Programming

    (2003)
  • T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, E. Keogh, Searching and mining...
  • E. Keogh, L. Wei, X. Xi, S. hee Lee, M. Vlachos, Lb-keogh supports exact indexing of shapes under rotation invariance...
  • S. Salvador et al.

    Toward accurate dynamic time warping in linear time and space

    Intell. Data Anal.

    (2007)
  • F. Itakura

    Minimum prediction residual principle applied to speech recognition

    IEEE Trans. Acoust. Speech Signal Process.

    (1975)
  • C.A. Ratanamahatana et al.

    Three myths about dynamic time warping data mining

  • P.H.C. Eilers

    Parametric time warping

    Anal. Chem.

    (2004)
  • J.O. Ramsay et al.

    Curve registration

    J. R. Stat. Soc. Ser. B—Stat. Methodol.

    (1998)