TSC is a task in which a classifier assigns a label Y to a temporally ordered sequence of values
$$\begin{aligned} X = [x_1, x_2, \ldots, x_t] \end{aligned}$$
(1)
with \(x_i\,\in \,{\mathbb {R}}^{m}\), where \(m=1\) in the case of a univariate and \(m>1\) in the case of a multivariate time series of length t.
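To make the definition concrete, the following toy NumPy sketch (illustrative names and synthetic data, not from the text above) assigns a label Y to a univariate series X of length t using a 1-nearest-neighbor classifier under Euclidean distance, one of the simplest classical TSC baselines:

```python
import numpy as np

# Toy training set: 20 univariate series of length t with known labels
# (class 0: noisy sine shapes, class 1: noisy cosine shapes).
rng = np.random.default_rng(42)
t = 50
grid = np.linspace(0, 4, t)
X_train = np.vstack([np.sin(grid) + 0.1 * rng.standard_normal((10, t)),
                     np.cos(grid) + 0.1 * rng.standard_normal((10, t))])
y_train = np.array([0] * 10 + [1] * 10)

def classify_1nn(x, X_train, y_train):
    """Assign a label Y to the series x = [x_1, ..., x_t] by copying the
    label of its Euclidean nearest neighbor in the training set."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

x_new = np.sin(grid)  # an unseen sine-shaped series
print(classify_1nn(x_new, X_train, y_train))  # → 0
```

For a multivariate series (\(m>1\)), each \(x_i\) would be an m-dimensional vector and the training set a 3-dimensional array of shape (n, t, m); the distance computation generalizes accordingly.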
TSC has been the subject of intensive research owing to its applicability to a plethora of domains, ranging from machine failure detection in an industrial setting [13] to stock market data [14] or speech recognition [15]. Hence, manifold approaches to TSC exist, each with their own strengths and weaknesses. A valuable benchmarking tool for the performance of a TSC model is provided by the University of California Riverside (UCR) Time Series Archive [16]. TSC approaches can be roughly divided into state-of-the-art models that require a transformation of the raw time series into a feature space (e.g. ensemble classifiers) and deep learning models that use the raw time series as input and generate features on their own, e.g. through convolution and pooling operations. The Collective Of Transformation-based Ensemble (COTE) classifier is based on 35 classifiers and is extended to HIVE-COTE through a hierarchical voting system over the COTE classifiers. HIVE-COTE is widely recognized as a state-of-the-art TSC model [
17]. HIVE-COTE has a computational complexity of \(O(n^2 \cdot t^4)\) for a dataset of size n and time series length t, and took more than 72,000 s to train on a dataset of \(n=700\) time series of length \(t=46\) on a high-end device at the time of publication [18]. It is thus not suited for model updates in an industrial setting, where sensor data with high sampling rates, such as MBN or acoustic emission (AE), is collected. Recent research has shown that deep learning models such as convolutional neural networks (CNN), which are much faster because they leverage parallel GPU computation, perform equally well as or, in the case of InceptionTime, even outperform HIVE-COTE on the UCR archive while requiring far less training and prediction time [
18]. The CNN architecture InceptionTime by Fawaz et al., released in 2019, outperforms ResNet, which was proposed as a baseline model for TSC by Wang et al. in 2017 [19], while scaling better [18]. InceptionTime consists of an ensemble of five deep learning models for TSC, where each classifier consists of a cascade of multiple so-called Inception modules, which were initially proposed by Szegedy et al. for computer vision purposes and led to the rise of CNN in computer vision tasks [20]. Especially for computer vision tasks like image recognition, deep CNN architectures are state of the art. While CNN learn spatial information and features from images, they have been shown to learn temporal information and features from time series as well. The developers of InceptionTime state that "put simply, the time series problem is essentially the same class of problem" as image classification, "just with one less dimension". CNN are therefore a promising and scalable approach to TSC and hence suited for time-critical industrial TSC tasks such as material property or tool health classification.
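The multi-scale idea behind the Inception module can be illustrated with a minimal NumPy sketch (a toy illustration with random filter weights, not the actual InceptionTime architecture): several 1D convolutions with different kernel lengths are applied to the same series in parallel, and their outputs are concatenated, so that short- and long-range temporal patterns are captured simultaneously:

```python
import numpy as np

def conv1d(x, kernel):
    """'Same'-padded 1D convolution (cross-correlation) of a univariate series."""
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, (pad, k - 1 - pad))
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def inception_like_module(x, kernel_sizes=(10, 20, 40), rng=None):
    """Apply parallel convolutions with different kernel lengths to the same
    series and stack the results, mimicking the multi-scale branches of an
    Inception module (filter weights here are random, not learned)."""
    rng = rng or np.random.default_rng(0)
    branches = [conv1d(x, rng.standard_normal(k)) for k in kernel_sizes]
    return np.stack(branches)  # shape: (num_branches, t)

x = np.sin(np.linspace(0, 6 * np.pi, 128))   # a univariate series of length t=128
features = inception_like_module(x)
print(features.shape)  # (3, 128)
```

In the real architecture the filter weights are learned by backpropagation, a bottleneck layer reduces the channel dimension before the parallel branches, and a max-pooling branch is concatenated alongside the convolutions; the sketch only shows the parallel multi-scale structure.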