ABSTRACT
Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industrial automation to smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices from their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods for time series classification (TSC) cannot cope with such data volumes at acceptable accuracy: they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes. In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method that is both fast and accurate. Like other state-of-the-art TSC methods, WEASEL uses a sliding-window approach to transform time series into feature vectors, which are then analyzed by a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, which yields a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms with orders-of-magnitude lower training and classification times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even to mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it achieves, out of the box, almost the same accuracy as highly tuned, domain-specific methods.
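The general sliding-window idea (cut a series into overlapping windows, turn each window into a word, count the words into a feature vector) can be illustrated with a deliberately simplified sketch. It is not WEASEL's feature derivation, which the abstract names as the method's novelty; instead, it assumes a SAX-style discretization (segment means of each z-normalized window mapped to four letters via fixed breakpoints) as a stand-in. The helper names `z_normalize`, `window_to_word`, `bag_of_words`, and `to_feature_vectors`, the alphabet, the breakpoints, and the window and word sizes are all hypothetical choices made for illustration.

```python
from collections import Counter
import math

# Assumption: SAX-style breakpoints for a 4-letter alphabet under a standard
# normal distribution. This toy discretization stands in for WEASEL's own
# feature derivation, which is not reproduced here.
ALPHABET = "abcd"
BREAKPOINTS = (-0.67, 0.0, 0.67)


def z_normalize(window):
    """Rescale a window to zero mean and unit variance; constant windows map to zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def window_to_word(window, word_length):
    """Hypothetical helper: turn one window into a word of `word_length` letters."""
    z = z_normalize(window)
    n = len(z)
    letters = []
    for i in range(word_length):
        # Integer bounds give every segment at least one point when n >= word_length.
        start, end = i * n // word_length, (i + 1) * n // word_length
        segment_mean = sum(z[start:end]) / (end - start)
        # The letter index is the number of breakpoints the segment mean exceeds.
        letters.append(ALPHABET[sum(segment_mean > b for b in BREAKPOINTS)])
    return "".join(letters)


def bag_of_words(series, window_size, word_length):
    """Slide a window over the series one step at a time and count the resulting words."""
    if word_length > window_size:
        raise ValueError("word_length must not exceed window_size")
    counts = Counter()
    for start in range(len(series) - window_size + 1):
        counts[window_to_word(series[start:start + window_size], word_length)] += 1
    return counts


def to_feature_vectors(bags):
    """Align the word counts of several series on one shared, sorted vocabulary."""
    vocabulary = sorted(set().union(*bags))
    return vocabulary, [[bag[word] for word in vocabulary] for bag in bags]


if __name__ == "__main__":
    rising = [float(i) for i in range(12)]
    falling = rising[::-1]
    bags = [bag_of_words(s, window_size=8, word_length=4) for s in (rising, falling)]
    vocabulary, vectors = to_feature_vectors(bags)
    print(vocabulary)  # ['abcd', 'dcba']
    print(vectors)     # [[5, 0], [0, 5]]
```

For a linear ramp every window yields the same word, so each series collapses to a single count. Real sensor data produce many distinct words, and the aligned count vectors would then be passed to a classifier, for example a linear model.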
Index Terms
- Fast and Accurate Time Series Classification with WEASEL