Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 6/2021

18.09.2021

Time series clustering in linear time complexity

verfasst von: Xiaosheng Li, Jessica Lin, Liang Zhao

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 6/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

With the increasing power of data storage and advances in data generation and collection technologies, large volumes of time series data become available and the content is changing rapidly. This requires data mining methods to have low time complexity to handle the huge and fast-changing data. This article presents a novel time series clustering algorithm that has linear time complexity. The proposed algorithm partitions the data by checking some randomly selected symbolic patterns in the time series. We provide theoretical analysis to show that group structures in the data can be revealed from this process. We evaluate the proposed algorithm extensively on all 128 datasets from the well-known UCR time series archive, and compare with the state-of-the-art approaches with statistical analysis. The results show that the proposed method achieves better accuracy compared with other rival methods. We also conduct experiments to explore how the parameters and configuration of the algorithm can affect the final clustering results.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Aghabozorgi S, Shirkhorshidi AS, Wah TY (2015) Time-series clustering-a decade review. Inf Syst 53:16–38CrossRef Aghabozorgi S, Shirkhorshidi AS, Wah TY (2015) Time-series clustering-a decade review. Inf Syst 53:16–38CrossRef
Zurück zum Zitat Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. KDD workshop, Seattle, WA 10:359–370 Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. KDD workshop, Seattle, WA 10:359–370
Zurück zum Zitat Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(Jan):1–30MathSciNetMATH Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(Jan):1–30MathSciNetMATH
Zurück zum Zitat Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases, vol 23. ACM Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases, vol 23. ACM
Zurück zum Zitat Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of the twenty-first international conference on Machine learning, ACM, p 36 Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of the twenty-first international conference on Machine learning, ACM, p 36
Zurück zum Zitat Gupta L, Molfese DL, Tammana R, Simos PG (1996) Nonlinear alignment and averaging for estimating the evoked potential. IEEE Trans Biomed Eng 43(4):348–356CrossRef Gupta L, Molfese DL, Tammana R, Simos PG (1996) Nonlinear alignment and averaging for estimating the evoked potential. IEEE Trans Biomed Eng 43(4):348–356CrossRef
Zurück zum Zitat Hoeffding W (1994) Probability inequalities for sums of bounded random variables In the collected works of Wassily Hoeffding. Springer, Berlin, pp 409–426CrossRef Hoeffding W (1994) Probability inequalities for sums of bounded random variables In the collected works of Wassily Hoeffding. Springer, Berlin, pp 409–426CrossRef
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef
Zurück zum Zitat Kumar M, Patel NR, Woo J (2002) Clustering seasonality patterns in the presence of errors. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 557–563 Kumar M, Patel NR, Woo J (2002) Clustering seasonality patterns in the presence of errors. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 557–563
Zurück zum Zitat Kumar N, Lolla VN, Keogh E, Lonardi S, Ratanamahatana CA, Wei L (2005) Time-series bitmaps: a practical visualization tool for working with large time series databases. In: Proceedings of the 2005 SIAM international conference on data mining, SIAM, pp 531–535 Kumar N, Lolla VN, Keogh E, Lonardi S, Ratanamahatana CA, Wei L (2005) Time-series bitmaps: a practical visualization tool for working with large time series databases. In: Proceedings of the 2005 SIAM international conference on data mining, SIAM, pp 531–535
Zurück zum Zitat Lei Q, Yi J, Vaculin R, Wu L, Dhillon IS (2019) Similarity preserving representation learning for time series clustering. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, AAAI Press, pp 2845–2851 Lei Q, Yi J, Vaculin R, Wu L, Dhillon IS (2019) Similarity preserving representation learning for time series clustering. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, AAAI Press, pp 2845–2851
Zurück zum Zitat Li X, Lin J (2017) Linear time complexity time series classification with bag-of-pattern-features. In: 2017 IEEE International Conference on Data Mining (ICDM), IEEE, pp 277–286 Li X, Lin J (2017) Linear time complexity time series classification with bag-of-pattern-features. In: 2017 IEEE International Conference on Data Mining (ICDM), IEEE, pp 277–286
Zurück zum Zitat Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144MathSciNetCrossRef Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144MathSciNetCrossRef
Zurück zum Zitat MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA 1:281–297 MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA 1:281–297
Zurück zum Zitat Madiraju NS, Sadat SM, Fisher D, Karimabadi H (2018) Deep temporal clustering: Fully unsupervised learning of time-domain features Madiraju NS, Sadat SM, Fisher D, Karimabadi H (2018) Deep temporal clustering: Fully unsupervised learning of time-domain features
Zurück zum Zitat Niennattrakul V, Ratanamahatana CA (2009) Shape averaging under time warping. In: 2009 6th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, IEEE, vol 2, pp 626–629 Niennattrakul V, Ratanamahatana CA (2009) Shape averaging under time warping. In: 2009 6th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, IEEE, vol 2, pp 626–629
Zurück zum Zitat Paparrizos J, Gravano L (2015) k-shape: Efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM, pp 1855–1870 Paparrizos J, Gravano L (2015) k-shape: Efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM, pp 1855–1870
Zurück zum Zitat Petitjean F, Ketterlin A, Gançarski P (2011) A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit 44(3):678–693CrossRef Petitjean F, Ketterlin A, Gançarski P (2011) A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit 44(3):678–693CrossRef
Zurück zum Zitat Ratanamahatana CA, Keogh E (2004) Everything you know about dynamic time warping is wrong. Citeseer, USA Ratanamahatana CA, Keogh E (2004) Everything you know about dynamic time warping is wrong. Citeseer, USA
Zurück zum Zitat Rebbapragada U, Protopapas P, Brodley CE, Alcock C (2009) Finding anomalous periodic time series. Mach Learn 74(3):281–313CrossRef Rebbapragada U, Protopapas P, Brodley CE, Alcock C (2009) Finding anomalous periodic time series. Mach Learn 74(3):281–313CrossRef
Zurück zum Zitat Saito N, Coifman RR (1994) Local feature extraction and its applications using a library of bases. PhD thesis, Yale University Saito N, Coifman RR (1994) Local feature extraction and its applications using a library of bases. PhD thesis, Yale University
Zurück zum Zitat Steinbach M, Tan PN, Kumar V, Klooster S, Potter C (2003) Discovery of climate indices using clustering. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 446–455 Steinbach M, Tan PN, Kumar V, Klooster S, Potter C (2003) Discovery of climate indices using clustering. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 446–455
Zurück zum Zitat Subhani N, Rueda L, Ngom A, Burden CJ (2010) Multiple gene expression profile alignment for microarray time-series data clustering. Bioinformatics 26(18):2281–2288CrossRef Subhani N, Rueda L, Ngom A, Burden CJ (2010) Multiple gene expression profile alignment for microarray time-series data clustering. Bioinformatics 26(18):2281–2288CrossRef
Zurück zum Zitat Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309MathSciNetCrossRef Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309MathSciNetCrossRef
Zurück zum Zitat Yang J, Leskovec J (2011) Patterns of temporal variation in online media. In: Proceedings of the fourth ACM international conference on Web search and data mining, pp 177–186 Yang J, Leskovec J (2011) Patterns of temporal variation in online media. In: Proceedings of the fourth ACM international conference on Web search and data mining, pp 177–186
Zurück zum Zitat Zakaria J, Mueen A, Keogh E (2012) Clustering time series using unsupervised-shapelets. In: 2012 IEEE 12th International Conference on Data Mining, IEEE, pp 785–794 Zakaria J, Mueen A, Keogh E (2012) Clustering time series using unsupervised-shapelets. In: 2012 IEEE 12th International Conference on Data Mining, IEEE, pp 785–794
Zurück zum Zitat Zhang Q, Wu J, Yang H, Tian Y, Zhang C (2016) Unsupervised feature learning from time series. In: IJCAI, pp 2322–2328 Zhang Q, Wu J, Yang H, Tian Y, Zhang C (2016) Unsupervised feature learning from time series. In: IJCAI, pp 2322–2328
Metadaten
Titel
Time series clustering in linear time complexity
verfasst von
Xiaosheng Li
Jessica Lin
Liang Zhao
Publikationsdatum
18.09.2021
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 6/2021
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-021-00798-w

Weitere Artikel der Ausgabe 6/2021

Data Mining and Knowledge Discovery 6/2021 Zur Ausgabe