Published in: Data Mining and Knowledge Discovery 5/2018

10-05-2018

Exploring variable-length time series motifs in one hundred million length scale

Authors: Yifeng Gao, Jessica Lin

Abstract

The exploration of repeated patterns of different lengths, also called variable-length motifs, has received considerable attention in recent years. However, existing algorithms for detecting variable-length motifs in large-scale time series are very time-consuming. In this paper, we introduce a time- and space-efficient approximate variable-length motif discovery algorithm, Distance-Propagation Sequitur (DP-Sequitur), for detecting variable-length motifs in large-scale time series data (e.g., series over one hundred million points in length). The discovered motifs can be ranked by different metrics, such as frequency or similarity, and can benefit a wide variety of real-world applications. We show that our approach can discover motifs in time series with over one hundred million points in just minutes, significantly faster than the fastest existing algorithm to date. Using several real-world time series datasets, we demonstrate the superiority of our algorithm over the state of the art.
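The abstract states that discovered motifs can be ranked by frequency or by similarity. The following is a minimal sketch of those two rankings, not DP-Sequitur itself. It assumes each candidate motif is already available as a length plus a list of start offsets (the `Motif` type and all function names are hypothetical). The similarity score, the mean pairwise Euclidean distance between z-normalized occurrences divided by the square root of the motif length, is our own assumption for making motifs of different lengths comparable; the paper may use a different measure.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence


@dataclass
class Motif:
    # Hypothetical representation (not DP-Sequitur's internal one):
    # a motif length plus the start offsets of its occurrences.
    length: int
    starts: List[int]


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a subsequence; a constant segment maps to all zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std < 1e-8:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def length_normalized_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between equal-length z-normalized subsequences,
    divided by sqrt(length) (an assumed normalization across lengths)."""
    za, zb = znorm(a), znorm(b)
    d = math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))
    return d / math.sqrt(len(a))


def mean_pairwise_distance(series: Sequence[float], m: Motif) -> float:
    """Average distance over all pairs of occurrences; lower means more similar."""
    subs = [series[s:s + m.length] for s in m.starts]
    pairs = list(combinations(subs, 2))
    if not pairs:
        return math.inf  # a single occurrence gives no similarity evidence
    return sum(length_normalized_distance(a, b) for a, b in pairs) / len(pairs)


def rank_motifs(series: Sequence[float], motifs: List[Motif],
                metric: str = "frequency") -> List[Motif]:
    """Return motifs ordered best-first under the chosen metric."""
    if metric == "frequency":
        # More occurrences first; Python's sort is stable, so ties keep input order.
        return sorted(motifs, key=lambda m: len(m.starts), reverse=True)
    if metric == "similarity":
        # Tighter groups of occurrences (smaller mean distance) first.
        return sorted(motifs, key=lambda m: mean_pairwise_distance(series, m))
    raise ValueError(f"unknown metric: {metric!r}")


if __name__ == "__main__":
    series = [0, 1, 2, 1, 0, 5, 0, 1, 2, 1, 0, 3, 0, 1, 2, 1, 0]
    a = Motif(length=5, starts=[0, 6])      # two identical occurrences
    b = Motif(length=3, starts=[3, 9, 14])  # three looser occurrences
    print([m.length for m in rank_motifs(series, [a, b], "frequency")])   # prints [3, 5]
    print([m.length for m in rank_motifs(series, [a, b], "similarity")])  # prints [5, 3]
```

The toy example shows why the choice of metric matters: the more frequent motif is not necessarily the most coherent one, so the two rankings can disagree on the same candidates.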

Footnotes
1
Example of Canebrake Groundcreeper records containing motif A: Athanas

2
Example of Streak-backed Canastero records containing motif B: Calderon-F

Metadata
Title
Exploring variable-length time series motifs in one hundred million length scale
Authors
Yifeng Gao
Jessica Lin
Publication date
10-05-2018
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 5/2018
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-018-0570-1