Skip to main content
Erschienen in: Knowledge and Information Systems 1/2018

22.12.2017 | Regular Paper

Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins

verfasst von: Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk, Eamonn Keogh

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Time series motifs are approximately repeated subsequences found within a longer time series. They have been in the literature since 2002, but recently they have begun to receive significant attention in research and industrial communities. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series problems, including rule discovery, anomaly detection, density estimation, semantic segmentation, summarization, etc. Recent work has improved the scalability so exact motifs can be computed on datasets with up to a million data points in tenable time. However, in some domains, for example seismology or climatology, there is an immediate need to address even larger datasets. In this work, we demonstrate that a combination of a novel algorithm and a high-performance GPU allows us to significantly improve the scalability of motif discovery. We demonstrate the scalability of our ideas by finding the full set of exact motifs on a dataset with one hundred and forty-three million subsequences, which is by far the largest dataset ever mined for time series motifs/joins; it requires ten quadrillion pairwise comparisons. Furthermore, we demonstrate that our algorithm can produce actionable insights into seismology and ethology.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
10,224,499,928,500,000 * 0.000001 s is 324 years.
 
2
A small earthquake of that magnitude would only be felt by attentive humans in the immediate vicinity of the epicenter.
 
Literatur
1.
Zurück zum Zitat Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. Foundations of data organization and algorithmsm, 69–84 Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. Foundations of data organization and algorithmsm, 69–84
2.
Zurück zum Zitat Allstadt K, Malone SD (2014) Swarms of repeating stick-slip icequakes triggered by snow loading at Mount Rainier volcano. J Geophys Res Earth Surf 119(5):1180–1203CrossRef Allstadt K, Malone SD (2014) Swarms of repeating stick-slip icequakes triggered by snow loading at Mount Rainier volcano. J Geophys Res Earth Surf 119(5):1180–1203CrossRef
3.
Zurück zum Zitat Balasubramanian A, Wang J, Balakrishnan P (2016) Discovering multidimensional motifs in physiological signals for personalized healthcare. IEEE J Sel Top Signal Process 10(5):832–841CrossRef Balasubramanian A, Wang J, Balakrishnan P (2016) Discovering multidimensional motifs in physiological signals for personalized healthcare. IEEE J Sel Top Signal Process 10(5):832–841CrossRef
4.
Zurück zum Zitat Bailis P, Gan E, Rong K et al (2017) Prioritizing attention in fast data: principles and promise. In: CIDR Bailis P, Gan E, Rong K et al (2017) Prioritizing attention in fast data: principles and promise. In: CIDR
5.
Zurück zum Zitat Brown AEX, Yemini EI, Grundy LJ et al (2013) A dictionary of behavioral motifs reveals clusters of genes affecting caenorhabditis elegans locomotion. Proc Natl Acad Sci 110(2):791–796CrossRef Brown AEX, Yemini EI, Grundy LJ et al (2013) A dictionary of behavioral motifs reveals clusters of genes affecting caenorhabditis elegans locomotion. Proc Natl Acad Sci 110(2):791–796CrossRef
7.
Zurück zum Zitat Chandola V, Banerjee A, Kumar V (2007) Anomaly detection: a survey. Technical report, University of Minnesota Chandola V, Banerjee A, Kumar V (2007) Anomaly detection: a survey. Technical report, University of Minnesota
8.
Zurück zum Zitat Chiu B, Keogh E, Lonardi S (2003) Probabilistic discovery of time series motifs. In: SIGKDD, pp 493–498 Chiu B, Keogh E, Lonardi S (2003) Probabilistic discovery of time series motifs. In: SIGKDD, pp 493–498
9.
Zurück zum Zitat Geller RJ, Mueller CS (1980) Four similar earthquakes in central California. Geophys Res Lett 7(10):821–824CrossRef Geller RJ, Mueller CS (1980) Four similar earthquakes in central California. Geophys Res Lett 7(10):821–824CrossRef
10.
Zurück zum Zitat Harris M (2007) Optimizing parallel reduction in CUDA. NVIDIA Developer Technology 2.4 Harris M (2007) Optimizing parallel reduction in CUDA. NVIDIA Developer Technology 2.4
11.
Zurück zum Zitat Havskov J, Alguacil G (2004) Instrumentation in earthquake seismology, vol 358. Springer, DordrechtCrossRef Havskov J, Alguacil G (2004) Instrumentation in earthquake seismology, vol 358. Springer, DordrechtCrossRef
13.
Zurück zum Zitat Iverson RM, Dzurisin D, Gardner CA et al (2006) Dynamics of seismogenic volcanic extrusion at Mount St. Helens in 2004–2005. Nature 444(7118):439–443CrossRef Iverson RM, Dzurisin D, Gardner CA et al (2006) Dynamics of seismogenic volcanic extrusion at Mount St. Helens in 2004–2005. Nature 444(7118):439–443CrossRef
14.
Zurück zum Zitat Li Y, U LH, Yiu ML, Gong Z (2015) Quick-motif: An efficient and scalable framework for exact motif discovery. In: ICDE, IEEE, pp 579–590 Li Y, U LH, Yiu ML, Gong Z (2015) Quick-motif: An efficient and scalable framework for exact motif discovery. In: ICDE, IEEE, pp 579–590
15.
Zurück zum Zitat Luo W, Tan H, Mao H et al (2012) Efficient similarity joins on massive high-dimensional datasets using mapreduce. In: MDM, IEEE, pp 1–10 Luo W, Tan H, Mao H et al (2012) Efficient similarity joins on massive high-dimensional datasets using mapreduce. In: MDM, IEEE, pp 1–10
16.
Zurück zum Zitat McGovern A, Rosendahl D, Brown R et al (2011) Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction. Data Min Knowl Discov 22(1):232–258CrossRef McGovern A, Rosendahl D, Brown R et al (2011) Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction. Data Min Knowl Discov 22(1):232–258CrossRef
17.
Zurück zum Zitat Meng X, Yu X, Peng Z et al (2012) Detecting earthquakes around salton sea following the 2010 mw7.2 El Mayor-Cucapah earthquake using GPU parallel computing. Procedia Comput Sci 9:937–946CrossRef Meng X, Yu X, Peng Z et al (2012) Detecting earthquakes around salton sea following the 2010 mw7.2 El Mayor-Cucapah earthquake using GPU parallel computing. Procedia Comput Sci 9:937–946CrossRef
18.
Zurück zum Zitat Minnen D, Isbell CL, Essa I et al (2007) Discovering multivariate motifs using subsequence density estimation and greedy mixture learning. In: AAAI, pp 615–620 Minnen D, Isbell CL, Essa I et al (2007) Discovering multivariate motifs using subsequence density estimation and greedy mixture learning. In: AAAI, pp 615–620
19.
Zurück zum Zitat Mueen A, Keogh E, Zhu Q et al (2009) Exact discovery of time series motifs. In: SDM, pp 473–484 Mueen A, Keogh E, Zhu Q et al (2009) Exact discovery of time series motifs. In: SDM, pp 473–484
24.
Zurück zum Zitat Rakthanmanon T, Campana B, Mueen A et al (2013) Addressing big data time series: mining trillions of time series subsequences under dynamic time warping. TKDD 7(3):10CrossRef Rakthanmanon T, Campana B, Mueen A et al (2013) Addressing big data time series: mining trillions of time series subsequences under dynamic time warping. TKDD 7(3):10CrossRef
25.
Zurück zum Zitat Rong K, Bailis P (2017) ASAP: prioritizing attention via time series smoothing. VLDB Endowment 10(11):1358–1369CrossRef Rong K, Bailis P (2017) ASAP: prioritizing attention via time series smoothing. VLDB Endowment 10(11):1358–1369CrossRef
26.
Zurück zum Zitat Shelly DR, Beroza GC, Ide S et al (2006) Low-frequency earthquakes in Shikoku, Japan, and their relationship to episodic tremor and slip. Nature 442(7099):188–191CrossRef Shelly DR, Beroza GC, Ide S et al (2006) Low-frequency earthquakes in Shikoku, Japan, and their relationship to episodic tremor and slip. Nature 442(7099):188–191CrossRef
27.
Zurück zum Zitat Shelly DR, Beroza GC, Ide S (2017) Non-volcanic tremor and low-frequency earthquake swarms. Nature 446(7133):305–307CrossRef Shelly DR, Beroza GC, Ide S (2017) Non-volcanic tremor and low-frequency earthquake swarms. Nature 446(7133):305–307CrossRef
28.
Zurück zum Zitat Shelly DR, Ellsworth WL, Ryberg T et al (2009) Precise location of San Andreas Fault tremors near Cholame, California using seismometer clusters: Slip on the deep extension of the fault? Geophys Res Lett 36, L01303. https://doi.org/10.1029/2008GL036367 Shelly DR, Ellsworth WL, Ryberg T et al (2009) Precise location of San Andreas Fault tremors near Cholame, California using seismometer clusters: Slip on the deep extension of the fault? Geophys Res Lett 36, L01303. https://​doi.​org/​10.​1029/​2008GL036367
29.
Zurück zum Zitat Simeone A, Wilson RP (2003) In-depth studies of Magellanic penguin (Spheniscus magellanicus) foraging: Can we estimate prey consumption by perturbations in the dive profile? Mar Biol 143(4):825–831CrossRef Simeone A, Wilson RP (2003) In-depth studies of Magellanic penguin (Spheniscus magellanicus) foraging: Can we estimate prey consumption by perturbations in the dive profile? Mar Biol 143(4):825–831CrossRef
31.
Zurück zum Zitat Tanaka Y, Iwamoto K, Uehara K (2005) Discovery of time-series motif from multi-dimensional data based on MDL principle. Mach Learn 58(2):269–300CrossRefMATH Tanaka Y, Iwamoto K, Uehara K (2005) Discovery of time-series motif from multi-dimensional data based on MDL principle. Mach Learn 58(2):269–300CrossRefMATH
32.
Zurück zum Zitat Vahdatpour A, Amini N, Sarrafzadeh M (2009) Toward unsupervised activity discovery using multi-dimensional motif detection in time series. IJCAI 9:1261–1266 Vahdatpour A, Amini N, Sarrafzadeh M (2009) Toward unsupervised activity discovery using multi-dimensional motif detection in time series. IJCAI 9:1261–1266
33.
Zurück zum Zitat Wang L, Chng ES, Li H (2010) A tree-construction search approach for multivariate time series motifs discovery. Pattern Recognit Lett 31(9):869–875CrossRef Wang L, Chng ES, Li H (2010) A tree-construction search approach for multivariate time series motifs discovery. Pattern Recognit Lett 31(9):869–875CrossRef
34.
Zurück zum Zitat Wang X, Mueen A, Ding H et al (2013) Comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309MathSciNetCrossRef Wang X, Mueen A, Ding H et al (2013) Comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309MathSciNetCrossRef
35.
Zurück zum Zitat Yeh CCM, Zhu Y, Ulanova L et al (2016) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: ICDM, IEEE, pp 579–588 Yeh CCM, Zhu Y, Ulanova L et al (2016) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: ICDM, IEEE, pp 579–588
36.
Zurück zum Zitat Yoon CE, O’Reilly O, Bergen KJ et al (2015) Earthquake detection through computationally efficient similarity search. Sci Adv 1(11):e1501057CrossRef Yoon CE, O’Reilly O, Bergen KJ et al (2015) Earthquake detection through computationally efficient similarity search. Sci Adv 1(11):e1501057CrossRef
Metadaten
Titel
Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins
verfasst von
Yan Zhu
Zachary Zimmerman
Nader Shakibay Senobari
Chin-Chia Michael Yeh
Gareth Funning
Abdullah Mueen
Philip Brisk
Eamonn Keogh
Publikationsdatum
22.12.2017
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 1/2018
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-017-1138-x

Weitere Artikel der Ausgabe 1/2018

Knowledge and Information Systems 1/2018 Zur Ausgabe