Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction

Data Mining and Knowledge Discovery

Abstract

We introduce an efficient approach to mining multi-dimensional temporal streams of real-world data for ordered temporal motifs that can be used for prediction. Since many of the dimensions of the data are known or suspected to be irrelevant, our approach first identifies the salient dimensions of the data, then the key temporal motifs within each dimension, and finally the temporal ordering of the motifs necessary for prediction. For the prediction element, the data are assumed to be labeled. We tested the approach on two real-world data sets. To verify the generality of the approach, we validated the application on several subjects from the CMU Motion Capture database. Our main application uses several hundred numerically simulated supercell thunderstorms where the goal is to identify the most important features and feature interrelationships which herald the development of strong rotation in the lowest altitudes of a storm. We identified sets of precursors, in the form of meteorological quantities reaching extreme values in a particular temporal sequence, unique to storms producing strong low-altitude rotation. The eventual goal is to use this knowledge for future severe weather detection and prediction algorithms.
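
The three stages named in the abstract (salient dimensions, per-dimension motifs, temporal ordering) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a SAX-like three-letter discretization, a simple "fraction of positive examples minus fraction of negative examples" score for candidate words, and ordering by mean first-occurrence time. The function names, the `min_score` threshold, and the toy dimensions `x`, `y`, `z` are all hypothetical.

```python
from statistics import mean, pstdev

# Assumption: approximate equiprobable cut points of N(0, 1) for 3 symbols.
BREAKPOINTS = (-0.43, 0.43)


def symbolize(series):
    """Z-normalize a series and map each value to 'a', 'b', or 'c'."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return "b" * len(series)  # a constant series carries no shape
    return "".join(
        "abc"[sum((x - mu) / sigma > b for b in BREAKPOINTS)] for x in series
    )


def words(s, w):
    """All length-w substrings of a symbol string."""
    return {s[i:i + w] for i in range(len(s) - w + 1)}


def best_motif(pos, neg, w):
    """Return (score, word), where score = P(word | pos) - P(word | neg)."""
    candidates = set().union(*(words(s, w) for s in pos))

    def score(word):
        p = sum(word in s for s in pos) / len(pos)
        n = sum(word in s for s in neg) / len(neg)
        return p - n

    return max((score(c), c) for c in candidates)


def mine(examples, labels, w=3, min_score=0.5):
    """Keep dimensions with a discriminative word, then order them in time."""
    motifs = {}
    for d in examples[0]:
        sym = [symbolize(e[d]) for e in examples]
        pos = [s for s, y in zip(sym, labels) if y]
        neg = [s for s, y in zip(sym, labels) if not y]
        score, word = best_motif(pos, neg, w)
        if score >= min_score:  # the dimension is treated as salient
            onsets = [s.find(word) for s in pos if word in s]
            motifs[d] = (word, mean(onsets))
    order = sorted(motifs, key=lambda d: motifs[d][1])
    return motifs, order


# Toy labeled data: 'x' spikes early and 'y' spikes late in positive
# examples only; 'z' is constant everywhere and should be discarded.
flat = [0] * 8
examples = [
    {"x": [0, 0, 5, 0, 0, 0, 0, 0], "y": [0, 0, 0, 0, 0, 5, 0, 0], "z": [1] * 8},
    {"x": [0, 5, 0, 0, 0, 0, 0, 0], "y": [0, 0, 0, 0, 0, 0, 5, 0], "z": [1] * 8},
    {"x": flat, "y": flat, "z": [1] * 8},
    {"x": flat, "y": flat, "z": [1] * 8},
]
labels = [True, True, False, False]

motifs, order = mine(examples, labels)
print(order)  # dimensions sorted by mean motif onset in positive examples
```

On this toy input the constant dimension is dropped and the remaining two are ordered by when their motif first appears. The sketch assumes at least one positive and one negative example, and it scores every candidate word exhaustively, so it does not reflect the efficiency the abstract claims for the actual approach.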

Author information

Corresponding author

Correspondence to Amy McGovern.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

McGovern, A., Rosendahl, D.H., Brown, R.A. et al. Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction. Data Min Knowl Disc 22, 232–258 (2011). https://doi.org/10.1007/s10618-010-0193-7

  • DOI: https://doi.org/10.1007/s10618-010-0193-7
