Abstract
We introduce an efficient approach to mining multi-dimensional temporal streams of real-world data for ordered temporal motifs that can be used for prediction. Since many of the dimensions of the data are known or suspected to be irrelevant, our approach first identifies the salient dimensions of the data, then the key temporal motifs within each dimension, and finally the temporal ordering of the motifs necessary for prediction. For the prediction element, the data are assumed to be labeled. We tested the approach on two real-world data sets. To verify the generality of the approach, we validated the application on several subjects from the CMU Motion Capture database. Our main application uses several hundred numerically simulated supercell thunderstorms where the goal is to identify the most important features and feature interrelationships which herald the development of strong rotation in the lowest altitudes of a storm. We identified sets of precursors, in the form of meteorological quantities reaching extreme values in a particular temporal sequence, unique to storms producing strong low-altitude rotation. The eventual goal is to use this knowledge for future severe weather detection and prediction algorithms.
Additional information
Responsible editor: Eamonn Keogh.
Cite this article
McGovern, A., Rosendahl, D.H., Brown, R.A. et al. Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction. Data Min Knowl Disc 22, 232–258 (2011). https://doi.org/10.1007/s10618-010-0193-7
DOI: https://doi.org/10.1007/s10618-010-0193-7