ABSTRACT
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact; in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problem such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.
Supplemental Material
- N. Adams, D. Marquez, and G. Wakefield. 2005. Iterative deepening for melody alignment and retrieval. ISMIR, 199--206.Google Scholar
- I. Assent, R. Krieger, F. Afschari, and T. Seidl. 2008. The TS-Tree: efficient time series search and retrieval. EDBT, 252--63. Google ScholarDigital Library
- J. Alon, V. Athitsos, Q. Yuan, and S. Sclaroff. 2009. A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE PAMI 31, 9, 1685--1699. Google ScholarDigital Library
- T. Bragge, M.P. Tarvainen, and P. A. Karjalainen. 2004. High-Resolution QRS Detection Algorithm for Sparsely Sampled ECG Recordings. Univ. of Kuopio, Dept. of Applied Physics Report.Google Scholar
- N. Chadwick, D. McMeekin, and T. Tan. 2011. Classifying eye and head movement artifacts in EEG Signals. DEST.Google Scholar
- H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. J. Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1, 2, 1542--52. Google ScholarDigital Library
- B. Dupasquier and S. Burschka. 2011. Data mining for hackers -- encrypted traffic mining. The 28th Chaos Comm' Congress.Google Scholar
- Y. Chen, G. Chen, K. Chen, and B. C. Ooi. 2009. Efficient processing of warping time series join of motion capture data. ICDE, 1048--1059. Google ScholarDigital Library
- Faceted DBLP. 2011. http://dblp.l3s.deGoogle Scholar
- A. Fornés, J. Lladós, and G. Sanchez. 2007. Old handwritten musical symbol classification by a dynamic time warping based method. Graphics Recognition 5046, 51--60.Google Scholar
- A. Fu, E. Keogh, L. Lau, C. Ratanamahatana, and R. Wong. 2008. Scaling and time warping in time series querying. VLDB J. 17, 4, 899--921. Google ScholarDigital Library
- N. Gillian, R. Knapp, and S. O'Modhrain. 2011. Recognition of multivariate temporal musical gestures using n-dimensional dynamic time warping. Proc of the 11th Int'l conference on New Interfaces for Musical Expression.Google Scholar
- D. Goldberg. 1991. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23, 1. Google ScholarDigital Library
- G. Guitel. 1975. Histoire comparée des numérations écrites. Chapter: "Les grands nombres en numération parlée," Paris: Flammarion, 566--574.Google Scholar
- R. Huber-Mörk, S. Zambanini, M. Zaharieva, and M. Kampel. 2011. Identification of ancient coins based on fusion of shape and local features. Mach. Vis. Appl. 22, 6, 983--994. Google ScholarDigital Library
- M. Hsiao, K. West, and G. Vedatesh. 2005. Online context recognition in multisensor system using dynamic time warping. ISSNIP, 283--288.Google Scholar
- H. Jegou, M. Douze, C. Schmid, and P. Perez. 2010. Aggregating local descriptors into a compact image representation. IEEE CVPR, San Francisco, CA, USA.Google Scholar
- T. Kahveci and A. K. Singh. 2004. Optimizing similarity search for arbitrary length time series queries. IEEE Trans. Knowl. Data Eng. 16, 4, 418--433. Google ScholarDigital Library
- E. Keogh and S. Kasetty. 2003. On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining and Knowledge. Discovery 7, 4, 349--371. Google ScholarDigital Library
- E. Keogh, L. Wei, X. Xi, M. Vlachos, S.H. Lee, and P. Protopapas. 2009. Supporting exact indexing of arbitrarily rotated shapes and periodic time series under Euclidean and warping distance measures. VLDB J. 18, 3, 611--630. Google ScholarDigital Library
- S. Kim, S Park, and W. Chu. 2001. An index-based approach for similarity search supporting time warping in large sequence databases. ICDE, 607--61. Google ScholarDigital Library
- K. Laerhoven, E. Berlin, and B. Schiele. 2009. Enabling efficient time series analysis for wearable activity data. ICMLA, 392--397. Google ScholarDigital Library
- S. H. Lim, H. Park, and S. W. Kim. 2007. Using multiple indexes for efficient subsequence matching in time-series databases. Inf. Sci. 177, 24, 5691--5706. Google ScholarDigital Library
- D. P. Locke, L. W. Hillier, W. C. Warren, et al. 2011. Comparative and demographic analysis of orangutan genomes. Nature 469, 529--533.Google ScholarCross Ref
- A. Mueen and E. Keogh. 2010. Online discovery and maintenance of time series motifs. KDD, 1089--1098. Google ScholarDigital Library
- A. Mueen, E. Keogh, Q. Zhu, S. Cash, M. B. Westover, and N. Shamlo. 2011. A disk-aware algorithm for time series motif discovery. Data Min. Knowl. Discov. 22, 1--2, 73--105. Google ScholarDigital Library
- M. Muller. 2009. Analysis and retrieval techniques for motion and music data. EUROGRAPHICS tutorial.Google Scholar
- P. Papapetrou, V. Athitsos, M. Potamias, G. Kollios, and D. Gunopulos. 2011. Embedding-based subsequence matching in time-series databases. ACM TODS 36, 3, 17*. Google ScholarDigital Library
- W. Pressly. 2008. TSPad: a Tablet-PC based application for annotation and collaboration on time series data. ACM Southeast Regional Conference, 527--52. Google ScholarDigital Library
- B. Raghavendra, D. Bera, A. Bopardikar, and R. Narayanan. 2011. Cardiac arrhythmia detection using dynamic time warping of ECG beats in e-healthcare systems. WOWMOM, 1--6. Google ScholarDigital Library
- U. Rebbapragada, P. Protopapas, C. Brodley, and C. Alcock. 2009. Finding anomalous periodic time series. Machine Learning 74, 3, 281--313. Google ScholarDigital Library
- Y. Sakurai, C. Faloutsos, and M. Yamamuro. 2007. Stream monitoring under the time warping distance. ICDE, 1046--55.Google Scholar
- Y. Sakurai, M. Yoshikawa, and C. Faloutsos. 2005. FTW: fast similarity search under the time warping distance. PODS'05. Google ScholarDigital Library
- S. Srikanthan, A.Kumar, and R. Gupta. 2011. Implementing the dynamic time warping algorithm in multithreaded environments for real time and unsupervised pattern discovery. IEEE ICCCT, 394--398.Google Scholar
- J. Shieh and E. J. Keogh. 2008. iSAX: indexing and mining terabyte sized time series. KDD, 623--631. Google ScholarDigital Library
- T. Stiefmeier, D. Roggen, and G. Tröster. 2007. Gestures are strings: efficient online gesture spotting and classification using string matching. Proceedings of the ICST 2nd international conference on Body area networks. Google ScholarDigital Library
- C. R. Whitney. 1997. Jeanne Calment, World's elder, dies at 122. New York Times (August 5th, 1997).Google Scholar
- J. O. Wobbrock, A. D. Wilson, and Y. Li. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. ACM UIST, 159--168. Google ScholarDigital Library
- L. Ye and E. Keogh. 2009. Time series shapelets: a new primitive for data mining. KDD, 947--956. Google ScholarDigital Library
- B. Yi, H. Jagadish, and C. Faloutsos. 1998. Efficient retrieval of similar time sequences under time warping. ICDE, 201--208. Google ScholarDigital Library
- Y. Zhang and J. Glass. 2011. An inner-product lower-bound estimate for dynamic time warping. ICASSP, 5660--5663.Google Scholar
- A. Zinke and D. Mayer. 2006. Iterative Multi Scale Dynamic Time Warping. Universität Bonn, Tech Report # CG-2006--1.Google Scholar
- Project Website: www.cs.ucr.edu/~eamonn/UCRsuite.htmlGoogle Scholar
Index Terms
- Searching and mining trillions of time series subsequences under dynamic time warping
Recommendations
Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping
Special Issue on ACM SIGKDD 2012Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms, including classification, clustering, motif ...
Data mining a trillion time series subsequences under dynamic time warping
IJCAI '13: Proceedings of the Twenty-Third international joint conference on Artificial IntelligenceMost time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets ...
Speeding up similarity search under dynamic time warping by pruning unpromising alignments
Similarity search is the core procedure for several time series mining tasks. While different distance measures can be used for this purpose, there is clear evidence that the Dynamic Time Warping (DTW) is the most suitable distance function for a wide ...
Comments