skip to main content
10.1145/2339530.2339576acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Searching and mining trillions of time series subsequences under dynamic time warping

Authors Info & Claims
Published:12 August 2012Publication History

ABSTRACT

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact; in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problem such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.

Skip Supplemental Material Section

Supplemental Material

best_paper_4.mp4

mp4

437.4 MB

References

  1. N. Adams, D. Marquez, and G. Wakefield. 2005. Iterative deepening for melody alignment and retrieval. ISMIR, 199--206.Google ScholarGoogle Scholar
  2. I. Assent, R. Krieger, F. Afschari, and T. Seidl. 2008. The TS-Tree: efficient time series search and retrieval. EDBT, 252--63. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Alon, V. Athitsos, Q. Yuan, and S. Sclaroff. 2009. A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE PAMI 31, 9, 1685--1699. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Bragge, M.P. Tarvainen, and P. A. Karjalainen. 2004. High-Resolution QRS Detection Algorithm for Sparsely Sampled ECG Recordings. Univ. of Kuopio, Dept. of Applied Physics Report.Google ScholarGoogle Scholar
  5. N. Chadwick, D. McMeekin, and T. Tan. 2011. Classifying eye and head movement artifacts in EEG Signals. DEST.Google ScholarGoogle Scholar
  6. H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. J. Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1, 2, 1542--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Dupasquier and S. Burschka. 2011. Data mining for hackers -- encrypted traffic mining. The 28th Chaos Comm' Congress.Google ScholarGoogle Scholar
  8. Y. Chen, G. Chen, K. Chen, and B. C. Ooi. 2009. Efficient processing of warping time series join of motion capture data. ICDE, 1048--1059. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Faceted DBLP. 2011. http://dblp.l3s.deGoogle ScholarGoogle Scholar
  10. A. Fornés, J. Lladós, and G. Sanchez. 2007. Old handwritten musical symbol classification by a dynamic time warping based method. Graphics Recognition 5046, 51--60.Google ScholarGoogle Scholar
  11. A. Fu, E. Keogh, L. Lau, C. Ratanamahatana, and R. Wong. 2008. Scaling and time warping in time series querying. VLDB J. 17, 4, 899--921. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. N. Gillian, R. Knapp, and S. O'Modhrain. 2011. Recognition of multivariate temporal musical gestures using n-dimensional dynamic time warping. Proc of the 11th Int'l conference on New Interfaces for Musical Expression.Google ScholarGoogle Scholar
  13. D. Goldberg. 1991. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Guitel. 1975. Histoire comparée des numérations écrites. Chapter: "Les grands nombres en numération parlée," Paris: Flammarion, 566--574.Google ScholarGoogle Scholar
  15. R. Huber-Mörk, S. Zambanini, M. Zaharieva, and M. Kampel. 2011. Identification of ancient coins based on fusion of shape and local features. Mach. Vis. Appl. 22, 6, 983--994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Hsiao, K. West, and G. Vedatesh. 2005. Online context recognition in multisensor system using dynamic time warping. ISSNIP, 283--288.Google ScholarGoogle Scholar
  17. H. Jegou, M. Douze, C. Schmid, and P. Perez. 2010. Aggregating local descriptors into a compact image representation. IEEE CVPR, San Francisco, CA, USA.Google ScholarGoogle Scholar
  18. T. Kahveci and A. K. Singh. 2004. Optimizing similarity search for arbitrary length time series queries. IEEE Trans. Knowl. Data Eng. 16, 4, 418--433. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. Keogh and S. Kasetty. 2003. On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining and Knowledge. Discovery 7, 4, 349--371. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. Keogh, L. Wei, X. Xi, M. Vlachos, S.H. Lee, and P. Protopapas. 2009. Supporting exact indexing of arbitrarily rotated shapes and periodic time series under Euclidean and warping distance measures. VLDB J. 18, 3, 611--630. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Kim, S Park, and W. Chu. 2001. An index-based approach for similarity search supporting time warping in large sequence databases. ICDE, 607--61. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. K. Laerhoven, E. Berlin, and B. Schiele. 2009. Enabling efficient time series analysis for wearable activity data. ICMLA, 392--397. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. H. Lim, H. Park, and S. W. Kim. 2007. Using multiple indexes for efficient subsequence matching in time-series databases. Inf. Sci. 177, 24, 5691--5706. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. P. Locke, L. W. Hillier, W. C. Warren, et al. 2011. Comparative and demographic analysis of orangutan genomes. Nature 469, 529--533.Google ScholarGoogle ScholarCross RefCross Ref
  25. A. Mueen and E. Keogh. 2010. Online discovery and maintenance of time series motifs. KDD, 1089--1098. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. Mueen, E. Keogh, Q. Zhu, S. Cash, M. B. Westover, and N. Shamlo. 2011. A disk-aware algorithm for time series motif discovery. Data Min. Knowl. Discov. 22, 1--2, 73--105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M. Muller. 2009. Analysis and retrieval techniques for motion and music data. EUROGRAPHICS tutorial.Google ScholarGoogle Scholar
  28. P. Papapetrou, V. Athitsos, M. Potamias, G. Kollios, and D. Gunopulos. 2011. Embedding-based subsequence matching in time-series databases. ACM TODS 36, 3, 17*. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. W. Pressly. 2008. TSPad: a Tablet-PC based application for annotation and collaboration on time series data. ACM Southeast Regional Conference, 527--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. B. Raghavendra, D. Bera, A. Bopardikar, and R. Narayanan. 2011. Cardiac arrhythmia detection using dynamic time warping of ECG beats in e-healthcare systems. WOWMOM, 1--6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. U. Rebbapragada, P. Protopapas, C. Brodley, and C. Alcock. 2009. Finding anomalous periodic time series. Machine Learning 74, 3, 281--313. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Y. Sakurai, C. Faloutsos, and M. Yamamuro. 2007. Stream monitoring under the time warping distance. ICDE, 1046--55.Google ScholarGoogle Scholar
  33. Y. Sakurai, M. Yoshikawa, and C. Faloutsos. 2005. FTW: fast similarity search under the time warping distance. PODS'05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Srikanthan, A.Kumar, and R. Gupta. 2011. Implementing the dynamic time warping algorithm in multithreaded environments for real time and unsupervised pattern discovery. IEEE ICCCT, 394--398.Google ScholarGoogle Scholar
  35. J. Shieh and E. J. Keogh. 2008. iSAX: indexing and mining terabyte sized time series. KDD, 623--631. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. T. Stiefmeier, D. Roggen, and G. Tröster. 2007. Gestures are strings: efficient online gesture spotting and classification using string matching. Proceedings of the ICST 2nd international conference on Body area networks. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. C. R. Whitney. 1997. Jeanne Calment, World's elder, dies at 122. New York Times (August 5th, 1997).Google ScholarGoogle Scholar
  38. J. O. Wobbrock, A. D. Wilson, and Y. Li. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. ACM UIST, 159--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. L. Ye and E. Keogh. 2009. Time series shapelets: a new primitive for data mining. KDD, 947--956. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. B. Yi, H. Jagadish, and C. Faloutsos. 1998. Efficient retrieval of similar time sequences under time warping. ICDE, 201--208. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Y. Zhang and J. Glass. 2011. An inner-product lower-bound estimate for dynamic time warping. ICASSP, 5660--5663.Google ScholarGoogle Scholar
  42. A. Zinke and D. Mayer. 2006. Iterative Multi Scale Dynamic Time Warping. Universität Bonn, Tech Report # CG-2006--1.Google ScholarGoogle Scholar
  43. Project Website: www.cs.ucr.edu/~eamonn/UCRsuite.htmlGoogle ScholarGoogle Scholar

Index Terms

  1. Searching and mining trillions of time series subsequences under dynamic time warping

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2012
      1616 pages
      ISBN:9781450314626
      DOI:10.1145/2339530

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 August 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader