skip to main content
research-article

Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping

Published:01 September 2013Publication History
Skip Abstract Section

Abstract

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms, including classification, clustering, motif discovery, anomaly detection, and so on. The difficulty of scaling a search to large datasets explains to a great extent why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine massive time series for the first time. We demonstrate the following unintuitive fact: in large datasets we can exactly search under Dynamic Time Warping (DTW) much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We explain how our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. Moreover, we show how our ideas allow us to efficiently support the uniform scaling distance measure, a measure whose utility seems to be underappreciated, but which we demonstrate here. In addition to mining massive datasets with up to one trillion datapoints, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.

References

  1. Adams, N., Marquez, D., and Wakefield, G. 2005. Iterative deepening for melody alignment and retrieval. In Proceedings of ISMIR. 199--206.Google ScholarGoogle Scholar
  2. Alon, J., Athitsos, V., Yuan, Q., and Sclaroff, S. 2009. A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 9, 1685--1699. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Assent, I., Krieger, R., Afschari, F., and Seidl, T. 2008. The TS-Tree: Efficient time series search and retrieval. In Proceedings of EDBT. 252--63. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bei, C. D. and Gray, R. M. 1985. An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Trans. Commun. 33, 10, 1132--1133.Google ScholarGoogle ScholarCross RefCross Ref
  5. Bragge, T., Tarvainen, M. P., and Karjalainen, P. A. 2004. High-Resolution QRS Detection Algorithm for Sparsely Sampled ECG Recordings. Department of Applied Physics Report, University of Kuopio.Google ScholarGoogle Scholar
  6. Chadwick, N. A., McMeekin, D. A., and Tan, T. 2011. Classifying eye and head movement artifacts in EEG Signals. In Proceedings of IEEE DEST. 285--291.Google ScholarGoogle Scholar
  7. Chan, T. F., Golub, G. H., and Leveque, R. J. 1983. Algorithms for computing the sample variance: Analysis and recommendations. Amer. Statist. 37, 242--247.Google ScholarGoogle ScholarCross RefCross Ref
  8. Chaovalitwongse, W. A., Sachdeo, R. C., Pardalos, P. M., Iasemidis, L. D., and Sackellares, J. C. 2005. Automated brain activity classifier. Epilepsia 46, 313.Google ScholarGoogle Scholar
  9. Chen, L. and Ng, R. 2004. On the marriage of LP-norms and edit distance. In Proceedings of VLDB. 792--803. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Chen, Y., Chen, G., Chen, K., and Ooi, B. C. 2009. Efficient processing of warping time series join of motion capture data. In Proceedings of ICDE. 1048--1059. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cheng, D. Y., Gersho, A., Ramamurthi, B., and Shoham, Y. 1984. Fast search algorithms for vector quantization and pattern matching. In Proceedings of ICASSP. 372--375.Google ScholarGoogle Scholar
  12. Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., and Keogh, E. J. 2008. Querying and mining of time series data: Experimental comparison of representations and distance measures. J. VLDB 1, 2, 1542--1552. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Dupasquier, B. and Burschka, S. 2011. Data mining for hackers--Encrypted traffic mining. In Proceedings of the 28th Chaos Comm’ Congress.Google ScholarGoogle Scholar
  14. Fornés, A., Lladós, J., and Sanchez, G. 2007. Old handwritten musical symbol classification by a dynamic time warping based method. Graph. Recogn. 5046, 51--60.Google ScholarGoogle Scholar
  15. Fu, A., Keogh, E. J., Lau, L., Ratanamahatana, C., and Wong, R. 2008. Scaling and time warping in time series querying. VLDB J. 17, 4, 899--921. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gillian, N., Knapp, R., and O’Modhrain, S. 2011. Recognition of multivariate temporal musical gestures using n-dimensional dynamic time warping. In Proceedings of the 11th International Conference on New Interfaces for Musical Expression.Google ScholarGoogle Scholar
  17. Goldberg, D. 1991. What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv. 23, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Guitel, G. 1975. Histoire Comparée des Numérations Écrites. Flammarion, Paris. 566--574.Google ScholarGoogle Scholar
  19. Hsiao, M., West, K., and Vedatesh, G. 2005. Online context recognition in multisensor system using dynamic time warping. In Proceedings of ISSNIP. 283--288.Google ScholarGoogle Scholar
  20. Huber-Mörk, R., Zambanini, S., Zaharieva, M., and Kampel, M. 2011. Identification of ancient coins based on fusion of shape and local features. Mach. Vis. Appl. 22, 6, 983--994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Jegou, H., Douze, M., Schmid, C., and Perez, P. 2010. Aggregating local descriptors into a compact image representation. In Proceedings of IEEE CVPR. 3304--3311.Google ScholarGoogle Scholar
  22. Kahveci, T. and Singh, A. K. 2004. Optimizing similarity search for arbitrary length time series queries. IEEE Trans. Knowl. Data Eng. 16, 4, 418--433. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Keogh, E. and Ratanamahatana, C. A. 2005. Exact indexing of dynamic time warping. Knowl. Inform. Syst. 7, 3, 358--386. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Keogh, E. J. and Kasetty, S. 2003. On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining Knowl. Discov. 7, 4, 349--371. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Keogh, E. J., Palpanas, T., Zordan, V. B., Gunopulos, D., and Cardle, M. 2004. Indexing large human-motion databases. In Proceedings of VLDB. 780--791. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Keogh, E. J., Wei, L., Xi, X., Vlachos, M., Lee, S. H., and Protopapas, P. 2009. Supporting exact indexing of arbitrarily rotated shapes and periodic time series under Euclidean and warping distance measures. VLDB J. 18, 3, 611--630. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kim, S., Park, S., and Chu, W. 2001. An index-based approach for similarity search supporting time warping in large sequence databases. In Proceedings of ICDE. 607--614. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Laerhoven, K., Berlin, E., and Schiele, B. 2009. Enabling efficient time series analysis for wearable activity data. In Proceedings of ICMLA. 392--397. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lim, S. H., Park, H., and Kim, S. W. 2007. Using multiple indexes for efficient subsequence matching in time-series databases. Inf. Sci. 177, 24, 5691--5706. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ling, R. F. 1974. Comparison of several algorithms for computing sample means and variances. J. Amer. Statist. Assoc. 69, 348, 859--866.Google ScholarGoogle ScholarCross RefCross Ref
  31. Locke, D. P., Hillier, L. W., Warren, W. C., et al. 2011. Comparative and demographic analysis of orangutan genomes. Nature 469, 529--533.Google ScholarGoogle ScholarCross RefCross Ref
  32. Masek, W. J. and Paterson, M. S. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 1, 18--31.Google ScholarGoogle ScholarCross RefCross Ref
  33. McNames, J. 2000. Rotated partial distance search for faster vector quantization encoding. IEEE Signal Proc. Lett. 7, 9, 244--246.Google ScholarGoogle ScholarCross RefCross Ref
  34. Mueen, A. and Keogh, E. J. 2010. Online discovery and maintenance of time series motifs. In Proceedings of KDD. 1089--1098. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Mueen, A., Keogh, E. J., Zhu, Q., Cash, S., Westover, M. B., and Shamlo, N. 2011. A disk-aware algorithm for time series motif discovery. Data Min. Knowl. Discov. 22, 1--2, 73--105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Muller, M. 2009. Analysis and retrieval techniques for motion and music data. EUROGRAPHICS tutorial.Google ScholarGoogle Scholar
  37. Papapetrou, P., Athitsos, V., Potamias, M., Kollios, G., and Gunopulos, D. 2011. Embedding-based subsequence matching in time-series databases. ACM Trans. Datab. Syst. 36, 3, 174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Pressly, W. 2008. TSPad: A Tablet-PC based application for annotation and collaboration on time series data. In Proceedings of ACM Southeast Regional Conference. 527--552. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Raghavendra, B., Bera, D., Bopardikar, A., and Narayanan, R. 2011. Cardiac arrhythmia detection using dynamic time warping of ECG beats in e-healthcare systems. In Proceedings of WOWMOM. 1--6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Rebbapragada, U., Protopapas, P., Brodley, C., and Alcock, C. 2009. Finding anomalous periodic time series. Mach. Learn. 74, 3, 281--313. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Sakurai, Y., Yoshikawa, M., and Faloutsos, C. 2005. FTW: Fast similarity search under the time warping distance. In Proceedings of PODS. 326--337. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Sakurai, Y., Faloutsos, C., and Yamamuro, M. 2007. Stream monitoring under the time warping distance. In Proceedings of ICDE. 1046--1055.Google ScholarGoogle Scholar
  43. Shieh, J. and Keogh, E. J. 2008. iSAX: Indexing and mining terabyte sized time series. In Proceedings of KDD. 623--631. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Srikanthan, S., Kumar, A., and Gupta, R. 2011. Implementing the dynamic time warping algorithm in multithreaded environments for real time and unsupervised pattern discovery. In Proceedings of IEEE ICCCT. 394--398.Google ScholarGoogle Scholar
  45. Stiefmeier, T., Roggen, D., and Tröster, G. 2007. Gestures are strings: Efficient online gesture spotting and classification using string matching. In Proceedings of the ICST 2nd International Conference on Body Area Networks. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., and Keogh, E. J. 2003. Indexing multi-dimensional time-series with support for multiple distance measures. In Proceedings of KDD. 216--225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Whitney, C. R. 1997. Jeanne Calment, World’s elder, dies at 122. New York Times (8/5/97).Google ScholarGoogle Scholar
  48. Wobbrock, J. O., Wilson, A. D., and Li, Y. 2007. Gestures without libraries, toolkits or training: A $1 recognizer for user interface prototypes. In Proceedings of ACM UIST. 159--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Ye, L. and Keogh, E. J. 2009. Time series shapelets: A new primitive for data mining. In Proceedings of KDD. 947--956. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Yi, B., Jagadish, H., and Faloutsos, C. 1998. Efficient retrieval of similar time sequences under time warping. In Proceedings of ICDE. 201--208. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 7, Issue 3
      Special Issue on ACM SIGKDD 2012
      September 2013
      156 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2513092
      Issue’s Table of Contents

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2013
      • Accepted: 1 February 2013
      • Revised: 1 December 2012
      • Received: 1 August 2012
      Published in tkdd Volume 7, Issue 3

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader