research-article

Searching and mining trillions of time series subsequences under dynamic time warping

Authors:
Thanawin Rakthanmanon

University of California - Riverside, Riverside, CA, USA

University of California - Riverside, Riverside, CA, USA
View Profile

,
Bilson Campana

University of California - Riverside, Riverside, CA, USA

University of California - Riverside, Riverside, CA, USA
View Profile

,
Abdullah Mueen

University of California - Riverside, Riverside, CA, USA

University of California - Riverside, Riverside, CA, USA
View Profile

,
Gustavo Batista

University of São Paulo, São Paulo, Brazil

University of São Paulo, São Paulo, Brazil
View Profile

,
Brandon Westover

Brigham and Women's Hospital, Boston, MA, USA

Brigham and Women's Hospital, Boston, MA, USA
View Profile

,
Qiang Zhu

University of California - Riverside, Riverside, CA, USA

University of California - Riverside, Riverside, CA, USA
View Profile

,
Jesin Zakaria

University of California - Riverside, Riverside, CA, USA

University of California - Riverside, Riverside, CA, USA
View Profile

,
Eamonn Keogh

University of California - Riverside, Riverside, CA, USA

University of California - Riverside, Riverside, CA, USA
View Profile

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2012Pages 262–270https://doi.org/10.1145/2339530.2339576

Published:12 August 2012Publication History

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 262–270

ABSTRACT

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact; in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problem such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.

Supplemental Material

best_paper_4.mp4

mp4

437.4 MB

Download

References

N. Adams, D. Marquez, and G. Wakefield. 2005. Iterative deepening for melody alignment and retrieval. ISMIR, 199--206.Google Scholar
I. Assent, R. Krieger, F. Afschari, and T. Seidl. 2008. The TS-Tree: efficient time series search and retrieval. EDBT, 252--63. Google ScholarDigital Library
J. Alon, V. Athitsos, Q. Yuan, and S. Sclaroff. 2009. A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE PAMI 31, 9, 1685--1699. Google ScholarDigital Library
T. Bragge, M.P. Tarvainen, and P. A. Karjalainen. 2004. High-Resolution QRS Detection Algorithm for Sparsely Sampled ECG Recordings. Univ. of Kuopio, Dept. of Applied Physics Report.Google Scholar
N. Chadwick, D. McMeekin, and T. Tan. 2011. Classifying eye and head movement artifacts in EEG Signals. DEST.Google Scholar
H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. J. Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1, 2, 1542--52. Google ScholarDigital Library
B. Dupasquier and S. Burschka. 2011. Data mining for hackers -- encrypted traffic mining. The 28th Chaos Comm' Congress.Google Scholar
Y. Chen, G. Chen, K. Chen, and B. C. Ooi. 2009. Efficient processing of warping time series join of motion capture data. ICDE, 1048--1059. Google ScholarDigital Library
Faceted DBLP. 2011. http://dblp.l3s.deGoogle Scholar
A. Fornés, J. Lladós, and G. Sanchez. 2007. Old handwritten musical symbol classification by a dynamic time warping based method. Graphics Recognition 5046, 51--60.Google Scholar
A. Fu, E. Keogh, L. Lau, C. Ratanamahatana, and R. Wong. 2008. Scaling and time warping in time series querying. VLDB J. 17, 4, 899--921. Google ScholarDigital Library
N. Gillian, R. Knapp, and S. O'Modhrain. 2011. Recognition of multivariate temporal musical gestures using n-dimensional dynamic time warping. Proc of the 11th Int'l conference on New Interfaces for Musical Expression.Google Scholar
D. Goldberg. 1991. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23, 1. Google ScholarDigital Library
G. Guitel. 1975. Histoire comparée des numérations écrites. Chapter: "Les grands nombres en numération parlée," Paris: Flammarion, 566--574.Google Scholar
R. Huber-Mörk, S. Zambanini, M. Zaharieva, and M. Kampel. 2011. Identification of ancient coins based on fusion of shape and local features. Mach. Vis. Appl. 22, 6, 983--994. Google ScholarDigital Library
M. Hsiao, K. West, and G. Vedatesh. 2005. Online context recognition in multisensor system using dynamic time warping. ISSNIP, 283--288.Google Scholar
H. Jegou, M. Douze, C. Schmid, and P. Perez. 2010. Aggregating local descriptors into a compact image representation. IEEE CVPR, San Francisco, CA, USA.Google Scholar
T. Kahveci and A. K. Singh. 2004. Optimizing similarity search for arbitrary length time series queries. IEEE Trans. Knowl. Data Eng. 16, 4, 418--433. Google ScholarDigital Library
E. Keogh and S. Kasetty. 2003. On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining and Knowledge. Discovery 7, 4, 349--371. Google ScholarDigital Library
E. Keogh, L. Wei, X. Xi, M. Vlachos, S.H. Lee, and P. Protopapas. 2009. Supporting exact indexing of arbitrarily rotated shapes and periodic time series under Euclidean and warping distance measures. VLDB J. 18, 3, 611--630. Google ScholarDigital Library
S. Kim, S Park, and W. Chu. 2001. An index-based approach for similarity search supporting time warping in large sequence databases. ICDE, 607--61. Google ScholarDigital Library
K. Laerhoven, E. Berlin, and B. Schiele. 2009. Enabling efficient time series analysis for wearable activity data. ICMLA, 392--397. Google ScholarDigital Library
S. H. Lim, H. Park, and S. W. Kim. 2007. Using multiple indexes for efficient subsequence matching in time-series databases. Inf. Sci. 177, 24, 5691--5706. Google ScholarDigital Library
D. P. Locke, L. W. Hillier, W. C. Warren, et al. 2011. Comparative and demographic analysis of orangutan genomes. Nature 469, 529--533.Google ScholarCross Ref
A. Mueen and E. Keogh. 2010. Online discovery and maintenance of time series motifs. KDD, 1089--1098. Google ScholarDigital Library
A. Mueen, E. Keogh, Q. Zhu, S. Cash, M. B. Westover, and N. Shamlo. 2011. A disk-aware algorithm for time series motif discovery. Data Min. Knowl. Discov. 22, 1--2, 73--105. Google ScholarDigital Library
M. Muller. 2009. Analysis and retrieval techniques for motion and music data. EUROGRAPHICS tutorial.Google Scholar
P. Papapetrou, V. Athitsos, M. Potamias, G. Kollios, and D. Gunopulos. 2011. Embedding-based subsequence matching in time-series databases. ACM TODS 36, 3, 17*. Google ScholarDigital Library
W. Pressly. 2008. TSPad: a Tablet-PC based application for annotation and collaboration on time series data. ACM Southeast Regional Conference, 527--52. Google ScholarDigital Library
B. Raghavendra, D. Bera, A. Bopardikar, and R. Narayanan. 2011. Cardiac arrhythmia detection using dynamic time warping of ECG beats in e-healthcare systems. WOWMOM, 1--6. Google ScholarDigital Library
U. Rebbapragada, P. Protopapas, C. Brodley, and C. Alcock. 2009. Finding anomalous periodic time series. Machine Learning 74, 3, 281--313. Google ScholarDigital Library
Y. Sakurai, C. Faloutsos, and M. Yamamuro. 2007. Stream monitoring under the time warping distance. ICDE, 1046--55.Google Scholar
Y. Sakurai, M. Yoshikawa, and C. Faloutsos. 2005. FTW: fast similarity search under the time warping distance. PODS'05. Google ScholarDigital Library
S. Srikanthan, A.Kumar, and R. Gupta. 2011. Implementing the dynamic time warping algorithm in multithreaded environments for real time and unsupervised pattern discovery. IEEE ICCCT, 394--398.Google Scholar
J. Shieh and E. J. Keogh. 2008. iSAX: indexing and mining terabyte sized time series. KDD, 623--631. Google ScholarDigital Library
T. Stiefmeier, D. Roggen, and G. Tröster. 2007. Gestures are strings: efficient online gesture spotting and classification using string matching. Proceedings of the ICST 2nd international conference on Body area networks. Google ScholarDigital Library
C. R. Whitney. 1997. Jeanne Calment, World's elder, dies at 122. New York Times (August 5th, 1997).Google Scholar
J. O. Wobbrock, A. D. Wilson, and Y. Li. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. ACM UIST, 159--168. Google ScholarDigital Library
L. Ye and E. Keogh. 2009. Time series shapelets: a new primitive for data mining. KDD, 947--956. Google ScholarDigital Library
B. Yi, H. Jagadish, and C. Faloutsos. 1998. Efficient retrieval of similar time sequences under time warping. ICDE, 201--208. Google ScholarDigital Library
Y. Zhang and J. Glass. 2011. An inner-product lower-bound estimate for dynamic time warping. ICASSP, 5660--5663.Google Scholar
A. Zinke and D. Mayer. 2006. Iterative Multi Scale Dynamic Time Warping. Universität Bonn, Tech Report # CG-2006--1.Google Scholar
Project Website: www.cs.ucr.edu/~eamonn/UCRsuite.htmlGoogle Scholar

Index Terms

Searching and mining trillions of time series subsequences under dynamic time warping
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping
Special Issue on ACM SIGKDD 2012

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms, including classification, clustering, motif ...
Read More
Data mining a trillion time series subsequences under dynamic time warping
IJCAI '13: Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets ...
Read More
Speeding up similarity search under dynamic time warping by pruning unpromising alignments

Similarity search is the core procedure for several time series mining tasks. While different distance measures can be used for this purpose, there is clear evidence that the Dynamic Time Warping (DTW) is the most suitable distance function for a wide ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN:9781450314626
DOI:10.1145/2339530
General Chair:
Qiang Yang
Hong Kong University of Science and Technology
,
Program Chairs:
Deepak Agarwal
LinkedIn
,
Jian Pei
Simon Fraser University
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 August 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
lower bounds
similarity search
time series
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 659
  Total Citations
  View Citations
- 5,292
  Total Downloads
- Downloads (Last 12 months)492
- Downloads (Last 6 weeks)89
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Searching and mining trillions of time series subsequences under dynamic time warping

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping

Data mining a trillion time series subsequences under dynamic time warping

Speeding up similarity search under dynamic time warping by pruning unpromising alignments