ABSTRACT
The detection of repeated subsequences, time series motifs, is a problem which has been shown to have great utility for several higher-level data mining algorithms, including classification, clustering, segmentation, forecasting, and rule discovery. In recent years there has been significant research effort spent on efficiently discovering these motifs in static offline databases. However, for many domains, the inherent streaming nature of time series demands online discovery and maintenance of time series motifs. In this paper, we develop the first online motif discovery algorithm which monitors and maintains motifs exactly in real time over the most recent history of a stream. Our algorithm has a worst-case update time which is linear to the window size and is extendible to maintain more complex pattern structures. In contrast, the current offline algorithms either need significant update time or require very costly pre-processing steps which online algorithms simply cannot afford.
Our core ideas allow useful extensions of our algorithm to deal with arbitrary data rates and discovering multidimensional motifs. We demonstrate the utility of our algorithms with a variety of case studies in the domains of robotics, acoustic monitoring and online compression.
Supplemental Material
- Agrawal, R., Faloutsos, C. and Swami, A.N. Efficient Similarity Search in Sequence Databases. FODO 1993: 69--84. Google ScholarDigital Library
- Beaudoin, P., Panne, M., Poulin, P. and Coros, S. Motion-Motif Graphs. ACM/EG Symposium on Computer Animation 2008. Google ScholarDigital Library
- Bespamyatnikh, S. N. An Optimal Algorithm for Closest Pair Maintenance. ACM SCG '95, 152--161. Google ScholarDigital Library
- Bulut, A. and Singh, A.: SWAT: Hierarchical Stream Summarization in Large Networks. In Proceedings of the ICDE (2003).Google ScholarCross Ref
- Cardinal, J. and Eppstein, D. Lazy Algorithms for Dynamic Closest Pair with Arbitrary Distance Measures. ALENEX/ANALC 2004.Google Scholar
- Chiu, B., Keogh, E. and Lonardi, S. Probabilistic Discovery of Time Series Motifs. ACM SIGKDD 2003. pp 493--498. Google ScholarDigital Library
- Cummins, M. and Newman, P. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. The International Journal of Robotics Research, 27(6), 647--665, 2008. Google ScholarDigital Library
- Datar, M., Gionis, A., Indyk, P., and Motwani, R. Maintaining Stream Statistics over Sliding Windows. SIAM J. Comput. 31, 6 (Jun. 2002), 1794--1813. Google ScholarDigital Library
- Dawson, D. K. and Efford, M. G. Bird Population Density Estimated from Acoustic Signals. Journal of Applied Ecology. Volume 46 Issue 6, Pages 1201--1209.Google Scholar
- Ding, H., Trajcevski, G., Scheuermann, P., Wang, X. and Keogh, E. Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures. VLDB 2008. Google ScholarDigital Library
- Dohnal, V., Gennaro C. and Zezula, P. Similarity Join in Metric Spaces Using eD-Index. Database and Expert Systems Applications, Volume 2736, pp. 484--493, 2003. Google ScholarDigital Library
- Eppstein, D. Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs. ACM Journal of Experimental Algorithmics 5:1 (2000). Google ScholarDigital Library
- Fuchs, E., Gruber, T., Nitschke, J. and Sick, B. On-line Motif Detection in Time Series with SwiftMotif. In: Pattern Recognition 42(11):3015--3031, 2009. Google ScholarDigital Library
- Keogh, E., Chu, S., Hart, D. and Pazzani, M. An Online Algorithm for Segmenting Time Series. ICDM, pp. 289--296, 2001. Google ScholarDigital Library
- Lazaridis, I. and Mehrotra, S. Capturing Sensor-Generated Time Series with Quality Guarantees. ICDE 2003.Google Scholar
- Lin, J., Keogh, E., Lonardi, S. and Patel, P. Finding Motifs in Time Series, Workshop on Temporal Data Mining (KDD'02), 2002.Google Scholar
- Mueen, A., Keogh, E., Zhu, Q., Cash, S. and Westover, B. Exact Discovery of Time Series Motif. SDM 2009.Google Scholar
- N. Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M. and Stonebraker. M. Load Shedding in a Data Stream Manager. VLDB 2003, pp. 309--320. Google ScholarDigital Library
- Nanopoulos A., Theodoridis Y. and Manolopoulos, Y. C2P: Clustering Based on Closest Pairs. VLDB, pp. 331--340, 2001. Google ScholarDigital Library
- Odlyzko, A.M. and Rains, E.M. On Longest Increasing Subsequences in Random Permutation, Analysis, Geometry, Number Theory: the Mathematics of Leon Ehrenpreis.439--451, Contemp. Math., 251, Amer. Math. Soc., Providence, RI, 2000.Google Scholar
- Ogras, Y. and Ferhatosmanoglu, H. Online Summarization of Dynamic Time Series Data. The VLDB Journal, 15(1):84--98, 2006. Google ScholarDigital Library
- Palpanas, T., Vlachos, M., Keogh, E., Gunopulos, D. and Truppel, W. Online Amnesic Approximation of Streaming Time Series. In ICDE 2004. Google ScholarDigital Library
- Patel, P., Keogh, E., Lin, J. and Lonardi, S. Mining Motifs in Massive Time Series Databases. ICDM 2002. Google ScholarDigital Library
- Patnaik, D., Marwah, M., Sharma, R.K. and Ramakrishnan, N. Sustainable Operation and Management of Data Center Chillers using Temporal Data Mining. KDD 2009: 1305--1314. Google ScholarDigital Library
- Penteriani, V. Variation in the Function of Eagle Owl Vocal Behaviour: Territorial Defence and Intra-Pair Communication? Ethol. Ecol. Evol. 14: 275--281.Google ScholarCross Ref
- Trifa, V.M., Girod, L., Collier, T., Blumstein, D.T. and Taylor, C.E. Automated Wildlife Monitoring Using Self-Configuring Sensor Networks Deployed in Natural Habitats. AROB 2007.Google Scholar
- Vahdatpour, A., Amini, N. and Sarrafzadeh, M. Towards Unsupervised Activity Discovery using Multi Dimensional Motif Detection in Time Series. 21st International Joint Conference on Artificial Intelligence (IJCAI) 2009, Pasadena, California. Google ScholarDigital Library
- Weber R., Schek, H-J. and Blott, S. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB, pp. 194--205, 1998. Google ScholarDigital Library
- Wurst, M., Morik, K. and Mierswa, I. Localized Alternative Cluster Ensembles for Collaborative Structuring. ECML 2006. pp. 485--496. Google ScholarDigital Library
- Yankov, D., Keogh, E., Medina, J., Chiu, B. and Zordan V. Detecting Motifs under Uniform Scaling. SIGKDD 2007. Google ScholarDigital Library
- Supporting webpage containing Data, Code, Videos, Excel sheet and Presentation slides. Link: http://www.cs.ucr.edu/~mueen/OnlineMotif/index.htmlGoogle Scholar
Index Terms
- Online discovery and maintenance of time series motifs
Recommendations
Probabilistic discovery of time series motifs
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data miningSeveral important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the ...
A variable-length motifs discovery method in time series using hybrid approach
iiWAS '17: Proceedings of the 19th International Conference on Information Integration and Web-based Applications & ServicesDiscovery of repeated patterns, known as motifs, from long time series is essential for providing hidden knowledge to real-world applications like medical, financial and weather analysis. Motifs can be discovered on raw time series directly or on their ...
Latent Time-Series Motifs
Motifs are the most repetitive/frequent patterns of a time-series. The discovery of motifs is crucial for practitioners in order to understand and interpret the phenomena occurring in sequential data. Currently, motifs are searched among series sub-...
Comments