Abstract
We study the problem of mining and summarizing multiple time series effectively and efficiently. We propose PLiF, a novel method to discover essential characteristics ("fingerprints"), by exploiting the joint dynamics in numerical sequences. Our fingerprinting method has the following benefits: (a) it leads to interpretable features; (b) it is versatile: PLiF enables numerous mining tasks, including clustering, compression, visualization, forecasting, and segmentation, matching top competitors in each task; and (c) it is fast and scalable, with linear complexity on the length of the sequences.
We did experiments on both synthetic and real datasets, including human motion capture data (17MB of human motions), sensor data (166 sensors), and network router traffic data (18 million raw updates over 2 years). Despite its generality, PLiF outperforms the top clustering methods on clustering; the top compression methods on compression (3 times better reconstruction error, for the same compression ratio); it gives meaningful visualization and at the same time, enjoys a linear scale-up.
- G. E. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, 3rd edition, 1994. Google ScholarDigital Library
- A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, pages 588--599, 2004. Google ScholarDigital Library
- C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In SIGMOD, pages 419--429, Minneapolis, MN, May 25--27 1994. Google ScholarDigital Library
- A. W.-C. Fu, E. J. Keogh, L. Y. H. Lau, and C. A. Ratanamahatana. Scaling and time warping in time series querying. In VLDB, pages 649--660, 2005. Google ScholarDigital Library
- K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, San Diego, CA, 1990. Google ScholarDigital Library
- M. Garofalakis, J. Gehrke, and R. Rastogi. Data Stream Management: Processing High-Speed Data Streams. Springer, 2009. Google ScholarDigital Library
- Z. Ghahramani and G. E. Hinton. Parameter estimation for linear dynamical systems. Technical Report CRG-TR-96-2, February 1996.Google Scholar
- A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In VLDB, pages 79--88, 2001. Google ScholarDigital Library
- G. H. Golub and C. F. Van Loan. Matrix computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996. Google ScholarDigital Library
- D. Gunopulos and G. Das. Time series similarity measures and time series indexing. In SIGMOD Conference, Santa Barbara, CA, 2001. Tutorial. Google ScholarDigital Library
- T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, corrected edition, July 2003.Google Scholar
- M. Jahangiri, D. Sacharidis, and C. Shahabi. Shift-split: I/o efficient maintenance of wavelet-transformed multidimensional data. In SIGMOD Conference, pages 275--286, 2005. Google ScholarDigital Library
- A. Jain, E. Y. Chang, and Y.-F. Wang. Adaptive stream resource management using kalman filters. In SIGMOD, pages 11--22, 2004. Google ScholarDigital Library
- C. S. Jensen and S. Pakalnis. Trax - real-world tracking of moving objects. In VLDB, pages 1362--1365, 2007. Google ScholarDigital Library
- I. Jolliffe. Principal Component Analysis. Springer Verlag, 1986.Google Scholar
- R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME C Journal of Basic Engineering, (82 (Series D)):35--45, 1960.Google Scholar
- K. Kalpakis, D. Gada, and V. Puttagunta. Distance measures for effective clustering of arima time-series. In ICDM, pages 273--280, 2001. Google ScholarDigital Library
- E. J. Keogh. Exact indexing of dynamic time warping. In VLDB, pages 406--417, 2002. Google ScholarDigital Library
- E. J. Keogh, T. Palpanas, V. B. Zordan, D. Gunopulos, and M. Cardle. Indexing large human-motion databases. In VLDB, pages 780--791, 2004. Google ScholarDigital Library
- G. Kollios, D. Gunopulos, and V. J. Tsotras. On indexing mobile objects. PODS, pages 261--272, 1999. Google ScholarDigital Library
- L. Li, J. McCann, N. Pollard, and C. Faloutsos. Dynammo: Mining and summarization of coevolving sequences with missing values. In KDD, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
- K. Mouratidis, M. L. Yiu, D. Papadias, and N. Mamoulis. Continuous nearest neighbor monitoring in road networks. In VLDB, pages 43--54, 2006. Google ScholarDigital Library
- Ü. Y. Ogras and H. Ferhatosmanoglu. Online summarization of dynamic time series data. VLDB J., 15(1):84--98, 2006. Google ScholarDigital Library
- C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: A probabilistic analysis. In PODS, pages 159--168, 1998. Google ScholarDigital Library
- S. Papadimitriou, J. Sun, and C. Faloutsos. Streaming pattern discovery in multiple time-series. VLDB, 2005. Google ScholarDigital Library
- D. Rafiei and A. O. Mendelzon. Similarity-based queries for time series data. In SIGMOD Conference, pages 13--25, Tucson, AZ, 1997. Google ScholarDigital Library
- G. Reeves, J. Liu, S. Nath, and F. Zhao. Managing massive time series streams with multiscale compressed trickles. PVLDB, 2(1):97--108, 2009. Google ScholarDigital Library
- A. Safonova and J. K. Hodgins. Construction and optimal search of interpolated motion graphs. ACM Trans. Graph., 26(3):106, 2007. Google ScholarDigital Library
- R. H. Shumway and D. S. Stoffer. An approach to time series smoothing and forecasting using the em algorithm. Journal of Time Series Analysis, 3:253--264, 1982.Google ScholarCross Ref
- Y. Tao, C. Faloutsos, D. Papadias, and B. Liu. Prediction and indexing of moving objects with unknown motion patterns. In SIGMOD, pages 611--622, 2004. Google ScholarDigital Library
- M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61:611--622, 1999.Google ScholarCross Ref
- Z. N. Zhang. The jordan canonical form of a real random matrix. In Numer. Math. J. Chinese Univ., 23(2001).Google Scholar
Index Terms
- Parsimonious linear fingerprinting for time series
Comments