skip to main content
research-article

Parsimonious linear fingerprinting for time series

Published:01 September 2010Publication History
Skip Abstract Section

Abstract

We study the problem of mining and summarizing multiple time series effectively and efficiently. We propose PLiF, a novel method to discover essential characteristics ("fingerprints"), by exploiting the joint dynamics in numerical sequences. Our fingerprinting method has the following benefits: (a) it leads to interpretable features; (b) it is versatile: PLiF enables numerous mining tasks, including clustering, compression, visualization, forecasting, and segmentation, matching top competitors in each task; and (c) it is fast and scalable, with linear complexity on the length of the sequences.

We did experiments on both synthetic and real datasets, including human motion capture data (17MB of human motions), sensor data (166 sensors), and network router traffic data (18 million raw updates over 2 years). Despite its generality, PLiF outperforms the top clustering methods on clustering; the top compression methods on compression (3 times better reconstruction error, for the same compression ratio); it gives meaningful visualization and at the same time, enjoys a linear scale-up.

References

  1. G. E. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, 3rd edition, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, pages 588--599, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In SIGMOD, pages 419--429, Minneapolis, MN, May 25--27 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. W.-C. Fu, E. J. Keogh, L. Y. H. Lau, and C. A. Ratanamahatana. Scaling and time warping in time series querying. In VLDB, pages 649--660, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, San Diego, CA, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Garofalakis, J. Gehrke, and R. Rastogi. Data Stream Management: Processing High-Speed Data Streams. Springer, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Z. Ghahramani and G. E. Hinton. Parameter estimation for linear dynamical systems. Technical Report CRG-TR-96-2, February 1996.Google ScholarGoogle Scholar
  8. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In VLDB, pages 79--88, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. H. Golub and C. F. Van Loan. Matrix computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Gunopulos and G. Das. Time series similarity measures and time series indexing. In SIGMOD Conference, Santa Barbara, CA, 2001. Tutorial. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, corrected edition, July 2003.Google ScholarGoogle Scholar
  12. M. Jahangiri, D. Sacharidis, and C. Shahabi. Shift-split: I/o efficient maintenance of wavelet-transformed multidimensional data. In SIGMOD Conference, pages 275--286, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Jain, E. Y. Chang, and Y.-F. Wang. Adaptive stream resource management using kalman filters. In SIGMOD, pages 11--22, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. S. Jensen and S. Pakalnis. Trax - real-world tracking of moving objects. In VLDB, pages 1362--1365, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. I. Jolliffe. Principal Component Analysis. Springer Verlag, 1986.Google ScholarGoogle Scholar
  16. R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME C Journal of Basic Engineering, (82 (Series D)):35--45, 1960.Google ScholarGoogle Scholar
  17. K. Kalpakis, D. Gada, and V. Puttagunta. Distance measures for effective clustering of arima time-series. In ICDM, pages 273--280, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. E. J. Keogh. Exact indexing of dynamic time warping. In VLDB, pages 406--417, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. J. Keogh, T. Palpanas, V. B. Zordan, D. Gunopulos, and M. Cardle. Indexing large human-motion databases. In VLDB, pages 780--791, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. Kollios, D. Gunopulos, and V. J. Tsotras. On indexing mobile objects. PODS, pages 261--272, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Li, J. McCann, N. Pollard, and C. Faloutsos. Dynammo: Mining and summarization of coevolving sequences with missing values. In KDD, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. K. Mouratidis, M. L. Yiu, D. Papadias, and N. Mamoulis. Continuous nearest neighbor monitoring in road networks. In VLDB, pages 43--54, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ü. Y. Ogras and H. Ferhatosmanoglu. Online summarization of dynamic time series data. VLDB J., 15(1):84--98, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: A probabilistic analysis. In PODS, pages 159--168, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Papadimitriou, J. Sun, and C. Faloutsos. Streaming pattern discovery in multiple time-series. VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. Rafiei and A. O. Mendelzon. Similarity-based queries for time series data. In SIGMOD Conference, pages 13--25, Tucson, AZ, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. G. Reeves, J. Liu, S. Nath, and F. Zhao. Managing massive time series streams with multiscale compressed trickles. PVLDB, 2(1):97--108, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Safonova and J. K. Hodgins. Construction and optimal search of interpolated motion graphs. ACM Trans. Graph., 26(3):106, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. H. Shumway and D. S. Stoffer. An approach to time series smoothing and forecasting using the em algorithm. Journal of Time Series Analysis, 3:253--264, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  30. Y. Tao, C. Faloutsos, D. Papadias, and B. Liu. Prediction and indexing of moving objects with unknown motion patterns. In SIGMOD, pages 611--622, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61:611--622, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  32. Z. N. Zhang. The jordan canonical form of a real random matrix. In Numer. Math. J. Chinese Univ., 23(2001).Google ScholarGoogle Scholar

Index Terms

  1. Parsimonious linear fingerprinting for time series
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
      September 2010
      1658 pages

      Publisher

      VLDB Endowment

      Publication History

      • Published: 1 September 2010
      Published in pvldb Volume 3, Issue 1-2

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader