skip to main content
research-article

Managing massive time series streams with multi-scale compressed trickles

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

We present Cypress, a novel framework to archive and query massive time series streams such as those generated by sensor networks, data centers, and scientific computing. Cypress applies multi-scale analysis to decompose time series and to obtain sparse representations in various domains (e.g. frequency domain and time domain). Relying on the sparsity, the time series streams can be archived with reduced storage space. We then show that many statistical queries such as trend, histogram and correlations can be answered directly from compressed data rather than from reconstructed raw data. Our evaluation with server utilization data collected from real data centers shows significant benefit of our framework.

References

  1. D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), 2005.Google ScholarGoogle Scholar
  2. D. Achlioptas. Database-friendly random projections. In ACM PODS, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen. Performance debugging for distributed systems of black boxes. In ACM SOSP, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. E. Bingham and H. Mannila. Random projection in dimensionality reduction: applications to image and text data. In in Knowledge Discovery and Data Mining, pages 245--250. ACM Press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao. Energy-aware server provisioning and load dispatching for connection-intensive internet services. In USENIX NSDI, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Cole, D. Shasha, and X. Zhao. Fast window correlations over uncooperative time series. In KDD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. Cormode, P. Indyk, N. Koudas, and S. Muthukrishan. Fast mining of massive tabular data via approximate distance computations. In ICDE, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  8. D. Donoho. Compressed sensing. IEEE Trans. on Information Theory, 52(4):1289--1306, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. X. Fan, W. dietrich Weber, and L. A. Barroso. Power provisioning for a warehouse-sized computer. In Intl. Symp. on Computer Architecture, pages 13--23, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. X. Z. Fern and C. E. Brodley. Random projection for high dimensional data clustering: A cluster ensemble approach. In Intl. Conf. on Machine Learning, 2003.Google ScholarGoogle Scholar
  11. S. Gandhi, S. Nath, S. Suri, and J. Liu. GAMPS: Compressing Multi Sensor Data by Grouping and Amplitude Scaling. In ACM SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Garofalakis and P. B. Gibbons. Wavelet Synopses with Error Guarantees. In ACM SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. One-pass wavelet decompositions of data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3):541--554, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Haupt and R. Nowak. Signal reconstruction from noisy random projections. IEEE Trans. on Information Theory, 52(9):4036--4048, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Indyk, N. Koudas, and S. Muthukrishan. Identifying representative trends in massive time series data sets using sketches. In VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. Indyk and R. Motwani. Approximate nearest neighbots: Towards removing the curse of dimensionality. In ACM STOC, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. W. Johnson and J. Lindenstrauss. Extensions of lipschitz mapping into hilber space. Contemporary Mathematics, 26:189--206, 1984.Google ScholarGoogle ScholarCross RefCross Ref
  18. Y. Lu, A. Montanari, B. Prabhakar, S. Dharmapurikar, and A. Kabbani. Counter braids: a novel counter architecture for per-flow measurement. In ACM SIGMETRICS, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. K. Menon, G. V. A. Pham, S. Chawla, and A. Viglas. An incremental data-stream sketch using sparse random projections. In SIAM International Conference on Data Mining, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  20. S. Papdimitriou, J. Sun, and C. Faloutsos. Streaming pattern discovery in multiple time-series. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. Sakurai, S. Papdimitriou, and C. Faloutsos. Braid: Stream mining through group lag correslations. In SIGMOD, June 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Z. D. Shasha. StatStream: Statistical monitoring of thousands of data streams in real time. In In VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Shieh and E. Keogh. iSAX: indexing and mining terabyte sized time series. In ACM SIGKDD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. N. Thaper, S. Guha, P. Indyk, and N. Koudas. Dynamic multidimensional histograms. In ACM SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. Thiagarajan and S. Madden. Querying continuous functions in a database system. In ACM SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. Keogh. Indexing multidimensional time-series. The VLDB Journal, 15(1):1--20, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Managing massive time series streams with multi-scale compressed trickles

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader