research-article

Managing massive time series streams with multi-scale compressed trickles

Authors:
Galen Reeves

University of California, Berkeley, CA

University of California, Berkeley, CA
View Profile

,
Jie Liu

Microsoft Research, Redmond, WA

Microsoft Research, Redmond, WA
View Profile

,
Suman Nath

Microsoft Research, Redmond, WA

Microsoft Research, Redmond, WA
View Profile

,
Feng Zhao

Microsoft Research, Redmond, WA

Microsoft Research, Redmond, WA
View Profile

Proceedings of the VLDB Endowment Volume 2 Issue 1pp 97–108https://doi.org/10.14778/1687627.1687639

Published:01 August 2009Publication History

Proceedings of the VLDB Endowment

Abstract

We present Cypress, a novel framework to archive and query massive time series streams such as those generated by sensor networks, data centers, and scientific computing. Cypress applies multi-scale analysis to decompose time series and to obtain sparse representations in various domains (e.g. frequency domain and time domain). Relying on the sparsity, the time series streams can be archived with reduced storage space. We then show that many statistical queries such as trend, histogram and correlations can be answered directly from compressed data rather than from reconstructed raw data. Our evaluation with server utilization data collected from real data centers shows significant benefit of our framework.

References

D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), 2005.Google Scholar
D. Achlioptas. Database-friendly random projections. In ACM PODS, 2001. Google ScholarDigital Library
M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen. Performance debugging for distributed systems of black boxes. In ACM SOSP, 2003. Google ScholarDigital Library
E. Bingham and H. Mannila. Random projection in dimensionality reduction: applications to image and text data. In in Knowledge Discovery and Data Mining, pages 245--250. ACM Press, 2001. Google ScholarDigital Library
G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao. Energy-aware server provisioning and load dispatching for connection-intensive internet services. In USENIX NSDI, 2008. Google ScholarDigital Library
R. Cole, D. Shasha, and X. Zhao. Fast window correlations over uncooperative time series. In KDD, 2005. Google ScholarDigital Library
G. Cormode, P. Indyk, N. Koudas, and S. Muthukrishan. Fast mining of massive tabular data via approximate distance computations. In ICDE, 2002.Google ScholarCross Ref
D. Donoho. Compressed sensing. IEEE Trans. on Information Theory, 52(4):1289--1306, 2006. Google ScholarDigital Library
X. Fan, W. dietrich Weber, and L. A. Barroso. Power provisioning for a warehouse-sized computer. In Intl. Symp. on Computer Architecture, pages 13--23, 2007. Google ScholarDigital Library
X. Z. Fern and C. E. Brodley. Random projection for high dimensional data clustering: A cluster ensemble approach. In Intl. Conf. on Machine Learning, 2003.Google Scholar
S. Gandhi, S. Nath, S. Suri, and J. Liu. GAMPS: Compressing Multi Sensor Data by Grouping and Amplitude Scaling. In ACM SIGMOD, 2009. Google ScholarDigital Library
M. Garofalakis and P. B. Gibbons. Wavelet Synopses with Error Guarantees. In ACM SIGMOD, 2002. Google ScholarDigital Library
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. One-pass wavelet decompositions of data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3):541--554, 2003. Google ScholarDigital Library
J. Haupt and R. Nowak. Signal reconstruction from noisy random projections. IEEE Trans. on Information Theory, 52(9):4036--4048, 2006. Google ScholarDigital Library
P. Indyk, N. Koudas, and S. Muthukrishan. Identifying representative trends in massive time series data sets using sketches. In VLDB, 2000. Google ScholarDigital Library
P. Indyk and R. Motwani. Approximate nearest neighbots: Towards removing the curse of dimensionality. In ACM STOC, 1998. Google ScholarDigital Library
W. Johnson and J. Lindenstrauss. Extensions of lipschitz mapping into hilber space. Contemporary Mathematics, 26:189--206, 1984.Google ScholarCross Ref
Y. Lu, A. Montanari, B. Prabhakar, S. Dharmapurikar, and A. Kabbani. Counter braids: a novel counter architecture for per-flow measurement. In ACM SIGMETRICS, 2008. Google ScholarDigital Library
A. K. Menon, G. V. A. Pham, S. Chawla, and A. Viglas. An incremental data-stream sketch using sparse random projections. In SIAM International Conference on Data Mining, 2007.Google ScholarCross Ref
S. Papdimitriou, J. Sun, and C. Faloutsos. Streaming pattern discovery in multiple time-series. In VLDB, 2005. Google ScholarDigital Library
Y. Sakurai, S. Papdimitriou, and C. Faloutsos. Braid: Stream mining through group lag correslations. In SIGMOD, June 2005. Google ScholarDigital Library
Y. Z. D. Shasha. StatStream: Statistical monitoring of thousands of data streams in real time. In In VLDB, 2002. Google ScholarDigital Library
J. Shieh and E. Keogh. iSAX: indexing and mining terabyte sized time series. In ACM SIGKDD, 2008. Google ScholarDigital Library
N. Thaper, S. Guha, P. Indyk, and N. Koudas. Dynamic multidimensional histograms. In ACM SIGMOD, 2002. Google ScholarDigital Library
A. Thiagarajan and S. Madden. Querying continuous functions in a database system. In ACM SIGMOD, 2008. Google ScholarDigital Library
M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. Keogh. Indexing multidimensional time-series. The VLDB Journal, 15(1):1--20, 2006. Google ScholarDigital Library

Index Terms

Managing massive time series streams with multi-scale compressed trickles

Recommendations

Analyzing massive data streams: past, present, and future
DMKD '03: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

Continuous data streams arise naturally, for example, in the installations of large telecom and Internet service providers where detailed usage information (Call-Detail-Records, SNMP-/RMON packet-flow data, etc.) from different parts of the underlying ...
Read More
A Framework for Clustering Massive-Domain Data Streams
ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering

In this paper, we will examine the problem of clustering massive domain data streams. Massive-domain data streams are those in which the number of possible domain values for each attribute are very large and cannot be easily tracked for clustering ...
Read More
Query-aware partitioning for monitoring massive network data streams
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Data Stream Management Systems (DSMS) are gaining acceptance for applications that need to process very large volumes of data in real time. The load generated by such applications frequently exceeds by far the computation capabilities of a single ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 2, Issue 1
August 2009
1293 pages
ISSN:2150-8097
Editors:
Serge Abiteboul,
Tova Milo,
Jignesh Patel,
Philippe Rigaux
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2009
Published in pvldb Volume 2, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 20
  Total Citations
  View Citations
- 386
  Total Downloads
- Downloads (Last 12 months)13
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Managing massive time series streams with multi-scale compressed trickles

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Analyzing massive data streams: past, present, and future

A Framework for Clustering Massive-Domain Data Streams

Query-aware partitioning for monitoring massive network data streams

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Managing massive time series streams with multi-scale compressed trickles

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Analyzing massive data streams: past, present, and future

A Framework for Clustering Massive-Domain Data Streams

Query-aware partitioning for monitoring massive network data streams

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media