Article

Data-streams and histograms

Authors:
Sudipto Guha

AT&T Research

AT&T Research
View Profile

,
Nick Koudas

AT&T Research

AT&T Research
View Profile

,
Kyuseok Shim

Computer Science Department and AITRC, KAIST

Computer Science Department and AITRC, KAIST
View Profile

STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computingJuly 2001Pages 471–475https://doi.org/10.1145/380752.380841

Published:06 July 2001Publication History

STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computing

Pages 471–475

ABSTRACT

Histograms have been used widely to capture data distribution, to represent the data by a small number of step functions. Dynamic programming algorithms which provide optimal construction of these histograms exist, albeit running in quadratic time and linear space. In this paper we provide linear time construction of 1 + ε approximation of optimal histograms, running in polylogarithmic space.

Our results extend to the context of data-streams, and in fact generalize to give 1 + ε approximation of several problems in data-streams which require partitioning the index set into intervals. The only assumptions required are that the cost of an interval is monotonic under inclusion (larger interval has larger cost) and that the cost can be computed or approximated in small space. This exhibits a nice class of problems for which we can have near optimal data-stream algorithms.

References

1.S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses For Approximate Query Answering. Proceedings of ACM SIGMOD, pages 275-286, June 1999.]] Google ScholarDigital Library
2.S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. The Aqua Approximate Query Answering System. Proceedings of ACM SIGMOD, pages 574-578, June 1999.]] Google ScholarDigital Library
3.M. Bertolotto and M. J. Egenhofer. Progressive vector transmission. Proceedings of the 7th ACM symposium on Advances in Geographical Information Systems, 1999.]] Google ScholarDigital Library
4.S. Chaudhuri, R. Motwani, and V. Narasayya. Random sampling for histogram construction: How much is enough ? Proceedings of ACM SIGMOD, 1998.]] Google ScholarDigital Library
5.J. Feigenbaum, S. Kannan, M. Strauss, and M. Vishwanathan. An approximate l1-difference algorithm for massive data sets. Proceedings of 40th Annual IEEE Symposium on Foundations of Computer Science, pages 501-511, 1999.]] Google ScholarDigital Library
6.P. Flajolet and G. N. Martin. Probabilistic counting. Proceedings of 24th Annual IEEE Symposium on Foundations of Computer Science, pages 76-82, 1983.]]Google ScholarDigital Library
7.P. Gibbons, Y. Mattias, and V. Poosala. Fast Incremental Maintenance of Approximate Histograms. Proceedings of VLDB, pages 466-475, 1997.]] Google ScholarDigital Library
8.J. Gray, A. Bosworth, A. Leyman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross Tab and Sub Total. Proceedings of ICDE, pages 152-159, 1996.]] Google ScholarDigital Library
9.M. R. Henzinger, P .Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report 1998-011, Digital Equipment Corporation, Systems Research Center, May, 1998.]]Google Scholar
10.P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. Proceedings of 41st Annual Symposium on Foundations of Computer Science, 2000.]] Google ScholarDigital Library
11.Y. Ioannidis. Universality of Serial Histograms. Proceedings of VLDB, pages 256-277, 1993.]] Google ScholarDigital Library
12.Y. Ioannidis and S. Christodoulakis. Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Transactions on Database Systems, Vol. 18, No. 4, pages 709-748, December 1993.]] Google ScholarDigital Library
13.Y. Ioannidis and V. Poosala. Histogram Based Approximation of Set Valued Query Answers. Proceedings of VLDB, pages 174-185, 1999.]] Google ScholarDigital Library
14.Y. Ioannidis and V. Poosala. Balancing Histogram Optimality and Practicality for Query Result Size Estimation. Proceedings of ACM SIGMOD, 1995.]] Google ScholarDigital Library
15.H. V Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. Proceedings of VLDB, 1998.]] Google ScholarDigital Library
16.H. V. Jagadish, Nick Koudas, and S. Muthukrishnan. Mining Deviants in a Time Series Database. Proceedings of VLDB, 1999.]] Google ScholarDigital Library
17.N. Koudas, S. Muthukrishnan, and D. Srivastava. Optimal histograms for hierarchical range queries. Proceedings of ACM PODS, 2000.]] Google ScholarDigital Library
18.G .S .Manku, S. Rajagopalan, and B. Lindsay. Approximate medians and other quantiles in one pass with limited memory. Proceedings of the ACM SIGMOD, 1998.]] Google ScholarDigital Library
19.G .S .Manku, S. Rajagopalan, and B. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large databases. Proceedings of ACM SIGMOD, 1999.]] Google ScholarDigital Library
20.Y. Matias, J. Scott Vitter, and M. Wang. Wavelet-Based Histograms for Selectivity Estimation. Proceedings of ACM SIGMOD, 1998.]] Google ScholarDigital Library
21.J. I. Munro and M. S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, pages 315-323, 1980.]]Google Scholar
22.M. Muralikrishna and D. J. DeWitt. Equi-depth histograms for estimating selectivity factors for multidimension al queries. Proceedings of ACM SIGMOD, 1998.]]Google Scholar
23.S. Muthukrishnan, V. Poosala, and T. Suel. On Rectangular Partitioning In Two Dimensions: Algorithms, Complexity and Applications. Proceedings of ICDT, 1999.]] Google ScholarDigital Library
24.V. Poosala and V. Ganti. Fast Approximate Answers To Aggregate Queries On A Datacube. SSDBM, pages 24-33, 1999.]] Google ScholarDigital Library
25.V. Poosala and Y. Ioannidis. Estimation of Query-Result Distribution and Its Application In Parallel Join Load Balancing. Proceedings of VLDB, 1996.]] Google ScholarDigital Library
26.V. Poosala and Y. Ioannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. Proceedings of VLDB, 1997.]] Google ScholarDigital Library
27.V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved Histograms for Selectivity Estimation of Range Predicates. Proceedings of ACM SIGMOD, 1996.]] Google ScholarDigital Library
28.J. Vitter and M. Wang. Approximate computation of multidimensional aggregates on sparse data using wavelets. Proceedings of ACM SIGMOD, 1999.]] Google ScholarDigital Library
29.J.S. Vitter, M. Wang, and B. R. Iyer. Data Cube Approximation and Histograms via Wavelets. Proceedings of the 1998 ACM CIKM Intern. Conf. on Information and Knowledge Management, 1998.]] Google ScholarDigital Library

Index Terms

Data-streams and histograms

Recommendations

Data Streams with Bounded Deletions
PODS '18: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Two prevalent models in the data stream literature are the insertion-only and turnstile models. Unfortunately, many important streaming problems require a Θ(log(n)) multiplicative factor more space for turnstile streams than for insertion-only streams. ...
Read More
Change detection in learning histograms from data streams
EPIA'07: Proceedings of the aritficial intelligence 13th Portuguese conference on Progress in artificial intelligence

In this paper we study the problem of constructing histograms from high-speed time-changing data streams. Learning in this context requires the ability to process examples once at the rate they arrive, maintaining a histogram consistent with the most ...
Read More
Punctuated data streams
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computing
July 2001
755 pages
ISBN:1581133499
DOI:10.1145/380752
Chairmen:
Jeffrey S. Vitter,
Paul Spirakis,
Mihalis Yannakakis
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 July 2001
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
STOC '01 Paper Acceptance Rate83of230submissions,36%Overall Acceptance Rate1,469of4,586submissions,32%
More
Upcoming Conference
STOC '24

Sponsor:

sigact

56th Annual ACM Symposium on Theory of Computing (STOC 2024)

June 24 - 28, 2024

Vancouver , BC , Canada
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 186
  Total Citations
  View Citations
- 203
  Total Downloads
- Downloads (Last 12 months)37
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Data-streams and histograms

STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Data Streams with Bounded Deletions

Change detection in learning histograms from data streams

Punctuated data streams

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Data-streams and histograms

STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Data Streams with Bounded Deletions

Change detection in learning histograms from data streams

Punctuated data streams

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media