skip to main content
10.1145/380752.380841acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
Article

Data-streams and histograms

Published:06 July 2001Publication History

ABSTRACT

Histograms have been used widely to capture data distribution, to represent the data by a small number of step functions. Dynamic programming algorithms which provide optimal construction of these histograms exist, albeit running in quadratic time and linear space. In this paper we provide linear time construction of 1 + ε approximation of optimal histograms, running in polylogarithmic space.

Our results extend to the context of data-streams, and in fact generalize to give 1 + ε approximation of several problems in data-streams which require partitioning the index set into intervals. The only assumptions required are that the cost of an interval is monotonic under inclusion (larger interval has larger cost) and that the cost can be computed or approximated in small space. This exhibits a nice class of problems for which we can have near optimal data-stream algorithms.

References

  1. 1.S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses For Approximate Query Answering. Proceedings of ACM SIGMOD, pages 275-286, June 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. The Aqua Approximate Query Answering System. Proceedings of ACM SIGMOD, pages 574-578, June 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.M. Bertolotto and M. J. Egenhofer. Progressive vector transmission. Proceedings of the 7th ACM symposium on Advances in Geographical Information Systems, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.S. Chaudhuri, R. Motwani, and V. Narasayya. Random sampling for histogram construction: How much is enough ? Proceedings of ACM SIGMOD, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.J. Feigenbaum, S. Kannan, M. Strauss, and M. Vishwanathan. An approximate l1-difference algorithm for massive data sets. Proceedings of 40th Annual IEEE Symposium on Foundations of Computer Science, pages 501-511, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.P. Flajolet and G. N. Martin. Probabilistic counting. Proceedings of 24th Annual IEEE Symposium on Foundations of Computer Science, pages 76-82, 1983.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.P. Gibbons, Y. Mattias, and V. Poosala. Fast Incremental Maintenance of Approximate Histograms. Proceedings of VLDB, pages 466-475, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.J. Gray, A. Bosworth, A. Leyman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross Tab and Sub Total. Proceedings of ICDE, pages 152-159, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.M. R. Henzinger, P .Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report 1998-011, Digital Equipment Corporation, Systems Research Center, May, 1998.]]Google ScholarGoogle Scholar
  10. 10.P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. Proceedings of 41st Annual Symposium on Foundations of Computer Science, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.Y. Ioannidis. Universality of Serial Histograms. Proceedings of VLDB, pages 256-277, 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.Y. Ioannidis and S. Christodoulakis. Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Transactions on Database Systems, Vol. 18, No. 4, pages 709-748, December 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.Y. Ioannidis and V. Poosala. Histogram Based Approximation of Set Valued Query Answers. Proceedings of VLDB, pages 174-185, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14.Y. Ioannidis and V. Poosala. Balancing Histogram Optimality and Practicality for Query Result Size Estimation. Proceedings of ACM SIGMOD, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.H. V Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. Proceedings of VLDB, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.H. V. Jagadish, Nick Koudas, and S. Muthukrishnan. Mining Deviants in a Time Series Database. Proceedings of VLDB, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17.N. Koudas, S. Muthukrishnan, and D. Srivastava. Optimal histograms for hierarchical range queries. Proceedings of ACM PODS, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.G .S .Manku, S. Rajagopalan, and B. Lindsay. Approximate medians and other quantiles in one pass with limited memory. Proceedings of the ACM SIGMOD, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.G .S .Manku, S. Rajagopalan, and B. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large databases. Proceedings of ACM SIGMOD, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20.Y. Matias, J. Scott Vitter, and M. Wang. Wavelet-Based Histograms for Selectivity Estimation. Proceedings of ACM SIGMOD, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21.J. I. Munro and M. S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, pages 315-323, 1980.]]Google ScholarGoogle Scholar
  22. 22.M. Muralikrishna and D. J. DeWitt. Equi-depth histograms for estimating selectivity factors for multidimension al queries. Proceedings of ACM SIGMOD, 1998.]]Google ScholarGoogle Scholar
  23. 23.S. Muthukrishnan, V. Poosala, and T. Suel. On Rectangular Partitioning In Two Dimensions: Algorithms, Complexity and Applications. Proceedings of ICDT, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.V. Poosala and V. Ganti. Fast Approximate Answers To Aggregate Queries On A Datacube. SSDBM, pages 24-33, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.V. Poosala and Y. Ioannidis. Estimation of Query-Result Distribution and Its Application In Parallel Join Load Balancing. Proceedings of VLDB, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.V. Poosala and Y. Ioannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. Proceedings of VLDB, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27.V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved Histograms for Selectivity Estimation of Range Predicates. Proceedings of ACM SIGMOD, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28.J. Vitter and M. Wang. Approximate computation of multidimensional aggregates on sparse data using wavelets. Proceedings of ACM SIGMOD, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. 29.J.S. Vitter, M. Wang, and B. R. Iyer. Data Cube Approximation and Histograms via Wavelets. Proceedings of the 1998 ACM CIKM Intern. Conf. on Information and Knowledge Management, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Data-streams and histograms

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computing
                July 2001
                755 pages
                ISBN:1581133499
                DOI:10.1145/380752

                Copyright © 2001 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 6 July 2001

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                STOC '01 Paper Acceptance Rate83of230submissions,36%Overall Acceptance Rate1,469of4,586submissions,32%

                Upcoming Conference

                STOC '24
                56th Annual ACM Symposium on Theory of Computing (STOC 2024)
                June 24 - 28, 2024
                Vancouver , BC , Canada

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader