skip to main content
10.1145/2213556.2213594acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Space-efficient estimation of statistics over sub-sampled streams

Published:21 May 2012Publication History

ABSTRACT

In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to sample a small fraction of the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, the quantities that need to be computed on the sampled stream are often different from the original quantities of interest and their estimation requires new algorithms. We present upper and lower bounds (often matching) for estimating frequency moments, support size, entropy, and heavy hitters of the original stream from the data observed in the sampled stream.

References

  1. N. Alon, Y. Matias, and M. Szegedy. The Space Complexity of Approximating the Frequency Moments. J. Comput. Syst. Sci., 58(1):137--147, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Z. Bar-Yossef. The complexity of massive dataset computations. PhD thesis, UC Berkeley, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Z. Bar-Yossef. Sampling lower bounds via information theory. In STOC, pages 335--344, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Bhattacharyya, A. Madeira, S. Muthukrishnan, and T. Ye. How to scalably and accurately skip past streams. In ICDE Workshops, pages 654--663, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Charikar, S. Chaudhuri, R. Motwani, and V. R. Narasayya. Towards estimation error guarantees for distinct values. In PODS, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cisco Systems. Random Sampled NetFlow. http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/nfstatsa.html.Google ScholarGoogle Scholar
  7. G. Cormode and M. Garofalakis. Sketching probabilistic data streams. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD '07, pages 281--292, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58--75, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. G. Duffield, C. Lund, and M. Thorup. Properties and prediction of flow statistics from sampled packet streams. In Internet Measurement Workshop, pages 159--171, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. G. Duffield, C. Lund, and M. Thorup. Estimating flow distributions from sampled flow statistics. In SIGCOMM, pages 325--336, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Guha and Z. Huang. Revisiting the direct sum theorem and space lower bounds in random order streams. In ICALP (1), pages 513--524, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. N. J. A. Harvey, J. Nelson, and K. Onak. Sketching and streaming entropy via approximation theory. In FOCS, pages 489--498, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Indyk and D. P. Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pages 202--208, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. S. Jayram, A. McGregor, S. Muthukrishnan, and E. Vee. Estimating statistical aggregates on probabilistic data streams. ACM Trans. Database Syst., 33:26:1--26:30, December 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. M. Kane, J. Nelson, and D. P. Woodruff. On the exact space complexity of sketching and streaming small norms. In SODA, pages 1161--1178, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. McGregor, editor. Open Problems in Data Streams and Related Topics, 2007. http://www.cse.iitk.ac.in/users/sganguly/data-stream-probs.pdf.Google ScholarGoogle Scholar
  17. J. Misra and D. Gries. Finding repeated elements. Science of Computer Programming, 2(2):143--152, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  18. F. Rusu and A. Dobra. Sketching sampled data streams. In ICDE, pages 381--392, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Space-efficient estimation of statistics over sub-sampled streams

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems
      May 2012
      332 pages
      ISBN:9781450312486
      DOI:10.1145/2213556

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 May 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      PODS '12 Paper Acceptance Rate26of101submissions,26%Overall Acceptance Rate642of2,707submissions,24%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader