skip to main content
10.1145/1005686.1005709acmconferencesArticle/Chapter ViewAbstractPublication PagesmetricsConference Proceedingsconference-collections
Article

Data streaming algorithms for efficient and accurate estimation of flow size distribution

Published:01 June 2004Publication History

ABSTRACT

Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.

References

  1. N. Duffield, C. Lund, and M. Thorup, "Estimating flow distributions from sampled flow statistics," in Proc. ACM SIGCOMM, Aug. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Hohn and D. Veitch, "Inverting sampled traffic," in Proc. ACM SIGCOMM Internet Measurement Conference, Oct. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Duffield, C. Lund, and M. Thorup, "Charging from sampled network usage," in Proc. ACM SIGCOMM Internet Measurement Workshop, Nov. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Duffield, C. Lund, and M. Thorup, "Properties and prediction of flow statistics from sampled packet streams," in Proc. ACM SIGCOMM Internet Measurement Workshop, Nov. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Estan and G. Varghese, "New directions in traffic measurement and accounting," in Proc. ACM SIGCOMM, Aug. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Estan and G. Varghese, "Bitmap algorithms for counting active flows on high speed links," in Proc. ACM SIGCOMM Internet Measurement Conference, Oct. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Medina, N. Taft, K. Salamatian, and S. Bhattacharyyaand C. Diot, "Traffic matrix estimation: Existing techniques and new directions," in Proc. ACM SIGCOMM, Aug. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Vaton and A. Gravey, "Iterative bayesian estimation of network traffic matrices in the case of bursty flows," in Proc. ACM SIGCOMM Internet Measurement Workshop, Nov. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg, "Fast accurate computation of large-scale IP traffic matrices from link loads," in Proc. ACM SIGMETRICS, June 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Zhang, M. Roughan, C. Lund, and D. Donoho, "An information-theoretic approach to traffic matrix estimation," in Proc. ACM SIGCOMM, Aug. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Muthukrishnan, "Data streams: Algorithms and applications," available at http://athos.rutgers.edu/~muthu/.Google ScholarGoogle Scholar
  12. S. Ramabhadran and G. Varghese, "Efficient implementation of a statistics counter architecture," in Proc. ACM SIGMETRICS, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Ramakrishna, E. Fu, and E. Bahcekapili, "Efficient hardware hashing functions for high performance computers," IEEE Trans. on Computers, vol. 46, no. 12, pp. 1378--1381, Dec. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. K. Whang, B. Vander-Zanden, and H. Taylor, "A linear-time probabilistic counting algorithm for database applications," ACM Transactions on Database Systems, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the em algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1--38, 1977.Google ScholarGoogle Scholar
  16. A. Kumar, J. Xu, J. Wang, O. Spatschek, and L. Li, "Space-Code Bloom Filter for Efficient per-flow Traffic Measurement," in Proc. IEEE Infocom, Mar. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  17. C. Cranor, T. Johnson, and O. Spatscheck, "Gigascope: a stream database for network applications," in Proc. SIGMOD 2003, Jun 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Moor, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver, "The spread of the sapphire/slammer worm," in Technical Report, CAIDA, 2003.Google ScholarGoogle Scholar
  19. B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, "Sketch-based change detection: methods, evaluation, and applications," in Proc. ACM SIGCOMM Internet Measurement Conference, Oct. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM, vol. 13, no. 7, pp. 422--426, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Fan, P. Cao, J. Almeida, and A. Broder, "Summary cache: a scalable wide-area Web cache sharing protocol," IEEE/ACM Transactions on Networking, vol. 8, no. 3, pp. 281--293, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Data streaming algorithms for efficient and accurate estimation of flow size distribution

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                SIGMETRICS '04/Performance '04: Proceedings of the joint international conference on Measurement and modeling of computer systems
                June 2004
                450 pages
                ISBN:1581138733
                DOI:10.1145/1005686
                • cover image ACM SIGMETRICS Performance Evaluation Review
                  ACM SIGMETRICS Performance Evaluation Review  Volume 32, Issue 1
                  June 2004
                  432 pages
                  ISSN:0163-5999
                  DOI:10.1145/1012888
                  Issue’s Table of Contents

                Copyright © 2004 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 1 June 2004

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                Overall Acceptance Rate459of2,691submissions,17%

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader