ABSTRACT
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.
- N. Duffield, C. Lund, and M. Thorup, "Estimating flow distributions from sampled flow statistics," in Proc. ACM SIGCOMM, Aug. 2003. Google ScholarDigital Library
- N. Hohn and D. Veitch, "Inverting sampled traffic," in Proc. ACM SIGCOMM Internet Measurement Conference, Oct. 2003. Google ScholarDigital Library
- N. Duffield, C. Lund, and M. Thorup, "Charging from sampled network usage," in Proc. ACM SIGCOMM Internet Measurement Workshop, Nov. 2001. Google ScholarDigital Library
- N. Duffield, C. Lund, and M. Thorup, "Properties and prediction of flow statistics from sampled packet streams," in Proc. ACM SIGCOMM Internet Measurement Workshop, Nov. 2002. Google ScholarDigital Library
- C. Estan and G. Varghese, "New directions in traffic measurement and accounting," in Proc. ACM SIGCOMM, Aug. 2002. Google ScholarDigital Library
- C. Estan and G. Varghese, "Bitmap algorithms for counting active flows on high speed links," in Proc. ACM SIGCOMM Internet Measurement Conference, Oct. 2003. Google ScholarDigital Library
- A. Medina, N. Taft, K. Salamatian, and S. Bhattacharyyaand C. Diot, "Traffic matrix estimation: Existing techniques and new directions," in Proc. ACM SIGCOMM, Aug. 2002. Google ScholarDigital Library
- S. Vaton and A. Gravey, "Iterative bayesian estimation of network traffic matrices in the case of bursty flows," in Proc. ACM SIGCOMM Internet Measurement Workshop, Nov. 2002. Google ScholarDigital Library
- Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg, "Fast accurate computation of large-scale IP traffic matrices from link loads," in Proc. ACM SIGMETRICS, June 2003. Google ScholarDigital Library
- Y. Zhang, M. Roughan, C. Lund, and D. Donoho, "An information-theoretic approach to traffic matrix estimation," in Proc. ACM SIGCOMM, Aug. 2003. Google ScholarDigital Library
- S. Muthukrishnan, "Data streams: Algorithms and applications," available at http://athos.rutgers.edu/~muthu/.Google Scholar
- S. Ramabhadran and G. Varghese, "Efficient implementation of a statistics counter architecture," in Proc. ACM SIGMETRICS, 2003. Google ScholarDigital Library
- M. Ramakrishna, E. Fu, and E. Bahcekapili, "Efficient hardware hashing functions for high performance computers," IEEE Trans. on Computers, vol. 46, no. 12, pp. 1378--1381, Dec. 1997. Google ScholarDigital Library
- K. Whang, B. Vander-Zanden, and H. Taylor, "A linear-time probabilistic counting algorithm for database applications," ACM Transactions on Database Systems, 1990. Google ScholarDigital Library
- A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the em algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1--38, 1977.Google Scholar
- A. Kumar, J. Xu, J. Wang, O. Spatschek, and L. Li, "Space-Code Bloom Filter for Efficient per-flow Traffic Measurement," in Proc. IEEE Infocom, Mar. 2004.Google ScholarCross Ref
- C. Cranor, T. Johnson, and O. Spatscheck, "Gigascope: a stream database for network applications," in Proc. SIGMOD 2003, Jun 2003. Google ScholarDigital Library
- D. Moor, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver, "The spread of the sapphire/slammer worm," in Technical Report, CAIDA, 2003.Google Scholar
- B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, "Sketch-based change detection: methods, evaluation, and applications," in Proc. ACM SIGCOMM Internet Measurement Conference, Oct. 2003. Google ScholarDigital Library
- B. Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM, vol. 13, no. 7, pp. 422--426, 1970. Google ScholarDigital Library
- L. Fan, P. Cao, J. Almeida, and A. Broder, "Summary cache: a scalable wide-area Web cache sharing protocol," IEEE/ACM Transactions on Networking, vol. 8, no. 3, pp. 281--293, 2000. Google ScholarDigital Library
Index Terms
- Data streaming algorithms for efficient and accurate estimation of flow size distribution
Recommendations
Data streaming algorithms for efficient and accurate estimation of flow size distribution
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better ...
Data streaming algorithms for accurate and efficient measurement of traffic and flow matrices
SIGMETRICS '05: Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systemsThe traffic volume between origin/destination (OD) pairs in a network, known as traffic matrix, is essential for efficient network provisioning and traffic engineering. Existing approaches of estimating the traffic matrix, based on statistical inference ...
Data streaming algorithms for accurate and efficient measurement of traffic and flow matrices
Performance evaluation reviewThe traffic volume between origin/destination (OD) pairs in a network, known as traffic matrix, is essential for efficient network provisioning and traffic engineering. Existing approaches of estimating the traffic matrix, based on statistical inference ...
Comments