Abstract
To control and manage highly aggregated Internet traffic flows efficiently, we need to be able to categorize flows into distinct classes and to understand how flows in each class behave. In this paper we consider the problem of classifying BGP-level prefix flows into a small set of homogeneous classes. We argue that using the full distributional properties of flows can significantly improve the quality of the derived classification. We propose a method based on modeling flow histograms using Dirichlet Mixture Processes for random distributions. We present an inference procedure based on the Simulated Annealing Expectation Maximization algorithm that estimates all the model parameters as well as the flow membership probabilities, that is, the probability that a flow belongs to any given class. One of our key contributions is a new method for Internet flow classification. We show that our method is powerful in that it is capable of examining macroscopic flows while simultaneously making fine distinctions between different traffic classes. We demonstrate that our scheme can address the issues of flows lying close to class boundaries and of the inherently dynamic behavior of Internet flows.
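The membership probabilities mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' inference procedure: it assumes the mixture weights and Dirichlet parameters are already known, and it only evaluates the posterior probability of each class for a single flow histogram. The function names, the smoothing constant `eps`, and the example parameters are all hypothetical.

```python
import math


def dirichlet_logpdf(p, alpha):
    """Log density of a Dirichlet(alpha) distribution at the probability vector p."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))


def membership_probabilities(hist, weights, alphas, eps=1e-6):
    """Posterior probability that a flow histogram belongs to each class.

    hist    : raw bin counts of one flow (hypothetical input format)
    weights : mixture weight of each class, summing to 1
    alphas  : one Dirichlet parameter vector per class
    eps     : assumed smoothing constant so empty bins do not give log(0)
    """
    smoothed = [h + eps for h in hist]
    total = sum(smoothed)
    p = [s / total for s in smoothed]

    # Unnormalized log posterior of each class: log weight + log likelihood.
    log_post = [math.log(w) + dirichlet_logpdf(p, a)
                for w, a in zip(weights, alphas)]

    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Hypothetical two-class model over three histogram bins.
weights = [0.5, 0.5]
alphas = [[8.0, 2.0, 1.0], [1.0, 2.0, 8.0]]

probs = membership_probabilities([70, 20, 10], weights, alphas)
```

Because the output is a probability per class rather than a hard label, a flow whose histogram lies near a class boundary shows up as one with no dominant membership probability. In a full EM-style procedure these probabilities would be recomputed at each iteration while the weights and Dirichlet parameters are re-estimated.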
Flow classification by histograms: or how to go on safari in the internet
SIGMETRICS '04/Performance '04: Proceedings of the joint international conference on Measurement and modeling of computer systems