skip to main content
article

Flow classification by histograms: or how to go on safari in the internet

Published:01 June 2004Publication History
Skip Abstract Section

Abstract

In order to control and manage highly aggregated Internet traffic flows efficiently, we need to be able to categorize flows into distinct classes and to be knowledgeable about the different behavior of flows belonging to these classes. In this paper we consider the problem of classifying BGP level prefix flows into a small set of homogeneous classes. We argue that using the entire distributional properties of flows can have significant benefits in terms of quality in the derived classification. We propose a method based on modeling flow histograms using Dirichlet Mixture Processes for random distributions. We present an inference procedure based on the Simulated Annealing Expectation Maximization algorithm that estimates all the model parameters as well as flow membership probabilities - the probability that a flow belongs to any given class. One of our key contributions is a new method for Internet flow classification. We show that our method is powerful in that it is capable of examining macroscopic flows while simultaneously making fine distinctions between different traffic classes. We demonstrate that our scheme can address issues with flows being close to class boundaries and the inherent dynamic behaviour of Internet flows.

References

  1. Jeff Bilmes. A gentle tutorial on the EM algorithm including gaussian mixtures and baum-welch. Technical Report TR-97-021, International Computer Science Institute, Berkeley, CA, 1997.Google ScholarGoogle Scholar
  2. N. Brownlee and KC. Claffy. Understandin internet traffic streams : Dragon ies and tortoise. IEEE communication magazine, pages 110--117, october 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Gilles Celeux, Didier Chauveau, and Jean Diebolt. On stochastic versions of the EM algorithm. Technical Report RR-2514, INRIA, 1995.Google ScholarGoogle Scholar
  4. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. 39:1--38, 1977.Google ScholarGoogle Scholar
  5. Michael D. Escobar. Estimating normal means with a dirichlet process prior. Journal of the American Statistical Association, 89(425):268--277, march 1994.Google ScholarGoogle ScholarCross RefCross Ref
  6. Michael D. Escobar and Mike West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90(430):577--588, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  7. Cristian Estan and George Varghese. New directions in traffic measurement and accounting. In Proceedings of the First ACM SIGCOMM Workshop on Internet Measurement Workshop, pages 75--80. ACM Press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R.Emilion Mixtures of orthogonal random distributions and clustering. In Compte Rendu Academie des Sciences de Paris I, 2002(335):189--193.Google ScholarGoogle ScholarCross RefCross Ref
  9. A. Lakhina, K. papagiannaki, M. Crovella, C. Diot, E. Kolaczyk, and N. Taft. Structural analysis of network traffic ows. In ACM Sigmetrics, New York, June 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Muller, A. Erkanli, and M. West. Bayesian curve-fitting using multivariate normal mixtures. Biometrika, 83(1):67--79, 1996.Google ScholarGoogle ScholarCross RefCross Ref
  11. A. Oveissian, K. Salamatian, and A. Soule. Flow classification on short time scale. Technical Report *, LIP6, 2003.Google ScholarGoogle Scholar
  12. K. Papagiannaki, N. Taft, and C. Diot. Impact of flow dynamics on traffic engineering design principles. In IEEE Infocom, Hong Kong, March 2004.Google ScholarGoogle Scholar
  13. S. Sarvotham J. Rexford and K.Shin. Load-sensitive routing of ling-lived ip ows. In Proc. ACM SIGCOMM, september 1999.Google ScholarGoogle Scholar
  14. Christian P. Robert. The Bayesian Choice. Springer, 2001.Google ScholarGoogle Scholar
  15. Kavé Salamatian and Sandrine Vaton. Hidden markov modeling for network communication channels. In Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 92--101. ACM Press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Shriram Sarvotham, Rudolf Riedi, and Richard Baraniuk. Connection-level analysis and modeling of network traffic. ACM SIGCOMM Internet Measurement Workshop, August 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Iljitsch van Beijnum. BGP Building Reliable Networks with the Border Gateway Protocol. O'Reilly, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Flow classification by histograms: or how to go on safari in the internet

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGMETRICS Performance Evaluation Review
          ACM SIGMETRICS Performance Evaluation Review  Volume 32, Issue 1
          June 2004
          432 pages
          ISSN:0163-5999
          DOI:10.1145/1012888
          Issue’s Table of Contents
          • cover image ACM Conferences
            SIGMETRICS '04/Performance '04: Proceedings of the joint international conference on Measurement and modeling of computer systems
            June 2004
            450 pages
            ISBN:1581138733
            DOI:10.1145/1005686

          Copyright © 2004 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 2004

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader