Internet traffic classification using Bayesian analysis techniques

Published: 06 June 2005

Abstract

Accurate traffic classification is of fundamental importance to numerous other network activities, from security monitoring to accounting, and from Quality of Service to providing operators with useful forecasts for long-term provisioning. We apply a Naïve Bayes estimator to categorize traffic by application. Uniquely, our work capitalizes on hand-classified network data, using it as input to a supervised Naïve Bayes estimator. In this paper we illustrate the high level of accuracy achievable with the Naïve Bayes estimator. We further illustrate the improved accuracy of refined variants of this estimator. Our results indicate that with the simplest Naïve Bayes estimator we are able to achieve about 65% accuracy on per-flow classification, and with two powerful refinements we can improve this value to better than 95%; this is a vast improvement over traditional techniques that achieve 50–70%. While our technique uses training data with categories derived from packet content, all of our training and testing was done using header-derived discriminators. We emphasize this as a powerful aspect of our approach: using samples of well-known traffic to allow the categorization of traffic using commonly available information alone.
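As a rough illustration of the supervised setting described above, the following is a minimal sketch of a plain Naïve Bayes classifier over per-flow features. It is not the authors' implementation. The Gaussian model for each feature, the two discriminators (mean packet size and flow duration), the class names, and all data values are assumptions made up for this example; the refinements mentioned in the abstract are not shown.

```python
import math
from collections import defaultdict


def train(flows, labels):
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(flows, labels):
        by_class[label].append(features)

    model = {}
    total = len(flows)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Population variance, floored so a constant feature cannot divide by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def classify(model, features):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # Naive assumption: features are independent given the class,
        # so the per-feature Gaussian log densities simply add up.
        for v, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled training flows:
# (mean packet size in bytes, flow duration in seconds).
TRAIN_FLOWS = [
    (900.0, 1.2), (1100.0, 0.8), (1000.0, 1.5),
    (400.0, 3.0), (500.0, 2.5), (450.0, 4.0),
]
TRAIN_LABELS = ["WWW", "WWW", "WWW", "MAIL", "MAIL", "MAIL"]

if __name__ == "__main__":
    model = train(TRAIN_FLOWS, TRAIN_LABELS)
    print(classify(model, (950.0, 1.0)))
    print(classify(model, (470.0, 3.2)))
```

The labels play the role of the hand-classified categories, while the feature tuples stand in for header-derived discriminators: once trained, the classifier needs only the features to assign a class to a new flow.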

Published in

ACM SIGMETRICS Performance Evaluation Review, Volume 33, Issue 1
June 2005, 417 pages
ISSN: 0163-5999
DOI: 10.1145/1071690

SIGMETRICS '05: Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 2005, 428 pages
ISBN: 1595930221
DOI: 10.1145/1064212

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
