
Internet traffic classification using bayesian analysis techniques

Authors:
Andrew W. Moore, University of Cambridge
Denis Zuev, University of Oxford

ACM SIGMETRICS Performance Evaluation Review, Volume 33, Issue 1, June 2005, pp. 50–60. https://doi.org/10.1145/1071690.1064220

Published: 06 June 2005

Abstract

Accurate traffic classification is of fundamental importance to numerous other network activities, from security monitoring to accounting, and from Quality of Service to providing operators with useful forecasts for long-term provisioning. We apply a Naïve Bayes estimator to categorize traffic by application. Uniquely, our work capitalizes on hand-classified network data, using it as input to a supervised Naïve Bayes estimator. In this paper we illustrate the high level of accuracy achievable with the Naïve Bayes estimator. We further illustrate the improved accuracy of refined variants of this estimator. Our results indicate that with the simplest Naïve Bayes estimator we are able to achieve about 65% accuracy on per-flow classification, and with two powerful refinements we can improve this value to better than 95%; this is a vast improvement over traditional techniques that achieve 50–70%. While our technique uses training data, with categories derived from packet content, all of our training and testing was done using header-derived discriminators. We emphasize this as a powerful aspect of our approach: using samples of well-known traffic to allow the categorization of traffic using commonly available information alone.
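The abstract describes training a supervised Naïve Bayes estimator on hand-classified flows and then classifying with header-derived discriminators alone. The following is a minimal Python sketch of that idea, assuming continuous per-flow features; the class name `FlowNaiveBayes`, the two toy features (mean payload bytes, mean inter-arrival time), the class labels and all data values are hypothetical. The abstract does not name its two refinements, so the `kernel=True` mode is an illustrative assumption: it replaces each per-feature Gaussian with a Gaussian kernel density estimate (the continuous-distribution approach studied by John and Langley, 1995), using a Silverman rule-of-thumb bandwidth that is likewise an assumption rather than the paper's setting.

```python
import math
from collections import defaultdict


def gauss_logpdf(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


class FlowNaiveBayes:
    """Naive Bayes over continuous per-flow discriminators (hypothetical class).

    kernel=False: one Gaussian per (class, feature).
    kernel=True:  a Gaussian kernel density estimate per (class, feature),
                  i.e. the average of Gaussians centred on each training value.
    """

    def __init__(self, kernel=False, var_floor=1e-6):
        self.kernel = kernel
        self.var_floor = var_floor  # avoids zero variance / zero bandwidth

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        n = len(y)
        self.classes = sorted(rows_by_class)
        self.log_prior, self.columns, self.gauss, self.bw = {}, {}, {}, {}
        for c in self.classes:
            rows = rows_by_class[c]
            self.log_prior[c] = math.log(len(rows) / n)  # class prior P(c)
            cols = list(zip(*rows))  # per-feature training values for class c
            self.columns[c] = cols
            self.gauss[c], self.bw[c] = [], []
            for col in cols:
                mean = sum(col) / len(col)
                var = max(sum((x - mean) ** 2 for x in col) / len(col), self.var_floor)
                self.gauss[c].append((mean, var))
                # Silverman's rule of thumb (an assumption, not the paper's choice).
                self.bw[c].append(1.06 * math.sqrt(var) * len(col) ** -0.2)
        return self

    def _loglik(self, c, j, x):
        """log f(x_j | c) for one feature j of one flow."""
        if not self.kernel:
            mean, var = self.gauss[c][j]
            return gauss_logpdf(x, mean, var)
        h = self.bw[c][j]
        terms = [gauss_logpdf(x, xi, h * h) for xi in self.columns[c][j]]
        top = max(terms)  # log-sum-exp for numerical stability
        return top + math.log(sum(math.exp(t - top) for t in terms) / len(terms))

    def predict(self, X):
        def score(c, row):
            # log P(c) + sum_j log f(x_j | c): the "naive" independence assumption
            return self.log_prior[c] + sum(self._loglik(c, j, x) for j, x in enumerate(row))
        return [max(self.classes, key=lambda c: score(c, row)) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted class matches the hand-assigned class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up flows: (mean payload bytes, mean inter-arrival ms).
X_train = [(800, 5), (900, 6), (850, 4),
           (1400, 1), (1450, 2), (1420, 1),
           (60, 200), (80, 250), (70, 180)]
y_train = ["WWW"] * 3 + ["BULK"] * 3 + ["INTERACTIVE"] * 3
X_test = [(880, 5), (1430, 1.5), (75, 210)]
y_test = ["WWW", "BULK", "INTERACTIVE"]

for kernel in (False, True):
    model = FlowNaiveBayes(kernel=kernel).fit(X_train, y_train)
    preds = model.predict(X_test)
    print("kernel" if kernel else "gaussian", preds, accuracy(y_test, preds))
```

Working in log space keeps the product of many small per-feature densities from underflowing, and the independence assumption is what allows each discriminator to be modelled on its own. The kernel variant can follow multi-modal feature distributions that a single Gaussian would blur together, at the cost of keeping every training value.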

Published in
ACM SIGMETRICS Performance Evaluation Review, Volume 33, Issue 1, June 2005, 417 pages. ISSN: 0163-5999. DOI: 10.1145/1071690
SIGMETRICS '05: Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, June 2005, 428 pages. ISBN: 1595930221. DOI: 10.1145/1064212
General Chairs: Derek Eager (University of Saskatchewan, Canada), Carey Williamson (University of Calgary, Canada)
Program Chairs: Sem Borst (Bell Labs, USA & CWI, The Netherlands), John C. S. Lui (Chinese University of Hong Kong, China)
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
flow classification
internet traffic
traffic identification

Article Metrics
- 995 total citations
- 5,646 total downloads
- Downloads (last 12 months): 201
- Downloads (last 6 weeks): 32