Abstract
Accurate traffic classification is of fundamental importance to numerous other network activities, from security monitoring to accounting, and from Quality of Service to providing operators with useful forecasts for long-term provisioning. We apply a Naïve Bayes estimator to categorize traffic by application. Uniquely, our work capitalizes on hand-classified network data, using it as input to a supervised Naïve Bayes estimator. In this paper we illustrate the high level of accuracy achievable with the Naïve Bayes estimator. We further illustrate the improved accuracy of refined variants of this estimator. Our results indicate that with the simplest Naïve Bayes estimator we are able to achieve about 65% accuracy on per-flow classification, and with two powerful refinements we can improve this value to better than 95%; this is a vast improvement over traditional techniques that achieve 50-70%. While our technique uses training data with categories derived from packet content, all of our training and testing was done using header-derived discriminators. We emphasize this as a powerful aspect of our approach: using samples of well-known traffic to allow the categorization of traffic using commonly available information alone.
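As a rough illustration of the supervised Naïve Bayes idea described in the abstract, the sketch below trains on labelled flows and classifies a new flow from per-flow features. It is not the authors' implementation. It assumes each feature is Gaussian within a class, and the two discriminators (mean packet size, flow duration), the class labels, the training values, and the function names `train` and `classify` are all hypothetical, chosen only for the example.

```python
import math
from collections import defaultdict


def train(flows, labels):
    """Fit a log-prior and per-feature Gaussian (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in zip(flows, labels):
        by_class[label].append(features)

    model = {}
    total = len(flows)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def classify(model, features):
    """Return the class with the highest log-posterior for one flow."""
    def log_posterior(params):
        log_prior, means, variances = params
        # Naive independence assumption: sum the per-feature log-densities.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(features, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical hand-labelled flows: (mean packet size in bytes, duration in seconds).
TRAIN_FLOWS = [
    (900.0, 1.2), (1100.0, 0.8), (1000.0, 1.5),
    (300.0, 4.0), (350.0, 5.5), (280.0, 3.8),
]
TRAIN_LABELS = ["WWW", "WWW", "WWW", "MAIL", "MAIL", "MAIL"]

MODEL = train(TRAIN_FLOWS, TRAIN_LABELS)
print(classify(MODEL, (950.0, 1.0)))
print(classify(MODEL, (310.0, 4.2)))
```

Working in log space keeps the product of many small per-feature densities from underflowing, which matters once the number of discriminators grows beyond a handful.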
Internet traffic classification using bayesian analysis techniques
SIGMETRICS '05: Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems