ABSTRACT
An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server port numbers in the TCP or UDP headers. However, this approach has become increasingly inaccurate. An alternative, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately, deriving these signatures manually is difficult and very time-consuming. In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular, we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high-speed links. We also discovered that content signatures still work in the presence of encryption. In these cases, we were able to derive content signatures for the unencrypted handshakes that negotiate the encryption parameters of a particular connection.
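As a rough illustration of payload-based identification, the sketch below trains a small naive Bayes classifier on the leading bytes of a flow's payload. The abstract does not name its three algorithms, so naive Bayes is an assumption made for illustration only; the feature encoding, the `N_BYTES` cutoff, the function names, and the toy training payloads are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

N_BYTES = 8  # hypothetical cutoff: number of leading payload bytes used as features


def features(payload: bytes):
    """Encode the first N_BYTES of a payload as (position, byte value) pairs."""
    return [(i, b) for i, b in enumerate(payload[:N_BYTES])]


def train(samples):
    """Count classes and per-class features from (payload, label) pairs."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)
    for payload, label in samples:
        class_counts[label] += 1
        feat_counts[label].update(features(payload))
    return class_counts, feat_counts


def classify(model, payload: bytes):
    """Return the label with the highest naive Bayes log-score."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for f in features(payload):
            # Laplace smoothing over the 256 possible byte values per position
            score += math.log((feat_counts[label][f] + 1) / (n + 256))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-written payloads (illustrative only, not measurement data).
TRAINING = [
    (b"GET /index.html HTTP/1.1\r\n", "http"),
    (b"POST /form HTTP/1.1\r\n", "http"),
    (b"SSH-2.0-OpenSSH_8.9\r\n", "ssh"),
    (b"SSH-2.0-dropbear\r\n", "ssh"),
    (b"220 mail.example.org ESMTP\r\n", "smtp"),
    (b"220 smtp.example.net ready\r\n", "smtp"),
]

model = train(TRAINING)
print(classify(model, b"GET /favicon.ico HTTP/1.1\r\n"))
print(classify(model, b"SSH-2.0-libssh\r\n"))
```

The SSH case mirrors the abstract's point about encryption: the session itself is encrypted, but the plaintext version banner exchanged before key negotiation is enough for a content-based classifier to label the flow without consulting port numbers.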
Index Terms
- ACAS: automated construction of application signatures