ABSTRACT
The automatic detection of applications associated with network traffic is an essential step for network security and traffic engineering. Unfortunately, simple port-based classification methods are not always efficient, and systematic analysis of packet payloads is too slow. Most recent research proposals use flow statistics to classify traffic flows once they are finished, which limits their applicability for online classification. In this paper, we evaluate the feasibility of application identification at the beginning of a TCP connection. Based on an analysis of packet traces collected on eight different networks, we find that it is possible to distinguish the behavior of an application by observing the size and direction of the first few packets of the TCP connection. We apply three techniques to cluster TCP connections: K-Means, Gaussian Mixture Models, and spectral clustering. The resulting clusters are used together with assignment and labeling heuristics to design classifiers. We evaluate these classifiers on different packet traces. Our results show that the first four packets of a TCP connection are sufficient to classify known applications with an accuracy over 90% and to identify new applications as unknown with a probability of 60%.
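As an illustration of the idea described in the abstract, the following minimal Python sketch represents each TCP connection by the signed sizes of its first four packets, clusters the connections with K-Means, labels each cluster by majority vote, and reports distant connections as unknown. It is a sketch under stated assumptions, not the paper's implementation: the sign convention (positive for client to server, negative for server to client), the packet sizes, the labels `web`, `mail` and `bulk`, the deterministic farthest-first initialization, and the distance threshold of 500 bytes are all hypothetical choices made for this example. The paper's own assignment and labeling heuristics are not specified in the abstract.

```python
import math
from collections import Counter

# Each flow: signed sizes (bytes) of its first four packets.
# Assumed convention: positive = client to server, negative = server to client.
# All values and labels below are invented for illustration.
TRAINING = [
    ((300, -1400, -1400, 200), "web"),
    ((320, -1380, -1400, 180), "web"),
    ((280, -1420, -1390, 210), "web"),
    ((-90, 30, -60, 40), "mail"),
    ((-85, 28, -62, 45), "mail"),
    ((-95, 32, -58, 38), "mail"),
    ((60, -60, 1400, -50), "bulk"),
    ((55, -65, 1380, -52), "bulk"),
    ((65, -55, 1420, -48), "bulk"),
]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def farthest_first(points, k):
    """Deterministic initialization: repeatedly pick the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Plain K-Means: alternate assignment and centroid update until stable."""
    centroids = farthest_first(points, k)
    for _ in range(max_iters):
        members = [[] for _ in range(k)]
        for p in points:
            members[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(dim) / len(m) for dim in zip(*m)) if m else centroids[j]
            for j, m in enumerate(members)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def label_clusters(training, centroids):
    """Name each cluster after the most common application among its training flows."""
    votes = [Counter() for _ in centroids]
    for flow, app in training:
        votes[nearest(flow, centroids)][app] += 1
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]


def classify(flow, centroids, names, max_distance=500.0):
    """Assign flow to the nearest cluster, or 'unknown' if it is too far away.

    The threshold rule is an assumption of this sketch.
    """
    j = nearest(flow, centroids)
    if math.dist(flow, centroids[j]) > max_distance:
        return "unknown"
    return names[j]


flows = [flow for flow, _ in TRAINING]
centroids = kmeans(flows, k=3)
names = label_clusters(TRAINING, centroids)

print(classify((310, -1390, -1400, 190), centroids, names))   # resembles the "web" flows
print(classify((1000, 1000, 1000, 1000), centroids, names))   # far from every cluster
```

Because only the first four packets are needed, a classifier of this shape can in principle run while the connection is still open, which is the property that distinguishes early identification from methods based on statistics of finished flows.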