DOI: 10.1145/1921168.1921171
research-article

Profiling-By-Association: a resilient traffic profiling solution for the internet backbone

Published: 30 November 2010

ABSTRACT

Profiling Internet backbone traffic is becoming an increasingly hard problem, as users and applications avoid detection through traffic obfuscation and encryption. The key question addressed here is: is it possible to profile traffic at the backbone without relying on its packet- and flow-level information, which can be obfuscated? We propose a novel approach, called Profiling-By-Association (PBA), that uses only the IP-to-IP communication graph and information about the applications used by a few IP hosts (a.k.a. seeds). The key insight is that IP hosts tend to communicate more frequently with hosts involved in the same application, forming communities (or clusters). Profiling a few members within a cluster can "give away" the whole community. Following this approach, we develop several algorithms to profile Internet traffic and evaluate them on real traces from four large backbone networks. We show that PBA's accuracy is on average around 90% with knowledge of only 1% of all the hosts in a given data set, and that its runtime is on the order of minutes (≈ 5 minutes).
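To make the association intuition concrete, here is a minimal sketch. It is not one of the algorithms the paper develops. It assumes that flows are given as (source IP, destination IP) pairs and seeds as a dict mapping an IP to a known application label, and it swaps in a simple weighted neighbor-majority label propagation as a stand-in for community-based profiling. The function names `build_graph` and `profile_by_neighbors` and the parameter `max_iters` are hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(flows):
    """Hypothetical helper: build an undirected IP-to-IP graph from (src, dst) pairs.

    Edge weight = number of times the pair was observed; self-pairs are ignored.
    """
    graph = defaultdict(Counter)
    for src, dst in flows:
        if src == dst:
            continue
        graph[src][dst] += 1
        graph[dst][src] += 1
    return graph


def profile_by_neighbors(graph, seeds, max_iters=10):
    """Hypothetical neighbor-majority propagation (not the paper's algorithm).

    Seed labels stay fixed. In each round, every non-seed host takes the label
    with the largest total edge weight among its already-labeled neighbors
    (ties go to the alphabetically smallest label). Updates are synchronous.
    """
    labels = dict(seeds)
    for _ in range(max_iters):
        updates = {}
        for host, neighbors in graph.items():
            if host in seeds:
                continue
            votes = Counter()
            for nbr, weight in neighbors.items():
                if nbr in labels:
                    votes[labels[nbr]] += weight
            if votes:
                updates[host] = max(sorted(votes), key=lambda app: votes[app])
        # Stop once a round changes nothing (also covers "no labeled neighbors").
        if all(labels.get(h) == app for h, app in updates.items()):
            break
        labels.update(updates)
    return labels


# Usage: two tightly knit groups joined by one cross-group edge, one seed each.
flows = [
    ("10.0.0.1", "10.0.0.2"), ("10.0.0.2", "10.0.0.3"), ("10.0.0.1", "10.0.0.3"),
    ("10.0.1.1", "10.0.1.2"), ("10.0.1.2", "10.0.1.3"), ("10.0.1.1", "10.0.1.3"),
    ("10.0.0.3", "10.0.1.3"),
]
seeds = {"10.0.0.1": "p2p", "10.0.1.1": "web"}
labels = profile_by_neighbors(build_graph(flows), seeds)
print(labels["10.0.0.3"], labels["10.0.1.3"])  # p2p web
```

Hosts with no path to any seed stay unlabeled, so a cluster with no profiled member gives nothing away. A cluster-then-vote variant (run a community detection algorithm, then give each community the majority label of its seeds) would be another way to instantiate the same idea.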


Published in

Co-NEXT '10: Proceedings of the 6th International COnference
November 2010
349 pages
ISBN: 9781450304481
DOI: 10.1145/1921168

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 198 of 789 submissions (25%)
