skip to main content
10.1145/1835804.1835828acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Metric forensics: a multi-level approach for mining volatile graphs

Published:25 July 2010Publication History

ABSTRACT

Advances in data collection and storage capacity have made it increasingly possible to collect highly volatile graph data for analysis. Existing graph analysis techniques are not appropriate for such data, especially in cases where streaming or near-real-time results are required. An example that has drawn significant research interest is the cyber-security domain, where internet communication traces are collected and real-time discovery of events, behaviors, patterns, and anomalies is desired. We propose MetricForensics, a scalable framework for analysis of volatile graphs. MetricForensics combines a multi-level "drill down" approach, a collection of user-selected graph metrics, and a collection of analysis techniques. At each successive level, more sophisticated metrics are computed and the graph is viewed at finer temporal resolutions. In this way, MetricForensics scales to highly volatile graphs by only allocating resources for computationally expensive analysis when an interesting event is discovered at a coarser resolution first. We test MetricForensics on three real-world graphs: an enterprise IP trace, a trace of legitimate and malicious network traffic from a research institution, and the MIT Reality Mining proximity sensor data. Our largest graph has 3M vertices and 32M edges, spanning 4.5 days. The results demonstrate the scalability and capability of MetricForensics in analyzing volatile graphs; and highlight four novel phenomena in such graphs: elbows, broken correlations, prolonged spikes, and lightweight stars.

Skip Supplemental Material Section

Supplemental Material

kdd2010_henderson_mfml_01.mov

mov

105.8 MB

References

  1. D. Agarwal, A. Z. Broder, D. Chakrabarti, D. Diklic, V. Josifovski, and M. Sayyadian. Estimating rates of rare events at multiple resolutions. In KDD, pages 16--25, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting anomalies in weighted graphs. In PAKDD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130--131, 1999.Google ScholarGoogle Scholar
  4. L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44--54, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A.-L. Barabasi. The origin of bursts and heavy tails in human dynamics. Nature, 435:207--211, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  6. A. Belussi and C. Faloutsos. Estimating the selectivity of spatial queries using the 'correlation' fractal dimension. In VLDB, pages 299--310, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In SIGMOD, pages 93--104, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L.Wiener. Graph structure in the web. Computer Networks, 33(1-6):309--320, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Bunke, P. J. Dickinson, A. Humm, C. Irniger, and M. Kraetzl. Graph sequence visualisation and its application to computer network monitoring and abnormal event detection. In A. Kandel, H. Bunke, and M. Last, editors, Applied Graph Theory in Computer Vision and Pattern Recognition, volume 52 of Studies in Computational Intelligence, pages 227--245. Springer, 2007.Google ScholarGoogle Scholar
  10. D. Chakrabarti. Autopart: Parameter-free graph partitioning and outlier detection. In PKDD, pages 112--124, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Chakrabarti, S. Papadimitriou, D. Modha, and C. Faloutsos. Fully automatic cross-associations. In KDD, pages 79--88, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Y. Chi, X. Song, D. Zhou, K. Hino, and B. L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. In KDD, pages 153--162, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. W. Eberle and L. B. Holder. Mining for structural anomalies in graph-based data. In DMIN, 2007.Google ScholarGoogle Scholar
  15. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, pages 251--262, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35(3):66--71, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. F. Geerts, H. Mannila, and E. Terzi. Relational link-based ranking. In VLDB, pages 552--563, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. B. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In SIGMOD, pages 331--342, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. K. Henderson, T. Eliassi-Rad, S. Papdimitriou, and C. Faloutsos. HCDF: A hybrid community discovery framework. In SDM, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  20. S. Hirose, K. Yamanishi, T. Nakata, and R. Fujimaki. Network anomaly detection based on eigen equation compression. In KDD, pages 1185--1194, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. T. Id´e and H. Kashima. Eigenspace-based anomaly detection in computer systems. In KDD, pages 440--449, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. Karrer, E. Levina, and M. E. J. Newman. Robustness of community structure in networks. Phys. Rev. E, 77(046119), 2008.Google ScholarGoogle Scholar
  23. J. M. Kleinberg. Bursty and hierarchical structure in streams. In KDD, pages 91--101, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In SIGCOMM, pages 217--228, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J.-G. Lee, J. Han, and X. Li. Trajectory outlier detection: A partition-and-detect framework. In ICDE, pages 140--149, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, pages 556--559, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for "backtrace" of noncrashing bugs. In SDM, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  28. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167--256, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631--636, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. B. A. Prakash, N. Valler, D. Andersen, M. Faloutsos, and C. Faloutsos. BGP-lens: Patterns and anomalies in internet routing updates. In KDD, pages 1315--1324, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Sun, D. Tao, and C. Faloutsos. Beyond streams and graphs: dynamic tensor analysis. In KDD, pages 374--383, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. V. Syrotiuk, K. Shaukat, Y. Kwon, M. Kraetzl, and J. Arnold. Application of a network dynamics analysis tool to mobile ad hoc networks. In MSWiM, pages 36--43, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. H. Tong, C. Faloutsos, and J.-Y. Pan. Random walk with restart: Fast solutions and applications. Knowledge and Information Systems: An International Journal (KAIS), 14(3):327--346, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. M.Wang, N. H. Chan, S. Papadimitriou, C. Faloutsos, and T. Madhyastha. Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic. In ICDE, pages 507--516, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB, pages 709--720, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Metric forensics: a multi-level approach for mining volatile graphs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
        July 2010
        1240 pages
        ISBN:9781450300551
        DOI:10.1145/1835804

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 July 2010

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader