ABSTRACT
Advances in data collection and storage capacity have made it increasingly possible to collect highly volatile graph data for analysis. Existing graph analysis techniques are not appropriate for such data, especially in cases where streaming or near-real-time results are required. An example that has drawn significant research interest is the cyber-security domain, where internet communication traces are collected and real-time discovery of events, behaviors, patterns, and anomalies is desired. We propose MetricForensics, a scalable framework for analysis of volatile graphs. MetricForensics combines a multi-level "drill down" approach, a collection of user-selected graph metrics, and a collection of analysis techniques. At each successive level, more sophisticated metrics are computed and the graph is viewed at finer temporal resolutions. In this way, MetricForensics scales to highly volatile graphs by only allocating resources for computationally expensive analysis when an interesting event is discovered at a coarser resolution first. We test MetricForensics on three real-world graphs: an enterprise IP trace, a trace of legitimate and malicious network traffic from a research institution, and the MIT Reality Mining proximity sensor data. Our largest graph has 3M vertices and 32M edges, spanning 4.5 days. The results demonstrate the scalability and capability of MetricForensics in analyzing volatile graphs; and highlight four novel phenomena in such graphs: elbows, broken correlations, prolonged spikes, and lightweight stars.
Supplemental Material
- D. Agarwal, A. Z. Broder, D. Chakrabarti, D. Diklic, V. Josifovski, and M. Sayyadian. Estimating rates of rare events at multiple resolutions. In KDD, pages 16--25, 2007. Google ScholarDigital Library
- L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting anomalies in weighted graphs. In PAKDD, 2010. Google ScholarDigital Library
- R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130--131, 1999.Google Scholar
- L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44--54, 2006. Google ScholarDigital Library
- A.-L. Barabasi. The origin of bursts and heavy tails in human dynamics. Nature, 435:207--211, 2005.Google ScholarCross Ref
- A. Belussi and C. Faloutsos. Estimating the selectivity of spatial queries using the 'correlation' fractal dimension. In VLDB, pages 299--310, 1995. Google ScholarDigital Library
- M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In SIGMOD, pages 93--104, 2000. Google ScholarDigital Library
- A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L.Wiener. Graph structure in the web. Computer Networks, 33(1-6):309--320, 2000. Google ScholarDigital Library
- H. Bunke, P. J. Dickinson, A. Humm, C. Irniger, and M. Kraetzl. Graph sequence visualisation and its application to computer network monitoring and abnormal event detection. In A. Kandel, H. Bunke, and M. Last, editors, Applied Graph Theory in Computer Vision and Pattern Recognition, volume 52 of Studies in Computational Intelligence, pages 227--245. Springer, 2007.Google Scholar
- D. Chakrabarti. Autopart: Parameter-free graph partitioning and outlier detection. In PKDD, pages 112--124, 2004. Google ScholarDigital Library
- D. Chakrabarti, S. Papadimitriou, D. Modha, and C. Faloutsos. Fully automatic cross-associations. In KDD, pages 79--88, 2004. Google ScholarDigital Library
- V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3), 2009. Google ScholarDigital Library
- Y. Chi, X. Song, D. Zhou, K. Hino, and B. L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. In KDD, pages 153--162, 2007. Google ScholarDigital Library
- W. Eberle and L. B. Holder. Mining for structural anomalies in graph-based data. In DMIN, 2007.Google Scholar
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, pages 251--262, 1999. Google ScholarDigital Library
- G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35(3):66--71, 2002. Google ScholarDigital Library
- F. Geerts, H. Mannila, and E. Terzi. Relational link-based ranking. In VLDB, pages 552--563, 2004. Google ScholarDigital Library
- P. B. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In SIGMOD, pages 331--342, 1998. Google ScholarDigital Library
- K. Henderson, T. Eliassi-Rad, S. Papdimitriou, and C. Faloutsos. HCDF: A hybrid community discovery framework. In SDM, 2010.Google ScholarCross Ref
- S. Hirose, K. Yamanishi, T. Nakata, and R. Fujimaki. Network anomaly detection based on eigen equation compression. In KDD, pages 1185--1194, 2009. Google ScholarDigital Library
- T. Id´e and H. Kashima. Eigenspace-based anomaly detection in computer systems. In KDD, pages 440--449, 2004. Google ScholarDigital Library
- B. Karrer, E. Levina, and M. E. J. Newman. Robustness of community structure in networks. Phys. Rev. E, 77(046119), 2008.Google Scholar
- J. M. Kleinberg. Bursty and hierarchical structure in streams. In KDD, pages 91--101, 2002. Google ScholarDigital Library
- A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In SIGCOMM, pages 217--228, 2005. Google ScholarDigital Library
- J.-G. Lee, J. Han, and X. Li. Trajectory outlier detection: A partition-and-detect framework. In ICDE, pages 140--149, 2008. Google ScholarDigital Library
- D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, pages 556--559, 2003. Google ScholarDigital Library
- C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for "backtrace" of noncrashing bugs. In SDM, 2005.Google ScholarCross Ref
- M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167--256, 2003.Google ScholarDigital Library
- C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631--636, 2003. Google ScholarDigital Library
- B. A. Prakash, N. Valler, D. Andersen, M. Faloutsos, and C. Faloutsos. BGP-lens: Patterns and anomalies in internet routing updates. In KDD, pages 1315--1324, 2009. Google ScholarDigital Library
- J. Sun, D. Tao, and C. Faloutsos. Beyond streams and graphs: dynamic tensor analysis. In KDD, pages 374--383, 2006. Google ScholarDigital Library
- V. Syrotiuk, K. Shaukat, Y. Kwon, M. Kraetzl, and J. Arnold. Application of a network dynamics analysis tool to mobile ad hoc networks. In MSWiM, pages 36--43, 2006. Google ScholarDigital Library
- H. Tong, C. Faloutsos, and J.-Y. Pan. Random walk with restart: Fast solutions and applications. Knowledge and Information Systems: An International Journal (KAIS), 14(3):327--346, 2008. Google ScholarDigital Library
- M.Wang, N. H. Chan, S. Papadimitriou, C. Faloutsos, and T. Madhyastha. Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic. In ICDE, pages 507--516, 2002. Google ScholarDigital Library
- D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB, pages 709--720, 2005. Google ScholarDigital Library
Index Terms
- Metric forensics: a multi-level approach for mining volatile graphs
Recommendations
Mining Frequent Subgraph Patterns from Uncertain Graph Data
In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
Mining frequent cross-graph quasi-cliques
Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different ...
Improvised Apriori with frequent subgraph tree for extracting frequent subgraphs
Soft computing and intelligent systems: Tools, techniques and applicationsGraphs are considered to be one of the best studied data structures in discrete mathematics and computer science. Hence, data mining on graphs has become quite popular in the past few years. The problem of finding frequent itemsets in conventional data ...
Comments