ABSTRACT
Graphs are widely used to model real-world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mostly statistical: they report measures such as degree distributions, hop-plots, and clustering coefficients. These statistical methods are very useful, but the resolution of the summaries they produce is hard to control.
In this paper, we introduce two database-style operations to summarize graphs. Like OLAP-style aggregation methods, which allow users to drill down or roll up to control the resolution of summarization, our methods provide analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on user-selected node attributes and relationships. The second operation, called k-SNAP, further allows users to control the resolution of summaries and provides "drill-down" and "roll-up" abilities to navigate through summaries with different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the k-SNAP computation is NP-complete, and we propose two heuristic methods to approximate the k-SNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
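To make the idea of grouping nodes by selected attributes and relationships concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes that a valid grouping puts two nodes together only if they agree on the chosen attributes and are connected, per relationship, to the same set of groups; it reaches such a grouping by simple iterative refinement. The names `snap_sketch`, `relabel`, `node_attrs`, `edges`, and `group_by` are hypothetical, and edges are assumed undirected with both endpoints present in `node_attrs`.

```python
from collections import defaultdict


def relabel(keys):
    """Map arbitrary hashable group keys to small integer group ids."""
    ids = {}
    return {node: ids.setdefault(key, len(ids)) for node, key in keys.items()}


def snap_sketch(node_attrs, edges, group_by):
    """Group nodes by attributes, then refine by neighbor groups.

    node_attrs: dict node -> dict of attribute values (assumed input shape)
    edges:      dict relationship name -> iterable of undirected (u, v) pairs
    group_by:   list of attribute names selected by the user
    """
    # Build one adjacency map per relationship type.
    adj = {rel: defaultdict(set) for rel in edges}
    for rel, pairs in edges.items():
        for u, v in pairs:
            adj[rel][u].add(v)
            adj[rel][v].add(u)

    # Start from the attribute-only grouping.
    group_of = relabel(
        {n: tuple(a[k] for k in group_by) for n, a in node_attrs.items()}
    )
    rels = sorted(edges)

    # Split groups until members of each group see the same neighbor groups.
    while True:
        signature = {
            n: (
                group_of[n],
                tuple(
                    frozenset(group_of[m] for m in adj[rel].get(n, ()))
                    for rel in rels
                ),
            )
            for n in node_attrs
        }
        refined = relabel(signature)
        if len(set(refined.values())) == len(set(group_of.values())):
            break  # no group was split, so the grouping is stable
        group_of = refined

    groups = defaultdict(list)
    for n, g in group_of.items():
        groups[g].append(n)
    # One summary edge per relationship and pair of connected groups.
    super_edges = {
        (rel, min(group_of[u], group_of[v]), max(group_of[u], group_of[v]))
        for rel, pairs in edges.items()
        for u, v in pairs
    }
    return dict(groups), super_edges


# Illustrative data only: four CS nodes, one EE node, one relationship.
node_attrs = {1: {"dept": "CS"}, 2: {"dept": "CS"}, 3: {"dept": "CS"},
              4: {"dept": "CS"}, 5: {"dept": "EE"}}
edges = {"friends": [(1, 5), (2, 5)]}
groups, super_edges = snap_sketch(node_attrs, edges, ["dept"])
```

In this toy input the CS nodes split into two groups, because only nodes 1 and 2 have a "friends" link to the EE group. A summary like this has a fixed resolution, which is the limitation that a size-controlled operation such as k-SNAP is described as addressing.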