DOI: 10.1145/1376616.1376675
research-article

Efficient aggregation for graph summarization

Published: 09 June 2008

ABSTRACT

Graphs are widely used to model real-world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mostly statistical (studying statistics such as degree distributions, hop-plots and clustering coefficients). These statistical methods are very useful, but the resolution of the summaries they produce is hard to control.

In this paper, we introduce two database-style operations to summarize graphs. Like the OLAP-style aggregation methods that allow users to drill down or roll up to control the resolution of summarization, our methods provide an analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on user-selected node attributes and relationships. The second operation, called k-SNAP, further allows users to control the resolution of summaries and provides the "drill-down" and "roll-up" abilities to navigate through summaries with different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the k-SNAP computation is NP-complete, and we propose two heuristic methods to approximate the k-SNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
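
To make the idea of grouping nodes by attributes and relationships concrete, here is a minimal sketch in Python. It is not the evaluation algorithm from the paper, which the abstract does not describe. It assumes a single undirected relationship type and uses plain iterative partition refinement: nodes start in groups defined by the selected attribute values, and groups are split until every node in a group is connected to the same set of groups. The function name `snap_summary` and the input layout are hypothetical.

```python
from collections import defaultdict


def snap_summary(nodes, edges, attrs):
    """Illustrative attribute-and-relationship grouping (assumed, simplified).

    nodes: dict mapping node id -> dict of attribute values
    edges: iterable of (u, v) pairs, treated as one undirected relationship
    attrs: list of attribute names selected by the user
    Returns (groups, group_edges): group id -> set of nodes, and the set of
    group pairs joined by at least one edge.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    def relabel(keys):
        # Replace arbitrary hashable keys with small integer group ids.
        ids = {}
        return {n: ids.setdefault(k, len(ids)) for n, k in keys.items()}

    # Step 1: group nodes purely by the selected attribute values.
    group_of = relabel(
        {n: tuple(props[a] for a in attrs) for n, props in nodes.items()}
    )

    # Step 2: split groups until all members of a group touch the same groups.
    while True:
        refined = relabel({
            n: (group_of[n], frozenset(group_of[m] for m in neighbors[n]))
            for n in nodes
        })
        # Each round only splits groups, so an unchanged count means stable.
        if len(set(refined.values())) == len(set(group_of.values())):
            break
        group_of = refined

    groups = defaultdict(set)
    for n, g in group_of.items():
        groups[g].add(n)
    group_edges = {tuple(sorted((group_of[u], group_of[v]))) for u, v in edges}
    return dict(groups), group_edges
```

With two nodes that share an attribute value and both link to the same third node, the sketch keeps them in one group; if only one of them has that link, the refinement step separates them. A coarser, size-controlled summary in the spirit of k-SNAP would need a different procedure, which this sketch does not attempt.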


        • Published in

          SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
          June 2008
          1396 pages
          ISBN: 9781605581026
          DOI: 10.1145/1376616

          Copyright © 2008 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 785 of 4,003 submissions, 20%
