ABSTRACT
Dense subgraph discovery, in a large graph, is useful to solve the community search problem. Motivated from this, we propose a graph summarization method where we search and aggregate dense subgraphs into super nodes. Since the dense subgraphs have high overlap of common neighbors, thus merging such subgraphs can produce a highly compact summary graph. Whereas the member nodes of each dense subgraph have many edges in common, they also have some edges not belonging to a given subgraph; which in turn reduce the compression ratio. To solve this problem, we propose the concept of AutoPruning that effectively filters the dense subgraphs having higher ratio of common to that of non-common neighbors. To summarize the dense subgraphs, we use the Minimum Description Length (MDL) principle to obtain a highly compact summary with least edge corrections for lossless compression. We propose two alternatives to trade-off the computation time and compression ratio while creating a super node from each dense subgraph. Through experiments on two publicly available real world graphs, we compare the proposed approach with the well known Minimum Degree Measure (MDM), for dense subgraph discovery, and observe very encouraging results.
- Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. Journal of Algorithms, 34(2): 203--221, 2000. Google ScholarDigital Library
- P. Boldi and S. Vigna. The webgraph framework i: compression techniques. In Proceedings of the 13th international conference on World Wide Web, pages 595--602. ACM, 2004. Google ScholarDigital Library
- G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 95--106. ACM, 2008. Google ScholarDigital Library
- M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In Approximation Algorithms for Combinatorial Optimization, pages 84--95. Springer, 2000. Google ScholarDigital Library
- F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 219--228. ACM, 2009. Google ScholarDigital Library
- W. Cui, Y. Xiao, H. Wang, J. Hong, and W. Wang. Local search of communities in large graphs. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 2014. Google ScholarDigital Library
- C. Hernández and G. Navarro. Compressed representations for web and social graphs. Knowledge and Information Systems, pages 1--35, 2013.Google Scholar
- U. Kang and C. Faloutsos. Beyond'caveman communities': Hubs and spokes for graph compression and mining. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 300--309. IEEE, 2011. Google ScholarDigital Library
- K. U. Khan, N. Waqas, and Y.-K. Lee. Set-based approximate approach for lossless graph summarization. The Computing Journal- Under Review, 2014.Google Scholar
- S. Khuller and B. Saha. On finding dense subgraphs. In Automata, Languages and Programming, pages 597--608. Springer, 2009. Google ScholarDigital Library
- S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded error. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 419--432. ACM, 2008. Google ScholarDigital Library
- J. Rissanen. Modeling by shortest data description. Automatica, 14(5): 465--471, 1978. Google ScholarDigital Library
- S. B. Seidman. Network structure and minimum degree. Social networks, 5(3): 269--287, 1983.Google Scholar
- M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 939--948. ACM, 2010. Google ScholarDigital Library
- Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 567--580. ACM, 2008. Google ScholarDigital Library
- H. Toivonen, F. Zhou, A. Hartikainen, and A. Hinkka. Compression of weighted graphs. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 965--973. ACM, 2011. Google ScholarDigital Library
Index Terms
- Lossless graph summarization using dense subgraphs discovery
Recommendations
Incremental Lossless Graph Summarization
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningGiven a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot? As large-scale graphs are prevalent, concisely representing them is inevitable ...
Large Very Dense Subgraphs in a Stream of Edges
FODS '20: Proceedings of the 2020 ACM-IMS on Foundations of Data Science ConferenceWe study the detection and the reconstruction of a large very dense subgraph in a social graph with n nodes and m edges given as a stream of edges, when the graph follows a power law degree distribution, in the regime when $m=O(n. łog n)$. A subgraph is ...
Dense Subgraphs Summarization: An Efficient Way to Summarize Large Scale Graphs by Super Nodes
Intelligent Computing MethodologiesAbstractFor large scale graphs, the graph summarization technique is essential, which can reduce the complexity for large-scale graphs analysis. The traditional graph summarization methods focus on reducing the complexity of original graph, and ignore the ...
Comments