Abstract
The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any ε > 0, our algorithms make O(log1+ε n) passes over the input and find a subgraph whose density is guaranteed to be within a factor 2(1 + ε) of the optimum. Our algorithms are also easily parallelizable and we illustrate this by realizing them in the MapReduce model. In addition we perform extensive experimental evaluation on massive real-world graphs showing the performance and scalability of our algorithms in practice.
- C. C. Aggarwal and H. Wang. Managing and Mining Graph Data. Springer Publishing Company, Inc., 1st edition, 2010. Google ScholarDigital Library
- R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74(1):47--97, 2002. Google ScholarCross Ref
- R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, pages 25--37, 2009. Google ScholarDigital Library
- Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. JCSS, 68(4):702--732, 2004. Google ScholarDigital Library
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, pages 623--632, 2002. Google ScholarDigital Library
- L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD, pages 16--24, 2008. Google ScholarDigital Library
- B. Berger, J. Rompel, and P. W. Shor. Efficient NC algorithms for set cover with applications to learning and geometry. JCSS, 49(3):454--477, 1994. Google ScholarDigital Library
- G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM, pages 95--106, 2008. Google ScholarDigital Library
- A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multiparty communication complexity of set-disjointness. In CCC, pages 107--117, 2003.Google ScholarCross Ref
- M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95, 2000. Google ScholarDigital Library
- M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. TCS, 312:3--15, 2004. Google ScholarDigital Library
- J. Chen and Y. Saad. Dense subgraph extraction with application to community detection. TKDE, To appear. Google ScholarDigital Library
- F. Chierichetti, R. Kumar, and A. Tomkins. Max-Cover in Map-Reduce. In WWW, pages 231--240, 2010. Google ScholarDigital Library
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. In SODA, pages 937--946, 2002. Google ScholarDigital Library
- A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In WWW, pages 271--280, 2007. Google ScholarDigital Library
- J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarDigital Library
- Y. Dourisboure, F. Geraci, and M. Pellegrini. Extraction and classification of dense communities in the web. In WWW, pages 461--470, 2007. Google ScholarDigital Library
- J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. TCS, 348(2-3):207--216, 2005. Google ScholarDigital Library
- J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein, and Z. Svitkina. On distributing symmetric streaming computations. TALG, 6:1--19, 2010. Google ScholarDigital Library
- A. Gajewar and A. D. Sarma. Multi-skill collaborative teams based on densest subgraphs. CoRR, abs/1102.3340, 2011.Google Scholar
- D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, pages 721--732, 2005. Google ScholarDigital Library
- A. V. Goldberg. Finding a maximum density subgraph. Technical Report UCB/CSD-84-171, EECS Department, University of California, Berkeley, 1984. Google ScholarDigital Library
- R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-HOP: A high-compression indexing scheme for reachability query. In SIGMOD, pages 813--826, 2009. Google ScholarDigital Library
- R. Kannan and V. Vinay. Analyzing the structure of large graphs, 1999. Manuscript.Google Scholar
- H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938--948, 2010. Google ScholarDigital Library
- S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, pages 597--608, 2009. Google ScholarDigital Library
- R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In KDD, pages 611--617, 2006. Google ScholarDigital Library
- S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA, pages 85--94, 2011. Google ScholarDigital Library
- E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Winston, 1976.Google Scholar
- J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.Google ScholarCross Ref
- A. McGregor. Finding graph matchings in data streams. In APPROX, pages 170--181, 2005. Google ScholarDigital Library
- M.Newman. Modularity and community structure in networks. PNAS, 103(23):8577--8582, 2006.Google ScholarCross Ref
- G. D. F. Morales, A. Gionis, and M. Sozio. Social content matching in mapreduce. PVLDB, pages 460--469, 2011. Google ScholarDigital Library
- S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. Google ScholarDigital Library
- A. Nandi, C. Yu, P. Bohannon, and R. Ramakrishnan. Distributed cube materialization on holistic measures. In ICDE, pages 183--194, 2011. Google ScholarDigital Library
- B. Saha, A. Hoch, S. Khuller, L. Raschid, and X.-N. Zhang. Dense subgraphs with restrictions and applications to gene annotation graphs. In RECOMB, pages 456--472, 2010. Google ScholarDigital Library
- S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614, 2011. Google ScholarDigital Library
Index Terms
- Densest subgraph in streaming and MapReduce
Recommendations
Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data miningFinding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative ...
Efficient Densest Subgraph Computation in Evolving Graphs
WWW '15: Proceedings of the 24th International Conference on World Wide WebDensest subgraph computation has emerged as an important primitive in a wide range of data analysis tasks such as community and event detection. Social media such as Facebook and Twitter are highly dynamic with new friendship links and tweets being ...
The K-clique Densest Subgraph Problem
WWW '15: Proceedings of the 24th International Conference on World Wide WebNumerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are hard and frequently inapproximable due to connections with the Maximum Clique problem, ...
Comments