skip to main content
research-article

Densest subgraph in streaming and MapReduce

Published:01 January 2012Publication History
Skip Abstract Section

Abstract

The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any ε > 0, our algorithms make O(log1+ε n) passes over the input and find a subgraph whose density is guaranteed to be within a factor 2(1 + ε) of the optimum. Our algorithms are also easily parallelizable and we illustrate this by realizing them in the MapReduce model. In addition we perform extensive experimental evaluation on massive real-world graphs showing the performance and scalability of our algorithms in practice.

References

  1. C. C. Aggarwal and H. Wang. Managing and Mining Graph Data. Springer Publishing Company, Inc., 1st edition, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74(1):47--97, 2002. Google ScholarGoogle ScholarCross RefCross Ref
  3. R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, pages 25--37, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. JCSS, 68(4):702--732, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, pages 623--632, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD, pages 16--24, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Berger, J. Rompel, and P. W. Shor. Efficient NC algorithms for set cover with applications to learning and geometry. JCSS, 49(3):454--477, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM, pages 95--106, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multiparty communication complexity of set-disjointness. In CCC, pages 107--117, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  10. M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. TCS, 312:3--15, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Chen and Y. Saad. Dense subgraph extraction with application to community detection. TKDE, To appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. F. Chierichetti, R. Kumar, and A. Tomkins. Max-Cover in Map-Reduce. In WWW, pages 231--240, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. In SODA, pages 937--946, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In WWW, pages 271--280, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Dourisboure, F. Geraci, and M. Pellegrini. Extraction and classification of dense communities in the web. In WWW, pages 461--470, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. TCS, 348(2-3):207--216, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein, and Z. Svitkina. On distributing symmetric streaming computations. TALG, 6:1--19, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Gajewar and A. D. Sarma. Multi-skill collaborative teams based on densest subgraphs. CoRR, abs/1102.3340, 2011.Google ScholarGoogle Scholar
  21. D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, pages 721--732, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. V. Goldberg. Finding a maximum density subgraph. Technical Report UCB/CSD-84-171, EECS Department, University of California, Berkeley, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-HOP: A high-compression indexing scheme for reachability query. In SIGMOD, pages 813--826, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Kannan and V. Vinay. Analyzing the structure of large graphs, 1999. Manuscript.Google ScholarGoogle Scholar
  25. H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938--948, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, pages 597--608, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In KDD, pages 611--617, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA, pages 85--94, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Winston, 1976.Google ScholarGoogle Scholar
  30. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  31. A. McGregor. Finding graph matchings in data streams. In APPROX, pages 170--181, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M.Newman. Modularity and community structure in networks. PNAS, 103(23):8577--8582, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  33. G. D. F. Morales, A. Gionis, and M. Sozio. Social content matching in mapreduce. PVLDB, pages 460--469, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. A. Nandi, C. Yu, P. Bohannon, and R. Ramakrishnan. Distributed cube materialization on holistic measures. In ICDE, pages 183--194, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. B. Saha, A. Hoch, S. Khuller, L. Raschid, and X.-N. Zhang. Dense subgraphs with restrictions and applications to gene annotation graphs. In RECOMB, pages 456--472, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Densest subgraph in streaming and MapReduce
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 5, Issue 5
        January 2012
        108 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 January 2012
        Published in pvldb Volume 5, Issue 5

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader