skip to main content
10.1145/2463676.2463704acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Massive graph triangulation

Authors Info & Claims
Published:22 June 2013Publication History

ABSTRACT

This paper studies I/O-efficient algorithms for settling the classic triangle listing problem, whose solution is a basic operator in dealing with many other graph problems. Specifically, given an undirected graph G, the objective of triangle listing is to find all the cliques involving 3 vertices in G. The problem has been well studied in internal memory, but remains an urgent difficult challenge when G does not fit in memory, rendering any algorithm to entail frequent I/O accesses. Although previous research has attempted to tackle the challenge, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically different from all the previous approaches, and outperformed the existing competitors by a factor over an order of magnitude in our extensive experimentation.

References

  1. A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. CACM, 31(9):1116--1127, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon and J. H. Spencer. The Probabilistic Methods. Wiley, New York, 2nd edition, 2000.Google ScholarGoogle Scholar
  3. E. Bakshy, I. Rosenn, C. Marlow, and L. A. Adamic. The role of social networks in information diffusion. CoRR, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. V. Batagelj and M. Zaversnik. Short cycle connectivity. Discrete Mathematics, 307:310--318, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SDM, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  6. J. Cheng, Y. Ke, A. W.-C. Fu, J. X. Yu, and L. Zhu. Finding maximal cliques in massive networks. TODS, 36(4):21, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. of Comp., 14(1):210--223, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Chu and J. Cheng. Triangle listing in massive networks and its applications. In SIGKDD, pages 672--680, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Chu and J. Cheng. Triangle listing in massive networks. TKDD, 6(4):17, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Cohen. Graph twiddling in a mapreduce world. Computing in Science and Engineering, 11(4):29--41, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Dementiev. Algorithm engineering for large data sets hardware, software, algorithms. PhD thesis, Saarland University, 2006.Google ScholarGoogle Scholar
  12. D. Eppstein and E. S. Spiro. The h-index of a graph and its application to dynamic subgraph statistics. In WADS, pages 278--289, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Hellings, G. H. L. Fletcher, and H. J. Haverkort. Efficient external-memory bisimulation on dags. In SIGMOD, pages 553--564, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM J. of Comp., 7(4):413--423, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  15. M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. TCC, 407(1-3):458--473, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. C. Lin, F. J. Soulignac, and J. L. Szwarcfiter. Arboricity, h-index, and dynamic algorithms. TCC, 426:75--90, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. Menegola. An external memory algorithm for listing triangles. Technical report, Universidade Federal do Rio Grande do Sul, 2010.Google ScholarGoogle Scholar
  18. C. S. J. A. Nash-Williams. Decomposition of finite graphs into forests. Journal of the London Mathematical Society, 39(1):12, 1964.Google ScholarGoogle ScholarCross RefCross Ref
  19. T. Schank. Algorithmic Aspects of Triangle-Based Network Analysis. PhD thesis, Universitat Karlsruhe, Fakultat fur Informatik, 2007.Google ScholarGoogle Scholar
  20. T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Workshop on Experimental Algorithms (WEA), pages 606--609, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen. Privacy-preserving social network publication against friendship attacks. In SIGKDD, pages 1262--1270, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Wang and J. Cheng. Truss decomposition in massive networks. PVLDB, 5(9):812--823, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. N. Wang, J. Zhang, K.-L. Tan, and A. K. H. Tung. On triangulation-based dense neighborhood graph discovery. PVLDB, 4(2):58--68, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440--442, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  25. P. Zhao, C. C. Aggarwal, and M. Wang. gsketch: On query estimation in graph streams. PVLDB, 5(3):193--204, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Massive graph triangulation

    Recommendations

    Reviews

    Amitabha Roy

    A triangle of an undirected graph is simply three vertices of the graph, u , v , and w , which are also connected by the edges ( u , v ), ( v , w ), and ( w , u ). Spatially, this arrangement looks like a triangle. Listing all of the triangles of a graph has a number of important applications in network analysis and knowledge discovery. For example, vertices with a high triangle count are usually extremely significant in social networks, such as Twitter, often associated with people who act as central figures in communities. This 2013 SIGMOD Best Paper Award winner deals with the problem of counting triangles in graphs that are too large to be stored entirely in the main memory. The algorithm uses as input a graph whose edge list is sorted by source vertex, thereby clustering vertices of an edge together. The edge list is loaded sequentially, one chunk of edges at a time, into memory. This chunk is then compared to a scan of the entire edge list. The process identifies all triangles with one edge in the loaded chunk (say ( u , v ) in the example above) and the opposite vertex ( w in the example above). This paper provides an algorithmic complexity analysis of both data movement from disk as well as of work done in the memory, showing that both are done efficiently by connecting the amount of work done to the arboricity of the graph. The paper is a must-read for both theorists and practitioners in the large-scale graph processing area. Online Computing Reviews Service

    Access critical reviews of Computing literature here

    Become a reviewer for Computing Reviews.

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
      June 2013
      1322 pages
      ISBN:9781450320375
      DOI:10.1145/2463676

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 June 2013

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGMOD '13 Paper Acceptance Rate76of372submissions,20%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader