skip to main content
research-article

I/O-Efficient Algorithms on Triangle Listing and Counting

Authors Info & Claims
Published:30 December 2014Publication History
Skip Abstract Section

Abstract

This article studies I/O-efficient algorithms for the triangle listing problem and the triangle counting problem, whose solutions are basic operators in dealing with many other graph problems. In the former problem, given an undirected graph G, the objective is to find all the cliques involving 3 vertices in G. In the latter problem, the objective is to report just the number of such cliques without having to enumerate them. Both problems have been well studied in internal memory, but still remain as difficult challenges when G does not fit in memory, thus making it crucial to minimize the number of disk I/Os performed. Although previous research has attempted to tackle these challenges, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically different from all the previous approaches, and outperforms the existing competitors by a factor of over an order of magnitude in our extensive experimentation.

References

  1. Alok Aggarwal and Jeffrey Scott Vitter. 1988. The input/output complexity of sorting and related problems. Comm. ACM 31, 9, 1116--1127. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Noga Alon and Joel H. Spencer. 2000. The Probabilistic Methods 2nd Ed. Wiley, New York.Google ScholarGoogle Scholar
  3. Noga Alon, Raphael Yuster, and Uri Zwick. 1997. Finding and counting given length cycles. Algorithmica 17, 3, 209--223.Google ScholarGoogle ScholarCross RefCross Ref
  4. Eytan Bakshy, Itamar Rosenn, Cameron Marlow, and Lada A. Adamic. 2012. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web (WWW'12). 519--528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Vladimir Batagelj and Matjaz Zaversnik. 2007. Short cycle connectivity. Discr. Math. 307, 310--318. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Burton H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Comm. ACM 13, 7, 422--426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. 2013. How hard is counting triangles in the streaming model? In Proceedings of the 40th International Conference on Automata, Languages, and Programming (ICALP'13). 244--254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-mat: A recursive model for graph mining. In Proceedings of the SIAM International Conference on Data Mining (SDM'04).Google ScholarGoogle ScholarCross RefCross Ref
  9. James Cheng, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, and Linhong Zhu. 2011. Finding maximal cliques in massive networks. ACM Trans. Database Syst. 36, 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 1, 210--223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Shumo Chu and James Cheng. 2011. Triangle listing in massive networks and its applications. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11). 672--680. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Shumo Chu and James Cheng. 2012. Triangle listing in massive networks. ACM Trans. Knowl. Discov. Data 6, 4, 17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jonathan Cohen. 2009. Graph twiddling in a mapreduce world. Comput. Sci. Engin. 11, 4, 29--41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Don Coppersmith and Shmuel Winograd. 1990. Matrix multiplication via arithmetic progressions. J. Symbol. Comput. 9, 3, 251--280. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Graham Cormode and S. Muthukrishnan. 2005. An improved data stream summary: The count-min sketch and its applications. J. Algor. 55, 1, 58--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Roman Dementiev. 2006. Algorithm engineering for large data sets hardware, software, algorithms. http://algo2.iti.kit.edu/documents/DementievDiss.pdf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. David Eppstein and Emma S. Spiro. 2009. The h-index of a graph and its application to dynamic subgraph statistics. In Proceedings of the 11th International Symposium on Algorithms and Data Structures (WADS'09). 278--289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Philippe Flajolet and G. Nigel Martin. 1983. Probabilistic counting. In Proceedings of the 24th Annual Symposium on Foundations of Computer Science (FOCS'83). 76--82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Gaurav Goel and Jens Gustedt. 2006. Bounded arboricity to determine the local structure of sparse graphs. In Proceedings of the 32nd International Conference on Graph-Theoretic Concepts in Computer Science (WG'06). 159--167. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Jelle Hellings, George H. L. Fletcher, and Herman J. Haverkort. 2012. Efficient external-memory bisimulation on dags. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 553--564. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'13). 325--336. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Alon Itai and Michael Rodeh. 1978. Finding a minimum circuit in a graph. SIAM J. Comput. 7, 4, 413--423.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Madhav Jha, C. Seshadhri, and Ali Pinar. 2013. A space efficient streaming algorithm for triangle counting using the birthday paradox. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'13). 589--597. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2012. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8, 1-2, 161--185.Google ScholarGoogle ScholarCross RefCross Ref
  25. Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407, 1--3, 458--473. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Min Chih Lin, Francisco J. Soulignac, and Jayme Luiz Szwarcfiter. 2012. Arboricity, h-index, and dynamic algorithms. Theor. Comput. Sci. 426--427, 75--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Bruno Menegola. 2010. An external memory algorithm for listing triangles. Tech. rep. Universidade Federal do Rio Grande do Sul.Google ScholarGoogle Scholar
  28. C. St. J. A. Nash-Williams. 1964. Decomposition of finite graphs into forests. J. London Math. Soc. 39, 1, 12.Google ScholarGoogle Scholar
  29. Rasmus Pagh and Francesco Silvestri. 2014. The input/output complexity of triangle enumeration. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'14). 224--233. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Thomas Schank. 2007. Algorithmic aspects of triangle-based network analysis. Ph.D. dissertation, Universitat Karlsruhe, Fakultat fur Informatik.Google ScholarGoogle Scholar
  31. Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the Workshop on Experimental Algorithms (WEA'05). 606--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web (WWW'11). 607--614. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Chih-Hua Tai, Philip S. Yu, De-Nian Yang, and Ming-Syan Chen. 2011. Privacy-preserving social network publication against friendship attacks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'11). 1262--1270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Jia Wang and James Cheng. 2012. Truss decomposition in massive networks. Proc. VLDB Endow. 5, 9, 812--823. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Nan Wang, Jingbo Zhang, Kian-Lee Tan, and Anthony K. H. Tung. 2010. On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow. 4, 2, 58--68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440--442.Google ScholarGoogle ScholarCross RefCross Ref
  37. Peixiang Zhao, Charu C. Aggarwal, and Min Wang. 2011. gSketch: On query estimation in graph streams. Proc. VLDB Endow. 5, 3, 193--204. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. I/O-Efficient Algorithms on Triangle Listing and Counting

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 39, Issue 4
      Invited Articles Issue, SIGMOD 2013, PODS 2013 and ICDT 2013
      December 2014
      341 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/2691190
      Issue’s Table of Contents

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 December 2014
      • Accepted: 1 November 2014
      • Revised: 1 October 2014
      • Received: 1 September 2013
      Published in tods Volume 39, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader