Abstract
This article studies I/O-efficient algorithms for the triangle listing problem and the triangle counting problem, whose solutions are basic operators in dealing with many other graph problems. In the former problem, given an undirected graph G, the objective is to find all the cliques involving 3 vertices in G. In the latter problem, the objective is to report just the number of such cliques without having to enumerate them. Both problems have been well studied in internal memory, but still remain as difficult challenges when G does not fit in memory, thus making it crucial to minimize the number of disk I/Os performed. Although previous research has attempted to tackle these challenges, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically different from all the previous approaches, and outperforms the existing competitors by a factor of over an order of magnitude in our extensive experimentation.
- Alok Aggarwal and Jeffrey Scott Vitter. 1988. The input/output complexity of sorting and related problems. Comm. ACM 31, 9, 1116--1127. Google ScholarDigital Library
- Noga Alon and Joel H. Spencer. 2000. The Probabilistic Methods 2nd Ed. Wiley, New York.Google Scholar
- Noga Alon, Raphael Yuster, and Uri Zwick. 1997. Finding and counting given length cycles. Algorithmica 17, 3, 209--223.Google ScholarCross Ref
- Eytan Bakshy, Itamar Rosenn, Cameron Marlow, and Lada A. Adamic. 2012. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web (WWW'12). 519--528. Google ScholarDigital Library
- Vladimir Batagelj and Matjaz Zaversnik. 2007. Short cycle connectivity. Discr. Math. 307, 310--318. Google ScholarDigital Library
- Burton H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Comm. ACM 13, 7, 422--426. Google ScholarDigital Library
- Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. 2013. How hard is counting triangles in the streaming model? In Proceedings of the 40th International Conference on Automata, Languages, and Programming (ICALP'13). 244--254. Google ScholarDigital Library
- Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-mat: A recursive model for graph mining. In Proceedings of the SIAM International Conference on Data Mining (SDM'04).Google ScholarCross Ref
- James Cheng, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, and Linhong Zhu. 2011. Finding maximal cliques in massive networks. ACM Trans. Database Syst. 36, 4. Google ScholarDigital Library
- Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 1, 210--223. Google ScholarDigital Library
- Shumo Chu and James Cheng. 2011. Triangle listing in massive networks and its applications. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11). 672--680. Google ScholarDigital Library
- Shumo Chu and James Cheng. 2012. Triangle listing in massive networks. ACM Trans. Knowl. Discov. Data 6, 4, 17. Google ScholarDigital Library
- Jonathan Cohen. 2009. Graph twiddling in a mapreduce world. Comput. Sci. Engin. 11, 4, 29--41. Google ScholarDigital Library
- Don Coppersmith and Shmuel Winograd. 1990. Matrix multiplication via arithmetic progressions. J. Symbol. Comput. 9, 3, 251--280. Google ScholarDigital Library
- Graham Cormode and S. Muthukrishnan. 2005. An improved data stream summary: The count-min sketch and its applications. J. Algor. 55, 1, 58--75. Google ScholarDigital Library
- Roman Dementiev. 2006. Algorithm engineering for large data sets hardware, software, algorithms. http://algo2.iti.kit.edu/documents/DementievDiss.pdf. Google ScholarDigital Library
- David Eppstein and Emma S. Spiro. 2009. The h-index of a graph and its application to dynamic subgraph statistics. In Proceedings of the 11th International Symposium on Algorithms and Data Structures (WADS'09). 278--289. Google ScholarDigital Library
- Philippe Flajolet and G. Nigel Martin. 1983. Probabilistic counting. In Proceedings of the 24th Annual Symposium on Foundations of Computer Science (FOCS'83). 76--82. Google ScholarDigital Library
- Gaurav Goel and Jens Gustedt. 2006. Bounded arboricity to determine the local structure of sparse graphs. In Proceedings of the 32nd International Conference on Graph-Theoretic Concepts in Computer Science (WG'06). 159--167. Google ScholarDigital Library
- Jelle Hellings, George H. L. Fletcher, and Herman J. Haverkort. 2012. Efficient external-memory bisimulation on dags. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 553--564. Google ScholarDigital Library
- Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'13). 325--336. Google ScholarDigital Library
- Alon Itai and Michael Rodeh. 1978. Finding a minimum circuit in a graph. SIAM J. Comput. 7, 4, 413--423.Google ScholarDigital Library
- Madhav Jha, C. Seshadhri, and Ali Pinar. 2013. A space efficient streaming algorithm for triangle counting using the birthday paradox. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'13). 589--597. Google ScholarDigital Library
- Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2012. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8, 1-2, 161--185.Google ScholarCross Ref
- Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407, 1--3, 458--473. Google ScholarDigital Library
- Min Chih Lin, Francisco J. Soulignac, and Jayme Luiz Szwarcfiter. 2012. Arboricity, h-index, and dynamic algorithms. Theor. Comput. Sci. 426--427, 75--90. Google ScholarDigital Library
- Bruno Menegola. 2010. An external memory algorithm for listing triangles. Tech. rep. Universidade Federal do Rio Grande do Sul.Google Scholar
- C. St. J. A. Nash-Williams. 1964. Decomposition of finite graphs into forests. J. London Math. Soc. 39, 1, 12.Google Scholar
- Rasmus Pagh and Francesco Silvestri. 2014. The input/output complexity of triangle enumeration. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'14). 224--233. Google ScholarDigital Library
- Thomas Schank. 2007. Algorithmic aspects of triangle-based network analysis. Ph.D. dissertation, Universitat Karlsruhe, Fakultat fur Informatik.Google Scholar
- Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the Workshop on Experimental Algorithms (WEA'05). 606--609. Google ScholarDigital Library
- Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web (WWW'11). 607--614. Google ScholarDigital Library
- Chih-Hua Tai, Philip S. Yu, De-Nian Yang, and Ming-Syan Chen. 2011. Privacy-preserving social network publication against friendship attacks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'11). 1262--1270. Google ScholarDigital Library
- Jia Wang and James Cheng. 2012. Truss decomposition in massive networks. Proc. VLDB Endow. 5, 9, 812--823. Google ScholarDigital Library
- Nan Wang, Jingbo Zhang, Kian-Lee Tan, and Anthony K. H. Tung. 2010. On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow. 4, 2, 58--68. Google ScholarDigital Library
- Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440--442.Google ScholarCross Ref
- Peixiang Zhao, Charu C. Aggarwal, and Min Wang. 2011. gSketch: On query estimation in graph streams. Proc. VLDB Endow. 5, 3, 193--204. Google ScholarDigital Library
Index Terms
- I/O-Efficient Algorithms on Triangle Listing and Counting
Recommendations
Triangle listing in massive networks
Special Issue on the Best of SIGKDD 2011Triangle listing is one of the fundamental algorithmic problems whose solution has numerous applications especially in the analysis of complex networks, such as the computation of clustering coefficients, transitivity, triangular connectivity, trusses, ...
An efficient exact algorithm for triangle listing in large graphs
This paper presents a new efficient exact algorithm for listing triangles in a large graph. While the problem of listing triangles in a graph has been considered before, dealing with large graphs continues to be a challenge. Although previous research ...
Efficient MapReduce algorithms for triangle listing in billion-scale graphs
This paper addresses the classical triangle listing problem, which aims at enumerating all the tuples of three vertices connected with each other by edges. This problem has been intensively studied in internal and external memory, but it is still an ...
Comments