ABSTRACT
Massive networks arising in numerous application areas poses significant challenges for network analysts as these networks grow to billions of nodes and are prohibitively large to fit in the main memory. Finding the number of triangles in a network is an important problem in the analysis of complex networks. Several interesting graph mining applications depend on the number of triangles in the graph. In this paper, we present an efficient MPI-based distributed memory parallel algorithm, called PATRIC, for counting triangles in massive networks. PATRIC scales well to networks with billions of nodes and can compute the exact number of triangles in a network with one billion nodes and 10 billion edges in 16 minutes. Balancing computational loads among processors for a graph problem like counting triangles is a challenging issue. We present and analyze several schemes for balancing load among processors for the triangle counting problem. These schemes achieve very good load balancing. We also show how our parallel algorithm can adapt an existing edge sparsification technique to approximate the number of triangles with very high accuracy. This modification allows us to count triangles in even larger networks.
- M. Alam and M. Khan. Efficient algorithms for generating massive random networks. Technical Report 13-064, NDSSL at Virginia Tech, May 2013.Google Scholar
- N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17:209--223, 1997.Google ScholarCross Ref
- C. Apte, B. Liu, E. Pednault, and P. Smyth. Business applications of data mining. Commun. ACM, 45(8):49--53, Aug. 2002. Google ScholarDigital Library
- Z. Bar-Yosseff, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, pages 623--632, 2002. Google ScholarDigital Library
- A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.Google ScholarCross Ref
- C. Barrett, R. Beckman, et al. Generation and analysis of large synthetic social contact networks. In WSC, pages 103--114, 2009. Google ScholarDigital Library
- L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD, pages 16--24, 2008. Google ScholarDigital Library
- B. Bollobas. Random Graphs. Cambridge Univ. Press, 2001.Google Scholar
- J. Chen and S. Lonardi. Biological Data Mining. Chapman & Hall/CRC, 1st edition, 2009. Google ScholarDigital Library
- S. Chu and J. Cheng. Triangle listing in massive networks and its applications. In KDD, pages 672--680, 2011. Google ScholarDigital Library
- F. Chung and L. Lu. Complex Graphs and Networks. American Mathematical Society, Aug. 2006. Google ScholarDigital Library
- D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. In STOC, pages 1--6, 1987. Google ScholarDigital Library
- J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarDigital Library
- J. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. Proc. Natl. Acad. of Sci. USA, 99(9):5825--5829, 2002.Google ScholarCross Ref
- M. Girvan and M. Newman. Community structure in social and biological networks. Proc. Natl. Acad. of Sci. USA, 99(12):7821--7826, June 2002.Google ScholarCross Ref
- M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In KDD, 2013. Google ScholarDigital Library
- H. Kwak, C. Lee, et al. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010. Google ScholarDigital Library
- M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci., 407:458--473, 2008. Google ScholarDigital Library
- M. McPherson, L. Smith-Lovin, and J. Cook. Birds of a feather: Homophily in social networks. Annual Rev. of Soc., 27(1):415--444, 2001.Google ScholarCross Ref
- R. Milo, S. Shen-Orr, et al. Network motifs: simple building blocks of complex networks. Science, 298(5594):824--827, October 2002.Google ScholarCross Ref
- M. Newman. Coauthorship networks and patterns of scientific collaboration. Proc. Natl. Acad. of Sci. USA, 101(1):5200--5205, April 2004.Google ScholarCross Ref
- R. Pagh and C. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Information Processing Letters, 112(7):277--281, 2012. Google ScholarDigital Library
- T. Schank. Algorithmic Aspects of Triangle-Based Network Analysis. PhD thesis, University of Karlsruhe, 2007.Google Scholar
- T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Exp. and Efficient Algorithms, 2005. Google ScholarDigital Library
- C. Seshadhri, A. Pinar, and T. Kolda. Triadic measures on graphs: the power of wedge sampling. In SDM, pages 10--18, 2013.Google ScholarCross Ref
- SNAP. Stanford network analysis project. http://snap.stanford.edu/, 2012.Google Scholar
- S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614, 2011. Google ScholarDigital Library
- C. Tsourakakis, U. Kang, G. Miller, and C. Faloutsos. Doulion: counting triangles in massive graphs with a coin. In KDD, pages 837--846, 2009. Google ScholarDigital Library
Index Terms
- PATRIC: a parallel algorithm for counting triangles in massive networks
Recommendations
Fast Parallel Algorithms for Counting and Listing Triangles in Big Graphs
Big graphs (networks) arising in numerous application areas pose significant challengesfor graph analysts as these graphs grow to billions of nodes and edges and are prohibitively large to fit in the main memory. Finding the number of triangles in a ...
A Space-Efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks
HPCC-CSS-ICESS '15: Proceedings of the 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conf on Embedded Software and SystemsFinding the number of triangles in a network (graph) is an important problem in mining and analysis of complex networks. Massive networks emerging from numerous application areas pose a significant challenge in network analytics since these networks ...
Fast Parallel Algorithms for Edge-Switching to Achieve a Target Visit Rate in Heterogeneous Graphs
BRACIS '14: Proceedings of the 2014 Brazilian Conference on Intelligent SystemsAn edge switch is an operation on a network (graph) where two edges are selected randomly and one of their end vertices are swapped with each other. Usually, a sequence of these operations are performed to generate network perturbations having the same ...
Comments