Abstract
We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 40,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph by storing a minuscule fraction of edges.
- Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. 2014. Graph sample and hold: A framework for big-graph analytics. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, New York, NY, USA, 1446--1455. DOI: http://dx.doi.org/10.1145/2623330.2623757 Google ScholarDigital Library
- Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2013. Network sampling: From static to streaming graphs. ACM Transactions on Knowledge Discovery from Data 8, 2, Article No. 7. DOI: http://dx.doi.org/10.1145/2601438 Google ScholarDigital Library
- Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: Sparsification, spanners, and subgraphs. In Proceedings of the 31st Symposium on Principles of Database Systems (PODS’12). ACM, New York, NY, 5--14. DOI: http://dx.doi.org/10.1145/2213556.2213560 Google ScholarDigital Library
- Lars Arge, Michael T. Goodrich, and Nodari Sitchinava. 2010. Parallel external memory graph algorithms. In Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing (IPDPS ’10). 1--11. DOI: http://dx.doi.org/10.1109/IPDPS.2010.5470440Google ScholarCross Ref
- Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe. 2013. PATRIC: A parallel algorithm for counting triangles in massive networks. In Proceedings of the 22nd ACM International Conference on Conference on Information and Knowledge Management (CIKM’13). ACM, New York, NY, 529--538. DOI: http://dx.doi.org/10.1145/2505515.2505545 Google ScholarDigital Library
- Haim Avron. 2010. Counting triangles in large graphs using randomized matrix trace estimation. In Proceedings of the KDD Workshop on Large Scale Data Mining.Google Scholar
- Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02). 623--632. http://dl.acm.org/citation.cfm?id=545381.545464. Google ScholarDigital Library
- Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2008. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08). ACM, New York, NY, 16--24. DOI: http://dx.doi.org/10.1145/1401890.1401898 Google ScholarDigital Library
- Jonathan Berry, Luke Fostvedt, Daniel Nordman, Cynthia A. Phillips, and Alyson G. Wilson. 2011. Listing Triangles in Expected Linear Time on Power Law Graphs with Exponent at Least &frac73;. Technical Report SAND2010-4474c. Sandia National Laboratories.Google Scholar
- Jonathan W. Berry, Bruce Hendrickson, Simon Kahan, and Petr Konecny. 2007. Software and algorithms for graph queries on multithreaded architectures. In Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS’07). 1--14. DOI: http://dx.doi.org/10.1109/IPDPS.2007.370685Google ScholarCross Ref
- Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. 2006. Counting triangles in data streams. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’06). ACM, New York, NY, 253--262. DOI: http://dx.doi.org/10.1145/1142351.1142388 Google ScholarDigital Library
- Ronald S. Burt. 2004. Structural holes and good ideas. American Journal of Sociology 110, 2, 349--399. Available at http://www.jstor.org/stable/10.1086/421787.Google ScholarCross Ref
- Dhruva R. Chakrabarti, Prithviraj Banerjee, Hans-J. Boehm, Pramod G. Joisha, and Robert S. Schreiber. 2011. The runtime abort graph and its application to software transactional memory optimization. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’11). IEEE, Los Alamitos, CA, 42--53. http://dl.acm.org/citation.cfm?id=2190025.2190052. Google ScholarDigital Library
- Yi-Jen Chiang, Michael T. Goodrich, Edward F. Grove, Roberto Tamassia, Darren Erik Vengroff, and Jeffrey Scott Vitter. 1995. External-memory graph algorithms. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’95). 139--149. http://dl.acm.org/citation.cfm?id=313651.313681. Google ScholarDigital Library
- Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM Journal on Computing 14, 1, 210--223. DOI: http://dx.doi.org/10.1137/0214017 Google ScholarDigital Library
- Shumo Chu and James Cheng. 2011. Triangle listing in massive networks and its applications. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11). ACM, New York, NY, 672--680. DOI: http://dx.doi.org/10.1145/2020408.2020513 Google ScholarDigital Library
- Jonathan Cohen. 2009. Graph twiddling in a MapReduce world. Computing in Science and Engineering 11, 4, 29--41. DOI: http://dx.doi.org/10.1109/MCSE.2009.120 Google ScholarDigital Library
- James S. Coleman. 1988. Social capital in the creation of human capital. American Journal of Sociology 94, S95--S120. Available at http://www.jstor.org/stable/2780243.Google ScholarCross Ref
- Nurcan Durak, Ali Pinar, Tamara G. Kolda, and C. Seshadhri. 2012. Degree relations of triangles in real-world networks and graph models. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12). ACM, New York, NY, 1712--1716. DOI: http://dx.doi.org/10.1145/2396761.2398503 Google ScholarDigital Library
- Jean-Pierre Eckmann and Elisha Moses. 2002. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proceedings of the National Academy of Sciences 99, 9, 5825--5829. DOI: http://dx.doi.org/10.1073/pnas.032093399Google ScholarCross Ref
- William Feller. 1968. An Introduction to Probability Theory and Applications (Vol. I, 3rd ed.). John Wiley & Sons.Google Scholar
- Brooke Foucault Welles, Anne Van Devender, and Noshir Contractor. 2010. Is a friend a friend? Investigating the structure of friendship networks in virtual worlds. In Proceedings of CHI ’10 Extended Abstracts on Human Factors in Computing Systems (CHI EA’10). ACM, New York, NY, 4027--4032. DOI: http://dx.doi.org/10.1145/1753846.1754097 Google ScholarDigital Library
- Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 325--336. DOI: http://dx.doi.org/10.1145/2463676.2463704 Google ScholarDigital Library
- Madhav Jha, C. Seshadhri, and Ali Pinar. 2013. A space efficient streaming algorithm for triangle counting using the birthday paradox. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, 589--597. DOI: http://dx.doi.org/10.1145/2487575.2487678 Google ScholarDigital Library
- Hossein Jowhari and Mohammad Ghodsi. 2005. New streaming algorithms for counting triangles in graphs. In Proceedings of the 11th Annual International Conference on Computing and Combinatorics (COCOON ’05). 710--716. DOI: http://dx.doi.org/10.1007/11533719_72 Google ScholarDigital Library
- Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting arbitrary subgraphs in data streams. In Proceedings of the 39th International Colloquium Conference on Automata, Languages, and Programming—Volume Part II (ICALP’12). 598--609. DOI: http://dx.doi.org/10.1007/978-3-642-31585-5_53 Google ScholarDigital Library
- Tamara G. Kolda, Ali Pinar, Todd Plantenga, C. Seshadhri, and Christine Task. 2014. Counting triangles in massive graphs with MapReduce. SIAM Journal on Scientific Computing 36, 5. DOI: http://dx.doi.org/10.1137/13090729XGoogle ScholarCross Ref
- Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2010. Efficient triangle counting in large graphs via degree-based vertex partitioning. In Proceedings of the 7th International Workshop on Algorithms and Models for the Web-Graph. 15--24. DOI: http://dx.doi.org/10.1007/978-3-642-18009-5_3Google Scholar
- Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1-3, 458--473. DOI: http://dx.doi.org/10.1016/j.tcs.2008.07.017 Google ScholarDigital Library
- Ron Milo, ShaiShen-Orr, Shalev Itzkovitz, Nadav Kashtan, David Chklovskii, and Uri Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594, 824--827.Google Scholar
- Rasmus Pagh and Charalampos E. Tsourakakis. 2012. Colorful triangle counting and a MapReduce implementation. Information Processing Letters 112, 7, 277--281. DOI: http://dx.doi.org/10.1016/j.ipl.2011.12.007 Google ScholarDigital Library
- Byung-Hoon Park, George Ostrouchov, Nagiza F. Samatova, and Al Geist. 2004. Reservoir-based random sampling with replacement from data stream. In SDM, M. W. Berry, U. Dayal, C. Kamath, and D. B. Skillicorn (Eds.). SIAM, 492--496.Google Scholar
- Aduri Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. Counting and sampling triangles from a graph stream. Proceedings of the VLDB Endowment 6, 14, 1870--1881. DOI: http://dx.doi.org/10.14778/2556549.2556569 Google ScholarDigital Library
- Todd Plantenga. 2013. Inexact subgraph isomorphism in MapReduce. Journal of Parallel and Distributed Computing 73, 2, 164--175. DOI: http://dx.doi.org/10.1016/j.jpdc.2012.10.005 Google ScholarDigital Library
- Alejandro Portes. 1998. Social capital: Its origins and applications in modern sociology. Annual Review of Sociology 24, 1, 1--24. DOI: http://dx.doi.org/10.1146/annurev.soc.24.1.1Google ScholarCross Ref
- Thomas Schank and Dorothea Wagner. 2005a. Approximating clustering coefficient and transitivity. Journal of Graph Algorithms and Applications 9, 2, 265--275.Google ScholarCross Ref
- Thomas Schank and Dorothea Wagner. 2005b. Finding, counting and listing all triangles in large graphs, an experimental study. In Experimental and Efficient Algorithms. Springer, Berlin, Heidelberg, 606--609. DOI: http://dx.doi.org/10.1007/11427186_54 Google ScholarDigital Library
- C. Seshadhri, Tamara G. Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erdös-Rényi graphs. Physical Review E 85, 5, 056109. DOI: http://dx.doi.org/10.1103/PhysRevE.85.056109Google ScholarCross Ref
- C. Seshadhri, Ali Pinar, and Tamara G. Kolda. 2013a. Triadic measures on graphs: The power of wedge sampling. In Proceedings of the 13th SIAM International Conference on Data Mining. 10--18. DOI: http://dx.doi.org/10.1137/1.9781611972832.2Google Scholar
- C. Seshadhri, Ali Pinar, and Tamara G. Kolda. 2013b. Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs. Retrieved December 27, 2014, from http://arxiv.org/abs/1309.3321.Google Scholar
- SNAP. 2013. Stanford Network Analysis Project. Available at http://snap.stanford.edu/.Google Scholar
- Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). ACM, New York, NY, 607--614. DOI: http://dx.doi.org/10.1145/1963405.1963491 Google ScholarDigital Library
- Kanat Tangwongsan, Aduri Pavan, and Srikanta Tirthapura. 2013. Parallel triangle counting in massive streaming graphs. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM, New York, NY, 781--786. DOI: http://dx.doi.org/10.1145/2505515.2505741 Google ScholarDigital Library
- Charalampos E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In Proceedings of the 2008 8th IEEE International Conference on Data Mining (ICDM ’08). IEEE, Los Alamitos, CA, 608--617. DOI: http://dx.doi.org/10.1109/ICDM.2008.72 Google ScholarDigital Library
- Charalampos E. Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. 2009a. Spectral counting of triangles in power-law networks via element-wise sparsification. In Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM’09). 66--71. DOI: http://dx.doi.org/10.1109/ASONAM.2009.32 Google ScholarDigital Library
- Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. 2009b. DOULION: Counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 837--846. DOI: http://dx.doi.org/10.1145/1557019.1557111 Google ScholarDigital Library
- Charalampos E. Tsourakakis, Mihail N. Kolountzakis, and Gary L. Miller. 2011. Triangle sparsifiers. Journal of Graph Algorithms and Applications 15, 6, 703--726. DOI: http://dx.doi.org/10.7155/jgaa.00245Google ScholarCross Ref
- Jeffrey S. Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematical Software 11, 1, 37--57. DOI: http://dx.doi.org/10.1145/3147.3165 Google ScholarDigital Library
- Stanley Wasserman and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press.Google Scholar
- Jin-Hyun Yoon and Sung-Ryul Kim. 2011. Improved sampling for triangle counting with MapReduce. In Proceedings of the 5th International Conference on Convergence and Hybrid Information Technology. 685--689. http://dl.acm.org/citation.cfm?id=2045005.2045097. Google ScholarDigital Library
Index Terms
- A Space-Efficient Streaming Algorithm for Estimating Transitivity and Triangle Counts Using the Birthday Paradox
Recommendations
A space efficient streaming algorithm for triangle counting using the birthday paradox
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data miningWe design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic ...
Triangle Counting in Dynamic Graph Streams
Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary ...
Transitivity on Subclasses of Chordal Graphs
Algorithms and Discrete Applied MathematicsAbstractLet be a graph, where V and E are the vertex and edge sets, respectively. For two disjoint subsets A and B of V, we say AdominatesB if every vertex of B is adjacent to at least one vertex of A in G. A vertex partition of G ...
Comments