skip to main content
research-article

A Space-Efficient Streaming Algorithm for Estimating Transitivity and Triangle Counts Using the Birthday Paradox

Published:17 February 2015Publication History
Skip Abstract Section

Abstract

We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 40,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph by storing a minuscule fraction of edges.

References

  1. Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. 2014. Graph sample and hold: A framework for big-graph analytics. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). ACM, New York, NY, USA, 1446--1455. DOI: http://dx.doi.org/10.1145/2623330.2623757 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2013. Network sampling: From static to streaming graphs. ACM Transactions on Knowledge Discovery from Data 8, 2, Article No. 7. DOI: http://dx.doi.org/10.1145/2601438 Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: Sparsification, spanners, and subgraphs. In Proceedings of the 31st Symposium on Principles of Database Systems (PODS’12). ACM, New York, NY, 5--14. DOI: http://dx.doi.org/10.1145/2213556.2213560 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Lars Arge, Michael T. Goodrich, and Nodari Sitchinava. 2010. Parallel external memory graph algorithms. In Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing (IPDPS ’10). 1--11. DOI: http://dx.doi.org/10.1109/IPDPS.2010.5470440Google ScholarGoogle ScholarCross RefCross Ref
  5. Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe. 2013. PATRIC: A parallel algorithm for counting triangles in massive networks. In Proceedings of the 22nd ACM International Conference on Conference on Information and Knowledge Management (CIKM’13). ACM, New York, NY, 529--538. DOI: http://dx.doi.org/10.1145/2505515.2505545 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Haim Avron. 2010. Counting triangles in large graphs using randomized matrix trace estimation. In Proceedings of the KDD Workshop on Large Scale Data Mining.Google ScholarGoogle Scholar
  7. Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02). 623--632. http://dl.acm.org/citation.cfm?id=545381.545464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2008. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08). ACM, New York, NY, 16--24. DOI: http://dx.doi.org/10.1145/1401890.1401898 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Jonathan Berry, Luke Fostvedt, Daniel Nordman, Cynthia A. Phillips, and Alyson G. Wilson. 2011. Listing Triangles in Expected Linear Time on Power Law Graphs with Exponent at Least &frac73;. Technical Report SAND2010-4474c. Sandia National Laboratories.Google ScholarGoogle Scholar
  10. Jonathan W. Berry, Bruce Hendrickson, Simon Kahan, and Petr Konecny. 2007. Software and algorithms for graph queries on multithreaded architectures. In Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS’07). 1--14. DOI: http://dx.doi.org/10.1109/IPDPS.2007.370685Google ScholarGoogle ScholarCross RefCross Ref
  11. Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. 2006. Counting triangles in data streams. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’06). ACM, New York, NY, 253--262. DOI: http://dx.doi.org/10.1145/1142351.1142388 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Ronald S. Burt. 2004. Structural holes and good ideas. American Journal of Sociology 110, 2, 349--399. Available at http://www.jstor.org/stable/10.1086/421787.Google ScholarGoogle ScholarCross RefCross Ref
  13. Dhruva R. Chakrabarti, Prithviraj Banerjee, Hans-J. Boehm, Pramod G. Joisha, and Robert S. Schreiber. 2011. The runtime abort graph and its application to software transactional memory optimization. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’11). IEEE, Los Alamitos, CA, 42--53. http://dl.acm.org/citation.cfm?id=2190025.2190052. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Yi-Jen Chiang, Michael T. Goodrich, Edward F. Grove, Roberto Tamassia, Darren Erik Vengroff, and Jeffrey Scott Vitter. 1995. External-memory graph algorithms. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’95). 139--149. http://dl.acm.org/citation.cfm?id=313651.313681. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM Journal on Computing 14, 1, 210--223. DOI: http://dx.doi.org/10.1137/0214017 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Shumo Chu and James Cheng. 2011. Triangle listing in massive networks and its applications. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11). ACM, New York, NY, 672--680. DOI: http://dx.doi.org/10.1145/2020408.2020513 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Jonathan Cohen. 2009. Graph twiddling in a MapReduce world. Computing in Science and Engineering 11, 4, 29--41. DOI: http://dx.doi.org/10.1109/MCSE.2009.120 Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. James S. Coleman. 1988. Social capital in the creation of human capital. American Journal of Sociology 94, S95--S120. Available at http://www.jstor.org/stable/2780243.Google ScholarGoogle ScholarCross RefCross Ref
  19. Nurcan Durak, Ali Pinar, Tamara G. Kolda, and C. Seshadhri. 2012. Degree relations of triangles in real-world networks and graph models. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12). ACM, New York, NY, 1712--1716. DOI: http://dx.doi.org/10.1145/2396761.2398503 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Jean-Pierre Eckmann and Elisha Moses. 2002. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proceedings of the National Academy of Sciences 99, 9, 5825--5829. DOI: http://dx.doi.org/10.1073/pnas.032093399Google ScholarGoogle ScholarCross RefCross Ref
  21. William Feller. 1968. An Introduction to Probability Theory and Applications (Vol. I, 3rd ed.). John Wiley & Sons.Google ScholarGoogle Scholar
  22. Brooke Foucault Welles, Anne Van Devender, and Noshir Contractor. 2010. Is a friend a friend? Investigating the structure of friendship networks in virtual worlds. In Proceedings of CHI ’10 Extended Abstracts on Human Factors in Computing Systems (CHI EA’10). ACM, New York, NY, 4027--4032. DOI: http://dx.doi.org/10.1145/1753846.1754097 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 325--336. DOI: http://dx.doi.org/10.1145/2463676.2463704 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Madhav Jha, C. Seshadhri, and Ali Pinar. 2013. A space efficient streaming algorithm for triangle counting using the birthday paradox. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, 589--597. DOI: http://dx.doi.org/10.1145/2487575.2487678 Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Hossein Jowhari and Mohammad Ghodsi. 2005. New streaming algorithms for counting triangles in graphs. In Proceedings of the 11th Annual International Conference on Computing and Combinatorics (COCOON ’05). 710--716. DOI: http://dx.doi.org/10.1007/11533719_72 Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting arbitrary subgraphs in data streams. In Proceedings of the 39th International Colloquium Conference on Automata, Languages, and Programming—Volume Part II (ICALP’12). 598--609. DOI: http://dx.doi.org/10.1007/978-3-642-31585-5_53 Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Tamara G. Kolda, Ali Pinar, Todd Plantenga, C. Seshadhri, and Christine Task. 2014. Counting triangles in massive graphs with MapReduce. SIAM Journal on Scientific Computing 36, 5. DOI: http://dx.doi.org/10.1137/13090729XGoogle ScholarGoogle ScholarCross RefCross Ref
  28. Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2010. Efficient triangle counting in large graphs via degree-based vertex partitioning. In Proceedings of the 7th International Workshop on Algorithms and Models for the Web-Graph. 15--24. DOI: http://dx.doi.org/10.1007/978-3-642-18009-5_3Google ScholarGoogle Scholar
  29. Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1-3, 458--473. DOI: http://dx.doi.org/10.1016/j.tcs.2008.07.017 Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ron Milo, ShaiShen-Orr, Shalev Itzkovitz, Nadav Kashtan, David Chklovskii, and Uri Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594, 824--827.Google ScholarGoogle Scholar
  31. Rasmus Pagh and Charalampos E. Tsourakakis. 2012. Colorful triangle counting and a MapReduce implementation. Information Processing Letters 112, 7, 277--281. DOI: http://dx.doi.org/10.1016/j.ipl.2011.12.007 Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Byung-Hoon Park, George Ostrouchov, Nagiza F. Samatova, and Al Geist. 2004. Reservoir-based random sampling with replacement from data stream. In SDM, M. W. Berry, U. Dayal, C. Kamath, and D. B. Skillicorn (Eds.). SIAM, 492--496.Google ScholarGoogle Scholar
  33. Aduri Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. Counting and sampling triangles from a graph stream. Proceedings of the VLDB Endowment 6, 14, 1870--1881. DOI: http://dx.doi.org/10.14778/2556549.2556569 Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Todd Plantenga. 2013. Inexact subgraph isomorphism in MapReduce. Journal of Parallel and Distributed Computing 73, 2, 164--175. DOI: http://dx.doi.org/10.1016/j.jpdc.2012.10.005 Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Alejandro Portes. 1998. Social capital: Its origins and applications in modern sociology. Annual Review of Sociology 24, 1, 1--24. DOI: http://dx.doi.org/10.1146/annurev.soc.24.1.1Google ScholarGoogle ScholarCross RefCross Ref
  36. Thomas Schank and Dorothea Wagner. 2005a. Approximating clustering coefficient and transitivity. Journal of Graph Algorithms and Applications 9, 2, 265--275.Google ScholarGoogle ScholarCross RefCross Ref
  37. Thomas Schank and Dorothea Wagner. 2005b. Finding, counting and listing all triangles in large graphs, an experimental study. In Experimental and Efficient Algorithms. Springer, Berlin, Heidelberg, 606--609. DOI: http://dx.doi.org/10.1007/11427186_54 Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. C. Seshadhri, Tamara G. Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erdös-Rényi graphs. Physical Review E 85, 5, 056109. DOI: http://dx.doi.org/10.1103/PhysRevE.85.056109Google ScholarGoogle ScholarCross RefCross Ref
  39. C. Seshadhri, Ali Pinar, and Tamara G. Kolda. 2013a. Triadic measures on graphs: The power of wedge sampling. In Proceedings of the 13th SIAM International Conference on Data Mining. 10--18. DOI: http://dx.doi.org/10.1137/1.9781611972832.2Google ScholarGoogle Scholar
  40. C. Seshadhri, Ali Pinar, and Tamara G. Kolda. 2013b. Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs. Retrieved December 27, 2014, from http://arxiv.org/abs/1309.3321.Google ScholarGoogle Scholar
  41. SNAP. 2013. Stanford Network Analysis Project. Available at http://snap.stanford.edu/.Google ScholarGoogle Scholar
  42. Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). ACM, New York, NY, 607--614. DOI: http://dx.doi.org/10.1145/1963405.1963491 Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Kanat Tangwongsan, Aduri Pavan, and Srikanta Tirthapura. 2013. Parallel triangle counting in massive streaming graphs. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM, New York, NY, 781--786. DOI: http://dx.doi.org/10.1145/2505515.2505741 Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Charalampos E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In Proceedings of the 2008 8th IEEE International Conference on Data Mining (ICDM ’08). IEEE, Los Alamitos, CA, 608--617. DOI: http://dx.doi.org/10.1109/ICDM.2008.72 Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Charalampos E. Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. 2009a. Spectral counting of triangles in power-law networks via element-wise sparsification. In Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining (ASONAM’09). 66--71. DOI: http://dx.doi.org/10.1109/ASONAM.2009.32 Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. 2009b. DOULION: Counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 837--846. DOI: http://dx.doi.org/10.1145/1557019.1557111 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Charalampos E. Tsourakakis, Mihail N. Kolountzakis, and Gary L. Miller. 2011. Triangle sparsifiers. Journal of Graph Algorithms and Applications 15, 6, 703--726. DOI: http://dx.doi.org/10.7155/jgaa.00245Google ScholarGoogle ScholarCross RefCross Ref
  48. Jeffrey S. Vitter. 1985. Random sampling with a reservoir. ACM Transactions on Mathematical Software 11, 1, 37--57. DOI: http://dx.doi.org/10.1145/3147.3165 Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Stanley Wasserman and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press.Google ScholarGoogle Scholar
  50. Jin-Hyun Yoon and Sung-Ryul Kim. 2011. Improved sampling for triangle counting with MapReduce. In Proceedings of the 5th International Conference on Convergence and Hybrid Information Technology. 685--689. http://dl.acm.org/citation.cfm?id=2045005.2045097. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A Space-Efficient Streaming Algorithm for Estimating Transitivity and Triangle Counts Using the Birthday Paradox

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in

                Full Access

                • Published in

                  cover image ACM Transactions on Knowledge Discovery from Data
                  ACM Transactions on Knowledge Discovery from Data  Volume 9, Issue 3
                  TKDD Special Issue (SIGKDD'13)
                  April 2015
                  313 pages
                  ISSN:1556-4681
                  EISSN:1556-472X
                  DOI:10.1145/2737800
                  Issue’s Table of Contents

                  Copyright © 2015 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 17 February 2015
                  • Accepted: 1 September 2014
                  • Revised: 1 August 2014
                  • Received: 1 October 2013
                  Published in tkdd Volume 9, Issue 3

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • research-article
                  • Research
                  • Refereed

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader