skip to main content
10.1145/2487575.2487678acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

A space efficient streaming algorithm for triangle counting using the birthday paradox

Authors Info & Claims
Published:11 August 2013Publication History

ABSTRACT

We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 60,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph, by storing a miniscule fraction of edges.

References

  1. K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In Principles of Database Systems, pages 5--14, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon, T. Kaufman, and M. Krivelevich. Testing triangle-freeness in general graphs. In Proceedings of the 17th Annual Symposium on Discrete Algorithms (SODA), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. Arge, M. Goodrich, and N. Sitchinava. Parallel external memory graph algorithms. In Parallel Distributed Processing Symposium (IPDPS), pages 1--11, april 2010.Google ScholarGoogle ScholarCross RefCross Ref
  4. S. M. Arifuzzaman, M. Khan, and M. Marathe. Patric: A parallel algorithm for counting triangles and computing clustering coefficients in massive networks. Technical Report 12-042, NDSSL, 2012.Google ScholarGoogle Scholar
  5. H. Avron. Counting triangles in large graphs using randomized matrix trace estimation. In KDD workshon Large Scale Data Mining, 2010.Google ScholarGoogle Scholar
  6. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Symposium of Discrete Algorith,s, pages 623--632, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Knowledge Data and Discovery (KDD), pages 16--24, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Berry, L. Fosvedt, D. Nordman, C. A. Phillips, and A. G. Wilson. Listing triangles in expected linear time on power law graphs with exponent at least 7 over 3. Technical Report SAND2010-4474c, Sandia National Laboratories, 2011.Google ScholarGoogle Scholar
  9. J. Berry, B. Hendrickson, S. Kahan, and P. Konecny. Software and algorithms for graph queries on multithreaded architectures. In Parallel and Distributed Processing Symposium (IPDPS), pages 1--14, march 2007.Google ScholarGoogle ScholarCross RefCross Ref
  10. L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In Principles of Database Systems, pages 253--262, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. S. Burt. Structural holes and good ideas. American Journal of Sociology, 110(2):349--399, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  12. D. R. Chakrabarti, P. Banerjee, H.-J. Boehm, P. G. Joisha, and R. S. Schreiber. The runtime abort graph and its application to software transactional memory optimization. In International Symposium on Code Generation and Optimization, pages 42--53, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In Symposium on Discrete Algorithms (SODA), pages 139--149, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14:210--223, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Chu and J. Cheng. Triangle listing in massive networks and its applications. In Knowledge Data and Discovery (KDD), pages 672--680, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Cohen. Graph twiddling in a MapReduce world. Computing in Science & Engineering, 11:29--41, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. S. Coleman. Social capital in the creation of human capital. American Journal of Sociology, 94:S95-S120, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  18. J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proceedings of the National Academy of Sciences (PNAS), 99(9):5825--5829, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  19. W. Feller. An Introduction to probability theory and applications: Vol I. John Wiley and Sons, 3rd edition, 1968.Google ScholarGoogle Scholar
  20. M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. Technical Report 1212.2264, Arxiv, 2012.Google ScholarGoogle Scholar
  21. H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In Computing and Combinatorics Conference (COCOON), pages 710--716, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. M. Kane, K. Mehlhorn, T. Sauerwald, and H. Sun. Counting arbitrary subgraphs in data streams. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 598--609, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. N. Kolountzakis, G. L. Miller, R. Peng, and C. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. In WAW'10, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  24. M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science, 407:458--473, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824--827.Google ScholarGoogle ScholarCross RefCross Ref
  26. R. Pagh and C. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Information Processing Letters, 112:277--281, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. T. Plantenga. Inexact subgraph isomorphism in mapreduce. Journal of Parallel and Distributed Computing, (0), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Portes. Social capital: Its origins and applications in modern sociology. Annual Review of Sociology, 24(1):1--24, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  29. T. Schank and D. Wagner. Approximating clustering coefficient and transitivity. Journal of Graph Algorithms and Applications, 9:265--275, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  30. T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Experimental and Efficient Algorithms, pages 606--609. Springer Berlin / Heidelberg, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. C. Seshadhri, A. Pinar, and T. G. Kolda. Fast triangle counting through wedge sampling. In Proceedings of the SIAM Conference on Data Mining, 2013.Google ScholarGoogle Scholar
  32. S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In World Wide Web (WWW), pages 607--614, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. C. Tsourakakis. Fast counting of triangles in large real networks without counting: Algorithms and laws. In International Conference on Data Mining (ICDM), pages 608--617, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. C. Tsourakakis, P. Drineas, E. Michelakis, I. Koutis, and C. Faloutsos. Spectral counting of triangles in power-law networks via element-wise sparsification. In ASONAM'09, pages 66--71, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. C. Tsourakakis, M. N. Kolountzakis, and G. Miller. Triangle sparsifiers. J. Graph Algorithms and Applications, 15:703--726, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  36. C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. Doulion: counting triangles in massive graphs with a coin. In Knowledge Data and Discovery (KDD), pages 837--846, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37--57, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.Google ScholarGoogle ScholarCross RefCross Ref
  39. B. F. Welles, A. V. Devender, and N. Contractor. Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds. In CHI-EA'10, pages 4027--4032, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. J.-H. Yoon and S.-R. Kim. Improved sampling for triangle counting with MapReduce. In Convergence and Hybrid Information Technology, volume 6935, pages 685--689. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Stanford Network Analysis Project (SNAP). Available at http://snap.stanford.edu/.Google ScholarGoogle Scholar

Index Terms

  1. A space efficient streaming algorithm for triangle counting using the birthday paradox

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2013
        1534 pages
        ISBN:9781450321747
        DOI:10.1145/2487575

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 August 2013

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '13 Paper Acceptance Rate125of726submissions,17%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader