ABSTRACT
We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 60,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph, by storing a miniscule fraction of edges.
- K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In Principles of Database Systems, pages 5--14, 2012. Google ScholarDigital Library
- N. Alon, T. Kaufman, and M. Krivelevich. Testing triangle-freeness in general graphs. In Proceedings of the 17th Annual Symposium on Discrete Algorithms (SODA), 2006. Google ScholarDigital Library
- L. Arge, M. Goodrich, and N. Sitchinava. Parallel external memory graph algorithms. In Parallel Distributed Processing Symposium (IPDPS), pages 1--11, april 2010.Google ScholarCross Ref
- S. M. Arifuzzaman, M. Khan, and M. Marathe. Patric: A parallel algorithm for counting triangles and computing clustering coefficients in massive networks. Technical Report 12-042, NDSSL, 2012.Google Scholar
- H. Avron. Counting triangles in large graphs using randomized matrix trace estimation. In KDD workshon Large Scale Data Mining, 2010.Google Scholar
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Symposium of Discrete Algorith,s, pages 623--632, 2002. Google ScholarDigital Library
- L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Knowledge Data and Discovery (KDD), pages 16--24, 2008. Google ScholarDigital Library
- J. Berry, L. Fosvedt, D. Nordman, C. A. Phillips, and A. G. Wilson. Listing triangles in expected linear time on power law graphs with exponent at least 7 over 3. Technical Report SAND2010-4474c, Sandia National Laboratories, 2011.Google Scholar
- J. Berry, B. Hendrickson, S. Kahan, and P. Konecny. Software and algorithms for graph queries on multithreaded architectures. In Parallel and Distributed Processing Symposium (IPDPS), pages 1--14, march 2007.Google ScholarCross Ref
- L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In Principles of Database Systems, pages 253--262, 2006. Google ScholarDigital Library
- R. S. Burt. Structural holes and good ideas. American Journal of Sociology, 110(2):349--399, 2004.Google ScholarCross Ref
- D. R. Chakrabarti, P. Banerjee, H.-J. Boehm, P. G. Joisha, and R. S. Schreiber. The runtime abort graph and its application to software transactional memory optimization. In International Symposium on Code Generation and Optimization, pages 42--53, 2011. Google ScholarDigital Library
- Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In Symposium on Discrete Algorithms (SODA), pages 139--149, 1995. Google ScholarDigital Library
- N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14:210--223, 1985. Google ScholarDigital Library
- S. Chu and J. Cheng. Triangle listing in massive networks and its applications. In Knowledge Data and Discovery (KDD), pages 672--680, 2011. Google ScholarDigital Library
- J. Cohen. Graph twiddling in a MapReduce world. Computing in Science & Engineering, 11:29--41, 2009. Google ScholarDigital Library
- J. S. Coleman. Social capital in the creation of human capital. American Journal of Sociology, 94:S95-S120, 1988.Google ScholarCross Ref
- J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proceedings of the National Academy of Sciences (PNAS), 99(9):5825--5829, 2002.Google ScholarCross Ref
- W. Feller. An Introduction to probability theory and applications: Vol I. John Wiley and Sons, 3rd edition, 1968.Google Scholar
- M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. Technical Report 1212.2264, Arxiv, 2012.Google Scholar
- H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In Computing and Combinatorics Conference (COCOON), pages 710--716, 2005. Google ScholarDigital Library
- D. M. Kane, K. Mehlhorn, T. Sauerwald, and H. Sun. Counting arbitrary subgraphs in data streams. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 598--609, 2012. Google ScholarDigital Library
- M. N. Kolountzakis, G. L. Miller, R. Peng, and C. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. In WAW'10, 2010.Google ScholarCross Ref
- M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science, 407:458--473, 2008. Google ScholarDigital Library
- R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824--827.Google ScholarCross Ref
- R. Pagh and C. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Information Processing Letters, 112:277--281, 2012. Google ScholarDigital Library
- T. Plantenga. Inexact subgraph isomorphism in mapreduce. Journal of Parallel and Distributed Computing, (0), 2012. Google ScholarDigital Library
- A. Portes. Social capital: Its origins and applications in modern sociology. Annual Review of Sociology, 24(1):1--24, 1998.Google ScholarCross Ref
- T. Schank and D. Wagner. Approximating clustering coefficient and transitivity. Journal of Graph Algorithms and Applications, 9:265--275, 2005.Google ScholarCross Ref
- T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Experimental and Efficient Algorithms, pages 606--609. Springer Berlin / Heidelberg, 2005. Google ScholarDigital Library
- C. Seshadhri, A. Pinar, and T. G. Kolda. Fast triangle counting through wedge sampling. In Proceedings of the SIAM Conference on Data Mining, 2013.Google Scholar
- S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In World Wide Web (WWW), pages 607--614, 2011. Google ScholarDigital Library
- C. Tsourakakis. Fast counting of triangles in large real networks without counting: Algorithms and laws. In International Conference on Data Mining (ICDM), pages 608--617, 2008. Google ScholarDigital Library
- C. Tsourakakis, P. Drineas, E. Michelakis, I. Koutis, and C. Faloutsos. Spectral counting of triangles in power-law networks via element-wise sparsification. In ASONAM'09, pages 66--71, 2009. Google ScholarDigital Library
- C. Tsourakakis, M. N. Kolountzakis, and G. Miller. Triangle sparsifiers. J. Graph Algorithms and Applications, 15:703--726, 2011.Google ScholarCross Ref
- C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. Doulion: counting triangles in massive graphs with a coin. In Knowledge Data and Discovery (KDD), pages 837--846, 2009. Google ScholarDigital Library
- J. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37--57, 1985. Google ScholarDigital Library
- S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.Google ScholarCross Ref
- B. F. Welles, A. V. Devender, and N. Contractor. Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds. In CHI-EA'10, pages 4027--4032, 2010. Google ScholarDigital Library
- J.-H. Yoon and S.-R. Kim. Improved sampling for triangle counting with MapReduce. In Convergence and Hybrid Information Technology, volume 6935, pages 685--689. 2011. Google ScholarDigital Library
- Stanford Network Analysis Project (SNAP). Available at http://snap.stanford.edu/.Google Scholar
Index Terms
- A space efficient streaming algorithm for triangle counting using the birthday paradox
Recommendations
A Space-Efficient Streaming Algorithm for Estimating Transitivity and Triangle Counts Using the Birthday Paradox
TKDD Special Issue (SIGKDD'13)We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic ...
Triangle Counting in Dynamic Graph Streams
Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary ...
Think Before You Discard: Accurate Triangle Counting in Graph Streams with Deletions
Machine Learning and Knowledge Discovery in DatabasesAbstractGiven a stream of edge additions and deletions, how can we estimate the count of triangles in it? If we can store only a subset of the edges, how can we obtain unbiased estimates with small variances?
Counting triangles (i.e., cliques of size three)...
Comments