ABSTRACT
We present space-efficient data stream algorithms for approximating the number of triangles in a graph up to a factor 1+ε. While it can be shown that determining whether a graph is triangle-free is not possible in sub-linear space, a large body of work has focused on minimizing the space required in terms of the number of triangles T (or a lower bound on this quantity) and other parameters including the number of nodes n and the number of edges m. Two models are important in the literature: the arbitrary order model in which the stream consists of the edges of the graph in arbitrary order and the adjacency list order model in which all edges incident to the same node appear consecutively. We improve over the state of the art results in both models. For the adjacency list order model, we show that ~O(ε-2m/√T) space is sufficient in one pass and ~O(ε-2m3/2/T) space is sufficient in two passes where the ~O(·) notation suppresses log factors. For the arbitrary order model, we show that ~O(ε-2m/√T) space suffices given two passes and that ~O(ε-2m3/2/T) space suffices given three passes and oracle access to the degrees. Finally, we show how to efficiently implement the "wedge sampling" approach to triangle estimation in the arbitrary order model. To do this, we develop the first algorithm for lp sampling such that multiple independent samples can be generated with O(polylog n) update time; this primitive is widely applicable and this result may be of independent interest.
- K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 5--14, 2012. Google ScholarDigital Library
- N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarCross Ref
- A. Andoni, R. Krauthgamer, and K. Onak. Streaming algorithms via precision sampling. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 363--372. IEEE, 2011. Google ScholarDigital Library
- L. Arge, M. T. Goodrich, and N. Sitchinava. Parallel external memory graph algorithms. In 24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, Atlanta, Georgia, USA, 19--23 April 2010 - Conference Proceedings, pages 1--11, 2010.Google ScholarCross Ref
- S. Arifuzzaman, M. Khan, and M. V. Marathe. PATRIC: a parallel algorithm for counting triangles in massive networks. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 529--538, 2013. Google ScholarDigital Library
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6--8, 2002, San Francisco, CA, USA., pages 623--632, 2002. Google ScholarDigital Library
- J. W. Berry, B. Hendrickson, S. Kahan, and P. Konecny. Software and algorithms for graph queries on multithreaded architectures. In 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26--30 March 2007, Long Beach, California, USA, pages 1--14, 2007.Google ScholarCross Ref
- V. Braverman, R. Ostrovsky, and D. Vilenchik. How hard is counting triangles in the streaming model? In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8--12, 2013, Proceedings, Part I, pages 244--254, 2013. Google ScholarDigital Library
- L. Bulteau, V. Froese, K. Kutzkov, and R. Pagh. Triangle counting in dynamic graph streams. CoRR, abs/1404.4696, 2014.Google Scholar
- L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 26--28, 2006, Chicago, Illinois, USA, pages 253--262, 2006. Google ScholarDigital Library
- N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210--223, 1985. Google ScholarDigital Library
- K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 205--214, 2009. Google ScholarDigital Library
- G. Cormode. Personal communication 8/23/15. 2015.Google Scholar
- G. Cormode. Personal communication 1/23/16. 2016.Google Scholar
- G. Cormode and D. Firmani. A unifying framework for l0-sampling algorithms. Distributed and Parallel Databases, 32(3):315--335, 2014. Google ScholarDigital Library
- G. Cormode and H. Jowhari. A second look at counting triangles in graph streams. Theor. Comput. Sci., 552:44--51, 2014.Google ScholarCross Ref
- J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. Proceedings of the National Academy of Sciences, 99(9):5825--5829, 2002.Google ScholarCross Ref
- T. Eden, A. Levi, D. Ron, and C. Seshadhri. Approximately counting triangles in sublinear time. In FOCS, 2015. Google ScholarDigital Library
- M. Ghashami, E. Liberty, J. M. Phillips, and D. P. Woodruff. Frequent directions : Simple and deterministic matrix sketching. CoRR, abs/1501.01711, 2015.Google Scholar
- A. Goel, M. Kapralov, and S. Khanna. On the communication and streaming complexity of maximum bipartite matching. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17--19, 2012, pages 468--485, 2012. Google ScholarDigital Library
- P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In thirty-seventh annual ACM symposium on Theory of computing, pages 202--208. ACM New York, NY, USA, 2005. Google ScholarDigital Library
- M. Jha, C. Seshadhri, and A. Pinar. A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. TKDD, 9(3):15:1--15:21, 2015. Google ScholarDigital Library
- A. G. Jørgensen and S. Pettie. Threesomes, degenerates, and love triangles. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18--21, 2014, pages 621--630, 2014. Google ScholarDigital Library
- H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In Computing and Combinatorics, 11th Annual International Conference, COCOON 2005, Kunming, China, August 16--29, 2005, Proceedings, pages 710--716, 2005.Google Scholar
- H. Jowhari, M. Saglam, and G. Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In PODS, pages 49--58, 2011. Google ScholarDigital Library
- M. Kapralov. Better bounds for matchings in the streaming model. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6--8, 2013, pages 1679--1697, 2013. Google ScholarDigital Library
- T. G. Kolda, A. Pinar, and C. Seshadhri. Triadic measures on graphs: The power of wedge sampling. In Proceedings of the 13th SIAM International Conference on Data Mining, May 2--4, 2013. Austin, Texas, USA., pages 10--18, 2013.Google Scholar
- M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1--2):161--185, 2012.Google Scholar
- K. Kutzkov and R. Pagh. Triangle counting in dynamic graph streams. In Algorithm Theory - SWAT 2014 - 14th Scandinavian Symposium and Workshops, Copenhagen, Denmark, July 2--4, 2014. Proceedings, pages 306--318, 2014.Google Scholar
- J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24--27, 2008, pages 462--470, 2008. Google ScholarDigital Library
- M. Manjunath, K. Mehlhorn, K. Panagiotou, and H. Sun. Approximate counting of cycles in streams. In Algorithms - ESA 2011 - 19th Annual European Symposium, Saarbrücken, Germany, September 5--9, 2011. Proceedings, pages 677--688, 2011. Google ScholarDigital Library
- A. McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9--20, 2014. Google ScholarDigital Library
- A. McGregor, D. Tench, S. Vorotnikova, and H. T. Vu. Densest subgraph in dynamic graph streams. In Mathematical Foundations of Computer Science 2015, pages 472--482. Springer Berlin Heidelberg, 2015.Google Scholar
- R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. In Science, pages 824--827, 2002.Google Scholar
- M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005. Google ScholarDigital Library
- M. Monemizadeh and D. P. Woodruff. 1-pass relative-error lp-sampling with applications. In SODA, pages 1143--1160, 2010. Google ScholarDigital Library
- R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett., 112(7):277--281, 2012. Google ScholarDigital Library
- A. Pavan, K. Tangwongsan, S. Tirthapura, and K. Wu. Counting and sampling triangles from a graph stream. PVLDB, 6(14):1870--1881, 2013. Google ScholarDigital Library
- T. Schank and D. Wagner. Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl., 9(2):265--275, 2005.Google ScholarCross Ref
- S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28 - April 1, 2011, pages 607--614, 2011. Google ScholarDigital Library
- K. Tangwongsan, A. Pavan, and S. Tirthapura. Parallel triangle counting in massive streaming graphs. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 781--786, 2013. Google ScholarDigital Library
- C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. Triangle sparsifiers. J. Graph Algorithms Appl., 15(6):703--726, 2011.Google ScholarCross Ref
- J. S. Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37--57, 1985. Google ScholarDigital Library
- B. F. Welles, A. V. Devender, and N. S. Contractor. Is a friend a friend?: investigating the structure of friendship networks in virtual worlds. In Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI 2010, Extended Abstracts Volume, Atlanta, Georgia, USA, April 10--15, 2010, pages 4027--4032, 2010. Google ScholarDigital Library
Index Terms
- Better Algorithms for Counting Triangles in Data Streams
Recommendations
Triangle and Four Cycle Counting in the Data Stream Model
PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsThe problem of estimating the number of cycles in a graph is one of the most widely studied graph problems in the data stream model. Three relevant variants of the data stream model include: the arbitrary order model in which the stream consists of the ...
Contractible Edges and Triangles in k-Connected Graphs
It is proved that if G is a k-connected graph which does not contain K 4, then G has an edge e or a triangle T such that the graph obtained from G by connecting e or by contracting T is still k-connected. By using this theorem, we prove some theorems ...
Approximately Counting Subgraphs in Data Streams
PODS '22: Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsEstimating the number of subgraphs in data streams is a fundamental problem that has received great attention in the past decade. In this paper, we give improved streaming algorithms for approximately counting the number of occurrences of an arbitrary ...
Comments