skip to main content
10.1145/2902251.2902283acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article
Public Access

Better Algorithms for Counting Triangles in Data Streams

Published:15 June 2016Publication History

ABSTRACT

We present space-efficient data stream algorithms for approximating the number of triangles in a graph up to a factor 1+ε. While it can be shown that determining whether a graph is triangle-free is not possible in sub-linear space, a large body of work has focused on minimizing the space required in terms of the number of triangles T (or a lower bound on this quantity) and other parameters including the number of nodes n and the number of edges m. Two models are important in the literature: the arbitrary order model in which the stream consists of the edges of the graph in arbitrary order and the adjacency list order model in which all edges incident to the same node appear consecutively. We improve over the state of the art results in both models. For the adjacency list order model, we show that ~O(ε-2m/√T) space is sufficient in one pass and ~O(ε-2m3/2/T) space is sufficient in two passes where the ~O(·) notation suppresses log factors. For the arbitrary order model, we show that ~O(ε-2m/√T) space suffices given two passes and that ~O(ε-2m3/2/T) space suffices given three passes and oracle access to the degrees. Finally, we show how to efficiently implement the "wedge sampling" approach to triangle estimation in the arbitrary order model. To do this, we develop the first algorithm for lp sampling such that multiple independent samples can be generated with O(polylog n) update time; this primitive is widely applicable and this result may be of independent interest.

References

  1. K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 5--14, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  3. A. Andoni, R. Krauthgamer, and K. Onak. Streaming algorithms via precision sampling. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 363--372. IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. Arge, M. T. Goodrich, and N. Sitchinava. Parallel external memory graph algorithms. In 24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, Atlanta, Georgia, USA, 19--23 April 2010 - Conference Proceedings, pages 1--11, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  5. S. Arifuzzaman, M. Khan, and M. V. Marathe. PATRIC: a parallel algorithm for counting triangles in massive networks. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 529--538, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6--8, 2002, San Francisco, CA, USA., pages 623--632, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. W. Berry, B. Hendrickson, S. Kahan, and P. Konecny. Software and algorithms for graph queries on multithreaded architectures. In 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26--30 March 2007, Long Beach, California, USA, pages 1--14, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  8. V. Braverman, R. Ostrovsky, and D. Vilenchik. How hard is counting triangles in the streaming model? In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8--12, 2013, Proceedings, Part I, pages 244--254, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. Bulteau, V. Froese, K. Kutzkov, and R. Pagh. Triangle counting in dynamic graph streams. CoRR, abs/1404.4696, 2014.Google ScholarGoogle Scholar
  10. L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 26--28, 2006, Chicago, Illinois, USA, pages 253--262, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210--223, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 205--214, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Cormode. Personal communication 8/23/15. 2015.Google ScholarGoogle Scholar
  14. G. Cormode. Personal communication 1/23/16. 2016.Google ScholarGoogle Scholar
  15. G. Cormode and D. Firmani. A unifying framework for l0-sampling algorithms. Distributed and Parallel Databases, 32(3):315--335, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Cormode and H. Jowhari. A second look at counting triangles in graph streams. Theor. Comput. Sci., 552:44--51, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  17. J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. Proceedings of the National Academy of Sciences, 99(9):5825--5829, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  18. T. Eden, A. Levi, D. Ron, and C. Seshadhri. Approximately counting triangles in sublinear time. In FOCS, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Ghashami, E. Liberty, J. M. Phillips, and D. P. Woodruff. Frequent directions : Simple and deterministic matrix sketching. CoRR, abs/1501.01711, 2015.Google ScholarGoogle Scholar
  20. A. Goel, M. Kapralov, and S. Khanna. On the communication and streaming complexity of maximum bipartite matching. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17--19, 2012, pages 468--485, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In thirty-seventh annual ACM symposium on Theory of computing, pages 202--208. ACM New York, NY, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Jha, C. Seshadhri, and A. Pinar. A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. TKDD, 9(3):15:1--15:21, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. G. Jørgensen and S. Pettie. Threesomes, degenerates, and love triangles. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18--21, 2014, pages 621--630, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In Computing and Combinatorics, 11th Annual International Conference, COCOON 2005, Kunming, China, August 16--29, 2005, Proceedings, pages 710--716, 2005.Google ScholarGoogle Scholar
  25. H. Jowhari, M. Saglam, and G. Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In PODS, pages 49--58, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Kapralov. Better bounds for matchings in the streaming model. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6--8, 2013, pages 1679--1697, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. T. G. Kolda, A. Pinar, and C. Seshadhri. Triadic measures on graphs: The power of wedge sampling. In Proceedings of the 13th SIAM International Conference on Data Mining, May 2--4, 2013. Austin, Texas, USA., pages 10--18, 2013.Google ScholarGoogle Scholar
  28. M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1--2):161--185, 2012.Google ScholarGoogle Scholar
  29. K. Kutzkov and R. Pagh. Triangle counting in dynamic graph streams. In Algorithm Theory - SWAT 2014 - 14th Scandinavian Symposium and Workshops, Copenhagen, Denmark, July 2--4, 2014. Proceedings, pages 306--318, 2014.Google ScholarGoogle Scholar
  30. J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24--27, 2008, pages 462--470, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Manjunath, K. Mehlhorn, K. Panagiotou, and H. Sun. Approximate counting of cycles in streams. In Algorithms - ESA 2011 - 19th Annual European Symposium, Saarbrücken, Germany, September 5--9, 2011. Proceedings, pages 677--688, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9--20, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. McGregor, D. Tench, S. Vorotnikova, and H. T. Vu. Densest subgraph in dynamic graph streams. In Mathematical Foundations of Computer Science 2015, pages 472--482. Springer Berlin Heidelberg, 2015.Google ScholarGoogle Scholar
  34. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. In Science, pages 824--827, 2002.Google ScholarGoogle Scholar
  35. M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. Monemizadeh and D. P. Woodruff. 1-pass relative-error lp-sampling with applications. In SODA, pages 1143--1160, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett., 112(7):277--281, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. A. Pavan, K. Tangwongsan, S. Tirthapura, and K. Wu. Counting and sampling triangles from a graph stream. PVLDB, 6(14):1870--1881, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. T. Schank and D. Wagner. Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl., 9(2):265--275, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  40. S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28 - April 1, 2011, pages 607--614, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. K. Tangwongsan, A. Pavan, and S. Tirthapura. Parallel triangle counting in massive streaming graphs. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 781--786, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. Triangle sparsifiers. J. Graph Algorithms Appl., 15(6):703--726, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  43. J. S. Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37--57, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. B. F. Welles, A. V. Devender, and N. S. Contractor. Is a friend a friend?: investigating the structure of friendship networks in virtual worlds. In Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI 2010, Extended Abstracts Volume, Atlanta, Georgia, USA, April 10--15, 2010, pages 4027--4032, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Better Algorithms for Counting Triangles in Data Streams

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PODS '16: Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
        June 2016
        504 pages
        ISBN:9781450341912
        DOI:10.1145/2902251
        • General Chair:
        • Tova Milo,
        • Program Chair:
        • Wang-Chiew Tan

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 15 June 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        PODS '16 Paper Acceptance Rate31of94submissions,33%Overall Acceptance Rate642of2,707submissions,24%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader