skip to main content
10.1145/3517804.3524145acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article
Open Access

Approximately Counting Subgraphs in Data Streams

Published:13 June 2022Publication History

ABSTRACT

Estimating the number of subgraphs in data streams is a fundamental problem that has received great attention in the past decade. In this paper, we give improved streaming algorithms for approximately counting the number of occurrences of an arbitrary subgraph H, denoted #H, when the input graph G is represented as a stream of m edges. To obtain our algorithms, we provide a generic transformation that converts constant-round sublinear-time graph algorithms in the query access model to constant-pass sublinear-space graph streaming algorithms. Using this transformation, we obtain the following results.

• We give a 3-pass turnstile streaming algorithm for (1 ± ε)-approximating #H in Õ(mρ(H) /ε2⋅#H) space, where ρ(H) is the fractional edge-cover of H. This improves upon and generalizes a result of McGregor et al. [PODS 2016], who gave a 3-pass insertion-only streaming algorithm for (1 ± ε)-approximating the number #T of triangles in Õ(m3/2/ε2 ⋅ #T) space if the algorithm is given additional oracle access to the degrees.• We provide a constant-pass streaming algorithm for (1 ± ε)-approximating #Kr in Õ(m/λr-2 ε2 ⋅ #Kr) space for any r ≥ 3, in a graph G with degeneracy λ, where Kr is a clique on r vertices. This resolves a conjecture by Bera and Seshadhri [PODS 2020].

More generally, our reduction relates the adaptivity of a query algorithm to the pass complexity of a corresponding streaming algorithm, and it is applicable to all algorithms in standard sublinear-time graph query models, e.g., the (augmented) general model.

Skip Supplemental Material Section

Supplemental Material

PODS22-fp18.mp4

mp4

80.4 MB

References

  1. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems. 5--14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. 2019. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In 10th Innovations in Theoretical Computer Science, ITCS 2019. Schloss Dagstuhl-Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 6.Google ScholarGoogle Scholar
  3. Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query plans for relational joins. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 739--748.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Ziv Bar-Yossef, Ravi Kumar, and D Sivakumar. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. 623--632.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Suman K Bera and Amit Chakrabarti. 2017. Towards tighter space bounds for counting triangles and other substructures in graph streams. In 34th Symposium on Theoretical Aspects of Computer Science.Google ScholarGoogle Scholar
  6. Suman K. Bera and C. Seshadhri. 2020. How the Degeneracy Helps for Triangle Counting in Graph Streams. In Proceedings of the 39th ACM SIGMOD-SIGACT- SIGAI Symposium on Principles of Database Systems (PODS'20). Association for Computing Machinery, New York, NY, USA, 457--467. https://doi.org/10.1145/3375395.3387665Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. 2013. How hard is counting triangles in the streaming model?. In International Colloquium on Automata, Languages, and Programming. Springer, 244--254.Google ScholarGoogle Scholar
  8. Laurent Bulteau, Vincent Froese, Konstantin Kutzkov, and Rasmus Pagh. 2016. Triangle counting in dynamic graph streams. Algorithmica 76, 1 (2016), 259--278.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Luciana S Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. 2006. Counting triangles in data streams. In Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 253--262.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Clément L. Canonne and Tom Gur. 2018. An Adaptivity Hierarchy Theorem for Property Testing. computational complexity 27, 4 (Dec. 2018), 671--716. https://doi.org/10.1007/s00037-018-0168--4Google ScholarGoogle Scholar
  11. Graham Cormode and Donatella Firmani. 2014. A Unifying Framework for l0-Sampling Algorithms. Distributed and Parallel Databases 32, 3 (Sept. 2014), 315--335. https://doi.org/10.1007/s10619-013--7131--9Google ScholarGoogle ScholarCross RefCross Ref
  12. Graham Cormode and Hossein Jowhari. 2014. A second look at counting triangles in graph streams. Theoretical Computer Science 552 (2014), 44--51.Google ScholarGoogle ScholarCross RefCross Ref
  13. Graham Cormode and Hossein Jowhari. 2017. A second look at counting triangles in graph streams (corrected). Theoretical Computer Science 683 (2017), 22--30.Google ScholarGoogle ScholarCross RefCross Ref
  14. Talya Eden, Dana Ron, and C Seshadhri. 2020. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Symposium on Algorithms and Data Structures (SODA).Google ScholarGoogle ScholarCross RefCross Ref
  15. Hendrik Fichtenberger, Mingze Gao, and Pan Peng. 2020. Sampling Arbitrary Subgraphs Exactly Uniformly in Sublinear Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.Google ScholarGoogle Scholar
  16. Joshua A Grochow and Manolis Kellis. 2007. Network motif discovery using subgraph enumeration and symmetry-breaking. In Annual International Conference on Research in Computational Molecular Biology. Springer, 92--106.Google ScholarGoogle ScholarCross RefCross Ref
  17. Rajesh Jayaram and John Kallaugher. 2021. An Optimal Algorithm for Triangle Counting. arXiv preprint arXiv:2105.01785 (2021).Google ScholarGoogle Scholar
  18. Hossein Jowhari and Mohammad Ghodsi. 2005. New streaming algorithms for counting triangles in graphs. In International Computing and Combinatorics Conference. Springer, 710--716.Google ScholarGoogle ScholarCross RefCross Ref
  19. John Kallaugher, Michael Kapralov, and Eric Price. 2018. The sketching complexity of graph and hypergraph counting. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 556--567.Google ScholarGoogle ScholarCross RefCross Ref
  20. John Kallaugher, Andrew McGregor, Eric Price, and Sofya Vorotnikova. 2019. The complexity of counting cycles in the adjacency list streaming model. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 119--133.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. John Kallaugher and Eric Price. 2017. A hybrid sampling scheme for triangle counting. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1778--1797.Google ScholarGoogle ScholarCross RefCross Ref
  22. Daniel M Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting arbitrary subgraphs in data streams. In International Colloquium on Automata, Languages, and Programming. Springer, 598--609.Google ScholarGoogle Scholar
  23. Mihail N Kolountzakis, Gary L Miller, Richard Peng, and Charalampos E Tsourakakis. 2012. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics 8, 1--2 (2012), 161--185.Google ScholarGoogle ScholarCross RefCross Ref
  24. Madhusudan Manjunath, Kurt Mehlhorn, Konstantinos Panagiotou, and He Sun. 2011. Approximate counting of cycles in streams. In European Symposium on Algorithms. Springer, 677--688.Google ScholarGoogle ScholarCross RefCross Ref
  25. Andrew McGregor and Sofya Vorotnikova. 2020. Triangle and four cycle counting in the data stream model. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 445--456.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Andrew McGregor, Sofya Vorotnikova, and Hoa T Vu. 2016. Better algorithms for counting triangles in data streams. In Proceedings of the 35th ACM SIGMOD- SIGACT-SIGAI Symposium on Principles of Database Systems. 401--411.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Hung Q Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2018. Worst-case optimal join algorithms. Journal of the ACM (JACM) 65, 3 (2018), 1--40.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Rasmus Pagh and Charalampos E Tsourakakis. 2012. Colorful triangle counting and a mapreduce implementation. Inform. Process. Lett. 112, 7 (2012), 277--281.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. Counting and sampling triangles from a graph stream. Proceedings of the VLDB Endowment 6, 14 (2013), 1870--1881.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Alexander Schrijver. 2003. Combinatorial optimization: polyhedra and efficiency. Vol. 24. Springer Science & Business Media.Google ScholarGoogle Scholar
  31. Charalampos E Tsourakakis, U Kang, Gary L Miller, and Christos Faloutsos. 2009. Doulion: counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 837--846.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Approximately Counting Subgraphs in Data Streams

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PODS '22: Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
        June 2022
        462 pages
        ISBN:9781450392600
        DOI:10.1145/3517804

        Copyright © 2022 Owner/Author

        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 June 2022

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate642of2,707submissions,24%
      • Article Metrics

        • Downloads (Last 12 months)101
        • Downloads (Last 6 weeks)11

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader