ABSTRACT
Estimating the number of subgraphs in data streams is a fundamental problem that has received great attention in the past decade. In this paper, we give improved streaming algorithms for approximately counting the number of occurrences of an arbitrary subgraph H, denoted #H, when the input graph G is represented as a stream of m edges. To obtain our algorithms, we provide a generic transformation that converts constant-round sublinear-time graph algorithms in the query access model to constant-pass sublinear-space graph streaming algorithms. Using this transformation, we obtain the following results.
• We give a 3-pass turnstile streaming algorithm for (1 ± ε)-approximating #H in Õ(mρ(H) /ε2⋅#H) space, where ρ(H) is the fractional edge-cover of H. This improves upon and generalizes a result of McGregor et al. [PODS 2016], who gave a 3-pass insertion-only streaming algorithm for (1 ± ε)-approximating the number #T of triangles in Õ(m3/2/ε2 ⋅ #T) space if the algorithm is given additional oracle access to the degrees.• We provide a constant-pass streaming algorithm for (1 ± ε)-approximating #Kr in Õ(m/λr-2 ε2 ⋅ #Kr) space for any r ≥ 3, in a graph G with degeneracy λ, where Kr is a clique on r vertices. This resolves a conjecture by Bera and Seshadhri [PODS 2020].
More generally, our reduction relates the adaptivity of a query algorithm to the pass complexity of a corresponding streaming algorithm, and it is applicable to all algorithms in standard sublinear-time graph query models, e.g., the (augmented) general model.
Supplemental Material
- Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems. 5--14.Google ScholarDigital Library
- Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. 2019. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In 10th Innovations in Theoretical Computer Science, ITCS 2019. Schloss Dagstuhl-Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 6.Google Scholar
- Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query plans for relational joins. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 739--748.Google ScholarDigital Library
- Ziv Bar-Yossef, Ravi Kumar, and D Sivakumar. 2002. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. 623--632.Google ScholarDigital Library
- Suman K Bera and Amit Chakrabarti. 2017. Towards tighter space bounds for counting triangles and other substructures in graph streams. In 34th Symposium on Theoretical Aspects of Computer Science.Google Scholar
- Suman K. Bera and C. Seshadhri. 2020. How the Degeneracy Helps for Triangle Counting in Graph Streams. In Proceedings of the 39th ACM SIGMOD-SIGACT- SIGAI Symposium on Principles of Database Systems (PODS'20). Association for Computing Machinery, New York, NY, USA, 457--467. https://doi.org/10.1145/3375395.3387665Google ScholarDigital Library
- Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. 2013. How hard is counting triangles in the streaming model?. In International Colloquium on Automata, Languages, and Programming. Springer, 244--254.Google Scholar
- Laurent Bulteau, Vincent Froese, Konstantin Kutzkov, and Rasmus Pagh. 2016. Triangle counting in dynamic graph streams. Algorithmica 76, 1 (2016), 259--278.Google ScholarDigital Library
- Luciana S Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. 2006. Counting triangles in data streams. In Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 253--262.Google ScholarDigital Library
- Clément L. Canonne and Tom Gur. 2018. An Adaptivity Hierarchy Theorem for Property Testing. computational complexity 27, 4 (Dec. 2018), 671--716. https://doi.org/10.1007/s00037-018-0168--4Google Scholar
- Graham Cormode and Donatella Firmani. 2014. A Unifying Framework for l0-Sampling Algorithms. Distributed and Parallel Databases 32, 3 (Sept. 2014), 315--335. https://doi.org/10.1007/s10619-013--7131--9Google ScholarCross Ref
- Graham Cormode and Hossein Jowhari. 2014. A second look at counting triangles in graph streams. Theoretical Computer Science 552 (2014), 44--51.Google ScholarCross Ref
- Graham Cormode and Hossein Jowhari. 2017. A second look at counting triangles in graph streams (corrected). Theoretical Computer Science 683 (2017), 22--30.Google ScholarCross Ref
- Talya Eden, Dana Ron, and C Seshadhri. 2020. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Symposium on Algorithms and Data Structures (SODA).Google ScholarCross Ref
- Hendrik Fichtenberger, Mingze Gao, and Pan Peng. 2020. Sampling Arbitrary Subgraphs Exactly Uniformly in Sublinear Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.Google Scholar
- Joshua A Grochow and Manolis Kellis. 2007. Network motif discovery using subgraph enumeration and symmetry-breaking. In Annual International Conference on Research in Computational Molecular Biology. Springer, 92--106.Google ScholarCross Ref
- Rajesh Jayaram and John Kallaugher. 2021. An Optimal Algorithm for Triangle Counting. arXiv preprint arXiv:2105.01785 (2021).Google Scholar
- Hossein Jowhari and Mohammad Ghodsi. 2005. New streaming algorithms for counting triangles in graphs. In International Computing and Combinatorics Conference. Springer, 710--716.Google ScholarCross Ref
- John Kallaugher, Michael Kapralov, and Eric Price. 2018. The sketching complexity of graph and hypergraph counting. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 556--567.Google ScholarCross Ref
- John Kallaugher, Andrew McGregor, Eric Price, and Sofya Vorotnikova. 2019. The complexity of counting cycles in the adjacency list streaming model. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 119--133.Google ScholarDigital Library
- John Kallaugher and Eric Price. 2017. A hybrid sampling scheme for triangle counting. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1778--1797.Google ScholarCross Ref
- Daniel M Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting arbitrary subgraphs in data streams. In International Colloquium on Automata, Languages, and Programming. Springer, 598--609.Google Scholar
- Mihail N Kolountzakis, Gary L Miller, Richard Peng, and Charalampos E Tsourakakis. 2012. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics 8, 1--2 (2012), 161--185.Google ScholarCross Ref
- Madhusudan Manjunath, Kurt Mehlhorn, Konstantinos Panagiotou, and He Sun. 2011. Approximate counting of cycles in streams. In European Symposium on Algorithms. Springer, 677--688.Google ScholarCross Ref
- Andrew McGregor and Sofya Vorotnikova. 2020. Triangle and four cycle counting in the data stream model. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 445--456.Google ScholarDigital Library
- Andrew McGregor, Sofya Vorotnikova, and Hoa T Vu. 2016. Better algorithms for counting triangles in data streams. In Proceedings of the 35th ACM SIGMOD- SIGACT-SIGAI Symposium on Principles of Database Systems. 401--411.Google ScholarDigital Library
- Hung Q Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2018. Worst-case optimal join algorithms. Journal of the ACM (JACM) 65, 3 (2018), 1--40.Google ScholarDigital Library
- Rasmus Pagh and Charalampos E Tsourakakis. 2012. Colorful triangle counting and a mapreduce implementation. Inform. Process. Lett. 112, 7 (2012), 277--281.Google ScholarDigital Library
- A Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. Counting and sampling triangles from a graph stream. Proceedings of the VLDB Endowment 6, 14 (2013), 1870--1881.Google ScholarDigital Library
- Alexander Schrijver. 2003. Combinatorial optimization: polyhedra and efficiency. Vol. 24. Springer Science & Business Media.Google Scholar
- Charalampos E Tsourakakis, U Kang, Gary L Miller, and Christos Faloutsos. 2009. Doulion: counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 837--846.Google ScholarDigital Library
Index Terms
- Approximately Counting Subgraphs in Data Streams
Recommendations
Counting arbitrary subgraphs in data streams
ICALP'12: Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part IIWe study the subgraph counting problem in data streams. We provide the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G. Our estimator works in the ...
Triangle Counting in Dynamic Graph Streams
Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary ...
Counting Subgraphs in Degenerate Graphs
We consider the problem of counting the number of copies of a fixed graph H within an input graph G. This is one of the most well-studied algorithmic graph problems, with many theoretical and practical applications. We focus on solving this problem when ...
Comments