ABSTRACT
In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V;E) we want to estimate as accurately as possible the number of triangles incident to every node υ ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks.
For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(jV j) space in main memory and performing O(log jV j) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(jEj) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
- N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarCross Ref
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, 2002. Google ScholarDigital Library
- V. Batagelj and A. Mrvar. A subquadratic triad census algorithm for large sparse networks with small maximum degree. Social Networks, 23:237--243, 2001.Google ScholarCross Ref
- T. Bohman, C. Cooper, and A. M. Frieze. Min-wise independent linear permutations. Electr. J. Comb, 7, 2000.Google Scholar
- P. Boldi and S. Vigna. The webgraph framework I: compression techniques. In WWW, 2004. Google ScholarDigital Library
- A. Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, IEEE Computer Society, 1998. Google ScholarDigital Library
- A. Z. Broder. Identifying and filtering near-duplicate documents. In CPM. Springer, 2000. Google ScholarDigital Library
- A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In STOC, New York, NY, USA, 1998. Google ScholarDigital Library
- A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. In WWW, 1997. Google ScholarDigital Library
- L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In PODS, 2006. Google ScholarDigital Library
- C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini, and S. Vigna. A reference collection for web spam. SIGIR Forum, 40(2):11--24, December 2006. Google ScholarDigital Library
- C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know your neighbors: Web spam detection using the web topology. In SIGIR, 2007. Google ScholarDigital Library
- D. Coppersmith and R. Kumar. An improved data stream algorithm for frequency moments. In SODA, 2004. Google ScholarDigital Library
- D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251--280, 1990. Google ScholarDigital Library
- C. Demetrescu, I. Finocchi, and A. Ribichini. Trading of space for passes in graph streaming problems. In SODA, 2006. Google ScholarDigital Library
- J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS, 99(9):5825--5829, 2002.Google ScholarCross Ref
- J. Feigenbaum, S. Kannan, M. A. Gregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. In ICALP, 2004.Google ScholarCross Ref
- D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In WebDB, 2004. Google ScholarDigital Library
- D. Fogaras and B. Racz. Scaling link-based similarity search. In WWW, 2005. Google ScholarDigital Library
- D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, 2005. Google ScholarDigital Library
- A. Gulli and A. Signorini. The indexable Web is more than 11.5 billion pages. In WWW, 2005. Google ScholarDigital Library
- T. Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, 1999.Google Scholar
- M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Dimacs Series In Discrete Mathematics And Theoretical Computer Science, pages 107--118, 1999. Google ScholarDigital Library
- P. Indyk. A small approximately min-wise independent family of hash functions. In SODA, 1999. Google ScholarDigital Library
- A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal of Computing, 7(4):413--423, 1978.Google ScholarCross Ref
- G. Jeh and J. Widom. Simrank: a measure of structural-context similarity. In KDD, New York, NY, USA, 2002. Google ScholarDigital Library
- R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities. Computer Networks, 31(11{16):1481--1493, 1999. Google ScholarDigital Library
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, 2005. Google ScholarDigital Library
- M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167--256, 2003.Google ScholarDigital Library
- T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Workshop on Experimental and Efficient Algorithms (WEA), 2005. Google ScholarDigital Library
- J. S. Vitter. External memory algorithms and data structures. ACM Computing Surveys, 33(2):209--271, 2001. Google ScholarDigital Library
- H. T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. The Journal of Social Structure, 8(2), 2007.Google Scholar
Index Terms
- Efficient semi-streaming algorithms for local triangle counting in massive graphs
Recommendations
Efficient algorithms for large-scale local triangle counting
In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a large graph G=(V,E) we want to estimate as accurately as possible the number of triangles incident to every node v∈ V in the graph. We consider ...
Bipartite subgraphs of triangle-free subcubic graphs
Suppose G is a graph with n vertices and m edges. Let n^' be the maximum number of vertices in an induced bipartite subgraph of G and let m^' be the maximum number of edges in a spanning bipartite subgraph of G. Then b(G)=m^'/m is called the bipartite ...
Induced cycles in triangle graphs
The triangle graph of a graph G, denoted by T(G), is the graph whose vertices represent the triangles (K3 subgraphs) of G, and two vertices of T(G) are adjacent if and only if the corresponding triangles share an edge. In this paper, we characterize ...
Comments