skip to main content
10.1145/1401890.1401898acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Efficient semi-streaming algorithms for local triangle counting in massive graphs

Published:24 August 2008Publication History

ABSTRACT

In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V;E) we want to estimate as accurately as possible the number of triangles incident to every node υ ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks.

For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(jV j) space in main memory and performing O(log jV j) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(jEj) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.

References

  1. N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  2. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. V. Batagelj and A. Mrvar. A subquadratic triad census algorithm for large sparse networks with small maximum degree. Social Networks, 23:237--243, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  4. T. Bohman, C. Cooper, and A. M. Frieze. Min-wise independent linear permutations. Electr. J. Comb, 7, 2000.Google ScholarGoogle Scholar
  5. P. Boldi and S. Vigna. The webgraph framework I: compression techniques. In WWW, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, IEEE Computer Society, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Z. Broder. Identifying and filtering near-duplicate documents. In CPM. Springer, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In STOC, New York, NY, USA, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. In WWW, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In PODS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini, and S. Vigna. A reference collection for web spam. SIGIR Forum, 40(2):11--24, December 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know your neighbors: Web spam detection using the web topology. In SIGIR, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Coppersmith and R. Kumar. An improved data stream algorithm for frequency moments. In SODA, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251--280, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. Demetrescu, I. Finocchi, and A. Ribichini. Trading of space for passes in graph streaming problems. In SODA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS, 99(9):5825--5829, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  17. J. Feigenbaum, S. Kannan, M. A. Gregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. In ICALP, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  18. D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In WebDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Fogaras and B. Racz. Scaling link-based similarity search. In WWW, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Gulli and A. Signorini. The indexable Web is more than 11.5 billion pages. In WWW, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T. Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, 1999.Google ScholarGoogle Scholar
  23. M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Dimacs Series In Discrete Mathematics And Theoretical Computer Science, pages 107--118, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. P. Indyk. A small approximately min-wise independent family of hash functions. In SODA, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal of Computing, 7(4):413--423, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  26. G. Jeh and J. Widom. Simrank: a measure of structural-context similarity. In KDD, New York, NY, USA, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities. Computer Networks, 31(11{16):1481--1493, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167--256, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Workshop on Experimental and Efficient Algorithms (WEA), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. S. Vitter. External memory algorithms and data structures. ACM Computing Surveys, 33(2):209--271, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. H. T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. The Journal of Social Structure, 8(2), 2007.Google ScholarGoogle Scholar

Index Terms

  1. Efficient semi-streaming algorithms for local triangle counting in massive graphs

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
              August 2008
              1116 pages
              ISBN:9781605581934
              DOI:10.1145/1401890
              • General Chair:
              • Ying Li,
              • Program Chairs:
              • Bing Liu,
              • Sunita Sarawagi

              Copyright © 2008 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 24 August 2008

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              KDD '08 Paper Acceptance Rate118of593submissions,20%Overall Acceptance Rate1,133of8,635submissions,13%

              Upcoming Conference

              KDD '24

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader