research-article

Counting triangles and the curse of the last reducer

Authors:
Siddharth Suri

Yahoo! Research, New York, NY, USA

Yahoo! Research, New York, NY, USA
View Profile

,
Sergei Vassilvitskii

Yahoo! Research, New York, NY, USA

Yahoo! Research, New York, NY, USA
View Profile

WWW '11: Proceedings of the 20th international conference on World wide webMarch 2011Pages 607–614https://doi.org/10.1145/1963405.1963491

Published:28 March 2011Publication History

WWW '11: Proceedings of the 20th international conference on World wide web

Pages 607–614

ABSTRACT

The clustering coefficient of a node in a social network is a fundamental measure that quantifies how tightly-knit the community is around the node. Its computation can be reduced to counting the number of triangles incident on the particular node in the network. In case the graph is too big to fit into memory, this is a non-trivial task, and previous researchers showed how to estimate the clustering coefficient in this scenario. A different avenue of research is to to perform the computation in parallel, spreading it across many machines. In recent years MapReduce has emerged as a de facto programming paradigm for parallel computation on massive data sets. The main focus of this work is to give MapReduce algorithms for counting triangles which we use to compute clustering coefficients.

Our contributions are twofold. First, we describe a sequential triangle counting algorithm and show how to adapt it to the MapReduce setting. This algorithm achieves a factor of 10-100 speed up over the naive approach. Second, we present a new algorithm designed specifically for the MapReduce framework. A key feature of this approach is that it allows for a smooth tradeoff between the memory available on each individual machine and the total memory available to the algorithm, while keeping the total work done constant. Moreover, this algorithm can use any triangle counting algorithm as a black box and distribute the computation across many machines. We validate our algorithms on real world datasets comprising of millions of nodes and over a billion edges. Our results show both algorithms effectively deal with skew in the degree distribution and lead to dramatic speed ups over the naive implementation.

References

Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, October 1999.Google ScholarCross Ref
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '08, pages 16--24, New York, NY, USA, 2008. ACM. Google ScholarDigital Library
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157--1166, 1997. Google ScholarDigital Library
Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. Counting triangles in data streams. In Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '06, pages 253--262, New York, NY, USA, 2006. ACM. Google ScholarDigital Library
Ronald S. Burt. Structural holes and good ideas. American Journal of Sociology, 110(2):349--99, September 2004.Google ScholarCross Ref
Ronald S. Burt. Second-hand brokerage: Evidence on the importance of local structure for managers, bankers, and analysts. Academy of Management Journal, 50:119--148, 2007.Google ScholarCross Ref
Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210--223, 1985. Google ScholarDigital Library
James S. Coleman. Social capital in the creation of human capital. American Journal of Sociology, 94:S95--S120, 1988.Google ScholarCross Ref
Don Coppersmith and Ravi Kumar. An improved data stream algorithm for frequency moments. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, SODA '04, pages 151--156, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics. Google ScholarDigital Library
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of OSDI, pages 137--150, 2004. Google ScholarDigital Library
Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 938--948, 2010. Google ScholarCross Ref
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or news media. In Proc. 19th International World Wide Web Conference, 2010. Google ScholarDigital Library
Jure Leskovec and Eric Horvitz. Planetary-scale views on a large instant-messaging network. In Proc. 17th International World Wide Web Conference, 2008. Google ScholarDigital Library
Jimmy Lin and Chris Dyer. Data-Intensive Text Processing with MapReduce. Number 7 in Synthesis Lectures on Human Language Technologies. Morgan and Claypool Publishers, April 2010. Google ScholarDigital Library
Alejandro Portes. Social capital: Its origins and applications in modern sociology. Annual Review of Sociology, 24:1--24, 1998.Google ScholarCross Ref
Thomas Schank. Algorithmic Aspects of Triangle-Based Network Analysis. PhD thesis, Universität Karlsruhe (TH), 2007.Google Scholar
Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. Doulion: Counting triangles in massive graphs with a coin. In Knowledge Discovery and Data Mining (KDD), 2009. Google ScholarDigital Library
Duncan Watts and Steve Strogatz. Collective dynamics of 'small-world' networks. Nature, 383:440--442, 1998.Google ScholarCross Ref
Tom White. Hadoop: The Definitive Guide. O'Reilly Media, 2009. Google ScholarDigital Library

Index Terms

Counting triangles and the curse of the last reducer
1. Information systems
  1. Information systems applications

Recommendations

Efficient semi-streaming algorithms for local triangle counting in massive graphs
KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V;E) we want to estimate as accurately as possible the number of triangles incident to every node υ ∈ V in the graph. The problem of ...
Read More
DOULION: counting triangles in massive graphs with a coin
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Counting the number of triangles in a graph is a beautiful algorithmic problem which has gained importance over the last years due to its significant role in complex network analysis. Metrics frequently computed such as the clustering coefficient and ...
Read More
An efficient MapReduce algorithm for counting triangles in a very large graph
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management

Triangle counting problem is one of the fundamental problem in various domains. The problem can be utilized for computation of clustering coefficient, transitivity, trianglular connectivity, trusses, etc. The problem have been extensively studied in ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '11: Proceedings of the 20th international conference on World wide web
March 2011
840 pages
ISBN:9781450306324
DOI:10.1145/1963405
General Chairs:
S. Sadagopan
IIIT-Bangalore, India
,
Krithi Ramamritham
IIT-Bombay, India
,
Arun Kumar
IBM Research, India
,
M. P. Ravindra
Infosys E & R, India
,
Program Chairs:
Elisa Bertino
Purdue University, USA
,
Ravi Kumar
Yahoo! Research, USA
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 March 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
clustering coefficient
mapreduce
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,899of8,196submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 244
  Total Citations
  View Citations
- 1,185
  Total Downloads
- Downloads (Last 12 months)97
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Counting triangles and the curse of the last reducer

WWW '11: Proceedings of the 20th international conference on World wide web

ABSTRACT

References

Cited By

Index Terms

Recommendations

Efficient semi-streaming algorithms for local triangle counting in massive graphs

DOULION: counting triangles in massive graphs with a coin

An efficient MapReduce algorithm for counting triangles in a very large graph