skip to main content
10.1145/1963405.1963491acmotherconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
research-article

Counting triangles and the curse of the last reducer

Published:28 March 2011Publication History

ABSTRACT

The clustering coefficient of a node in a social network is a fundamental measure that quantifies how tightly-knit the community is around the node. Its computation can be reduced to counting the number of triangles incident on the particular node in the network. In case the graph is too big to fit into memory, this is a non-trivial task, and previous researchers showed how to estimate the clustering coefficient in this scenario. A different avenue of research is to to perform the computation in parallel, spreading it across many machines. In recent years MapReduce has emerged as a de facto programming paradigm for parallel computation on massive data sets. The main focus of this work is to give MapReduce algorithms for counting triangles which we use to compute clustering coefficients.

Our contributions are twofold. First, we describe a sequential triangle counting algorithm and show how to adapt it to the MapReduce setting. This algorithm achieves a factor of 10-100 speed up over the naive approach. Second, we present a new algorithm designed specifically for the MapReduce framework. A key feature of this approach is that it allows for a smooth tradeoff between the memory available on each individual machine and the total memory available to the algorithm, while keeping the total work done constant. Moreover, this algorithm can use any triangle counting algorithm as a black box and distribute the computation across many machines. We validate our algorithms on real world datasets comprising of millions of nodes and over a billion edges. Our results show both algorithms effectively deal with skew in the degree distribution and lead to dramatic speed ups over the naive implementation.

References

  1. Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, October 1999.Google ScholarGoogle ScholarCross RefCross Ref
  2. Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '08, pages 16--24, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157--1166, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. Counting triangles in data streams. In Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '06, pages 253--262, New York, NY, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Ronald S. Burt. Structural holes and good ideas. American Journal of Sociology, 110(2):349--99, September 2004.Google ScholarGoogle ScholarCross RefCross Ref
  6. Ronald S. Burt. Second-hand brokerage: Evidence on the importance of local structure for managers, bankers, and analysts. Academy of Management Journal, 50:119--148, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  7. Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210--223, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. James S. Coleman. Social capital in the creation of human capital. American Journal of Sociology, 94:S95--S120, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  9. Don Coppersmith and Ravi Kumar. An improved data stream algorithm for frequency moments. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, SODA '04, pages 151--156, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 938--948, 2010. Google ScholarGoogle ScholarCross RefCross Ref
  12. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or news media. In Proc. 19th International World Wide Web Conference, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jure Leskovec and Eric Horvitz. Planetary-scale views on a large instant-messaging network. In Proc. 17th International World Wide Web Conference, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jimmy Lin and Chris Dyer. Data-Intensive Text Processing with MapReduce. Number 7 in Synthesis Lectures on Human Language Technologies. Morgan and Claypool Publishers, April 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Alejandro Portes. Social capital: Its origins and applications in modern sociology. Annual Review of Sociology, 24:1--24, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  16. Thomas Schank. Algorithmic Aspects of Triangle-Based Network Analysis. PhD thesis, Universität Karlsruhe (TH), 2007.Google ScholarGoogle Scholar
  17. Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. Doulion: Counting triangles in massive graphs with a coin. In Knowledge Discovery and Data Mining (KDD), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Duncan Watts and Steve Strogatz. Collective dynamics of 'small-world' networks. Nature, 383:440--442, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  19. Tom White. Hadoop: The Definitive Guide. O'Reilly Media, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Counting triangles and the curse of the last reducer

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      WWW '11: Proceedings of the 20th international conference on World wide web
      March 2011
      840 pages
      ISBN:9781450306324
      DOI:10.1145/1963405

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 March 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,899of8,196submissions,23%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader