skip to main content
research-article

The little engine(s) that could: scaling online social networks

Published:30 August 2010Publication History
Skip Abstract Section

Abstract

The difficulty of scaling Online Social Networks (OSNs) has introduced new system design challenges that has often caused costly re-architecting for services like Twitter and Facebook. The complexity of interconnection of users in social networks has introduced new scalability challenges. Conventional vertical scaling by resorting to full replication can be a costly proposition. Horizontal scaling by partitioning and distributing data among multiples servers - e.g. using DHTs - can lead to costly inter-server communication.

We design, implement, and evaluate SPAR, a social partitioning and replication middle-ware that transparently leverages the social graph structure to achieve data locality while minimizing replication. SPAR guarantees that for all users in an OSN, their direct neighbor's data is co-located in the same server. The gains from this approach are multi-fold: application developers can assume local semantics, i.e., develop as they would for a single server; scalability is achieved by adding commodity servers with low memory and network I/O requirements; and redundancy is achieved at a fraction of the cost.

We detail our system design and an evaluation based on datasets from Twitter, Orkut, and Facebook, with a working implementation. We show that SPAR incurs minimum overhead, and can help a well-known open-source Twitter clone reach Twitter's scale without changing a line of its application logic and achieves higher throughput than Cassandra, Facebook's DHT based key-value store database.

References

  1. Facebook's memcached multiget hole: More machines != more capacity. http://highscalability.com/blog/2009/10/26/ facebooks-memcached-multiget-hole-more-machinesmore- capacit.html.Google ScholarGoogle Scholar
  2. Friendster lost lead due to failure to scale. http://highscalability.com/blog/2007/11/13/friendsterlost- lead-because-of-a-failure-to-scale.html.Google ScholarGoogle Scholar
  3. Notes from scaling mysql up or out. http://venublog.com/2008/04/16/notes-from-scalingmysql- up-or-out/.Google ScholarGoogle Scholar
  4. Rightscale. http://www.rightscale.com.Google ScholarGoogle Scholar
  5. Status net. http://status.net.Google ScholarGoogle Scholar
  6. Tsung: Distributed load testing tool. http://tsung.erlang-projects.org/.Google ScholarGoogle Scholar
  7. Twitter architecture. http://highscalability.com/scaling-twitter-makingtwitter- 10000-percent-faster.Google ScholarGoogle Scholar
  8. A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, Jon, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. Farsite: Federated, available, and reliable storage for an incompletely trusted environment. In OSDI 02. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the clouds: A berkeley view of cloud computing. Technical Report UCB/EECS-2009-28.Google ScholarGoogle Scholar
  10. S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. J. ACM, 56(2):137, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida. Characterizing user behavior in online social networks. In Proc. of IMC'09, pages 49--62, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. J.STAT.MECH., page P10008, 2008.Google ScholarGoogle Scholar
  13. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: amazon's highly available key-value store. SIGOPS Oper. Syst. Rev., 41(6):205--220, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. G. Guy, J. S. Heidemann, W. Mak, T. W. Page, Jr., G. J. Popek, and D. Rothmeier. Implementation of the Ficus replicated file system. In USENIX Conference Proceedings, 1990.Google ScholarGoogle Scholar
  16. J. Hamilton. Geo-replication at facebook. http://perspectives.mvdirona.com/2008/08/21/ GeoReplicationAtFacebook.aspx.Google ScholarGoogle Scholar
  17. J. Hamilton. Scaling linkedin. http://perspectives.mvdirona.com/2008/06/08/ ScalingLinkedIn.aspx.Google ScholarGoogle Scholar
  18. HighScalability.com. Why are facebook, digg and twitter so hard to scale? http://highscalability.com/blog/2009/10/13 /why-arefacebook- digg-and-twitter-so-hard-to-scale.html.Google ScholarGoogle Scholar
  19. J. Rothschild. High performance at massive scale - lessons learned at facebook. http://cns.ucsd.edu/lecturearchive09.shtml#Roth.Google ScholarGoogle Scholar
  20. G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359--392, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Kwak, Y. Choi, Y.-H. Eom, H. Jeong, and S. Moon. Mining communities in networks: a solution for consistency and its evaluation. In ACM IMC'09. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? 2010.Google ScholarGoogle Scholar
  23. J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on KDD, 1:1, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. CoRR, abs/0810.1355, 2008.Google ScholarGoogle Scholar
  25. N. Media. Growth of twitter. http://blog.nielsen.com/nielsenwire/online mobile/ twitters-tweet-smell-of-success/.Google ScholarGoogle Scholar
  26. A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In ACM IMC'07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M. Newman and J. Park. Why social networks are different from other types of networks. Phys. Rev. E, 68:036122, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  28. M. E. J. Newman. Modularity and community structure in networks. PNAS, 103:8577, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  29. J. M. Pujol, V. Erramilli, and P. Rodriguez. Divide and conquer: Partitioning online social networks. http://arxiv.org/abs/0905.4918v1, 2009.Google ScholarGoogle Scholar
  30. J. M. Pujol, G. Siganos, V. Erramilli, and P. Rodriguez. Scaling online social networks without pains. In Proc of NETDB, 2009.Google ScholarGoogle Scholar
  31. M. Satyanarayanan. Coda: A highly available file system for a distributed workstation environment. IEEE Transactions on Computers, 39:447--459, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. F. Schneider, A. Feldmann, B. Krishnamurthy, and W. Willinger. Understanding online social network usage from a network perspective. In IMC'09. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. Managing update conflicts in bayou, a weakly connected replicated storage system. In ACM SOSP'95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in facebook. In Proc of WOSN'09). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The little engine(s) that could: scaling online social networks

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGCOMM Computer Communication Review
            ACM SIGCOMM Computer Communication Review  Volume 40, Issue 4
            SIGCOMM '10
            October 2010
            481 pages
            ISSN:0146-4833
            DOI:10.1145/1851275
            Issue’s Table of Contents

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 30 August 2010

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader