Abstract
The difficulty of scaling Online Social Networks (OSNs) has introduced new system design challenges that have often forced costly re-architecting for services like Twitter and Facebook. The complexity of the interconnections among users in social networks introduces new scalability challenges. Conventional vertical scaling by resorting to full replication can be a costly proposition. Horizontal scaling by partitioning and distributing data among multiple servers, e.g., using DHTs, can lead to costly inter-server communication.
We design, implement, and evaluate SPAR, a social partitioning and replication middleware that transparently leverages the social graph structure to achieve data locality while minimizing replication. SPAR guarantees that for every user in an OSN, the data of all direct neighbors is co-located on the same server. The gains from this approach are manifold: application developers can assume local semantics, i.e., develop as they would for a single server; scalability is achieved by adding commodity servers with low memory and network I/O requirements; and redundancy is achieved at a fraction of the cost.
We detail our system design and an evaluation based on datasets from Twitter, Orkut, and Facebook, together with a working implementation. We show that SPAR incurs minimal overhead, can help a well-known open-source Twitter clone reach Twitter's scale without changing a single line of its application logic, and achieves higher throughput than Cassandra, Facebook's DHT-based key-value store.
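The co-location guarantee described in the abstract can be illustrated with a minimal sketch. This is not SPAR's implementation; the `masters`, `replicas`, and `edges` structures and all names below are hypothetical toy state, shown only to make the invariant concrete: each user's master server must also hold a master or replica copy of every direct neighbor's data.

```python
from collections import defaultdict

# Hypothetical cluster state: user -> master server, and the set of
# replicated users held per server (both invented for illustration).
masters = {"a": 0, "b": 0, "c": 1}
replicas = defaultdict(set, {0: {"c"}, 1: {"a"}})
edges = [("a", "b"), ("a", "c")]  # undirected friendship edges

def satisfies_local_semantics(user):
    """True if every direct neighbor of `user` is present (as a master
    or a replica) on the server that holds `user`'s master copy."""
    server = masters[user]
    neighbors = ({v for u, v in edges if u == user} |
                 {u for u, v in edges if v == user})
    return all(masters[n] == server or n in replicas[server]
               for n in neighbors)

print(all(satisfies_local_semantics(u) for u in masters))  # prints True
```

Under this invariant, a read of a user's neighborhood (e.g., assembling a news feed) touches only one server, which is what lets the application keep single-server semantics.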
The little engine(s) that could: scaling online social networks. In SIGCOMM '10: Proceedings of the ACM SIGCOMM 2010 conference.
"I LOVE THIS SITE!" vs. "It's a little girly": Perceptions of and Initial User Experience with Pinterest
CSCW '15: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social ComputingPinterest is a popular social networking site that lets people discover, collect, and share pictures of items from the Web. Among popular social media sites, Pinterest has by far the most skewed gender distribution: women are four times more likely than ...
Replicate and Bundle (RnB) -- A Mechanism for Relieving Bottlenecks in Data Centers
IPDPS '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed ProcessingThis work addresses the scalability and efficiency of RAM-based storage systems wherein multiple objects must be retrieved per user request. Here, much of the CPU work is per server transaction, not per requested item. Adding servers and spreading the ...
Comments