research-article

Cassandra: a decentralized structured storage system

Authors:
Avinash Lakshman, Facebook
Prashant Malik, Facebook

ACM SIGOPS Operating Systems Review, Volume 44, Issue 2, April 2010, pp. 35–40. https://doi.org/10.1145/1773912.1773922

Published: 14 April 2010


Abstract

Cassandra is a distributed storage system for managing very large amounts of structured data spread across many commodity servers, while providing a highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages persistent state in the face of these failures drives the reliability and scalability of the software systems that rely on this service. While Cassandra resembles a database in many ways and shares many design and implementation strategies with databases, it does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. Cassandra was designed to run on cheap commodity hardware and to handle high write throughput without sacrificing read efficiency.
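The abstract describes the data model only at a high level ("a simple data model that supports dynamic control over data layout and format"). As a rough illustration, and not Cassandra's actual API or storage engine, the sketch below models a table as a map from row key to a schema-free set of named columns, each value tagged with a timestamp. The names `ColumnFamilyStore` and `Cell`, the sorted column order, and the "higher timestamp wins" conflict rule are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    """One column value tagged with a client-supplied timestamp."""
    value: bytes
    timestamp: int


class ColumnFamilyStore:
    """Hypothetical in-memory model: row key -> column name -> Cell.

    No schema is declared up front: each row may hold a different set of
    columns, and a new column appears simply by writing to it.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Cell]] = defaultdict(dict)

    def insert(self, row_key: str, column: str, value: bytes, timestamp: int) -> None:
        current = self._rows[row_key].get(column)
        # Assumed reconciliation rule: keep the write with the higher timestamp;
        # on a tie, the value already stored is kept.
        if current is None or timestamp > current.timestamp:
            self._rows[row_key][column] = Cell(value, timestamp)

    def get(self, row_key: str, column: str) -> Optional[bytes]:
        # .get() on the outer map avoids creating empty rows on reads.
        cell = self._rows.get(row_key, {}).get(column)
        return None if cell is None else cell.value

    def columns(self, row_key: str) -> List[str]:
        # Column names come back sorted, as if columns were stored in name order.
        return sorted(self._rows.get(row_key, {}))


if __name__ == "__main__":
    store = ColumnFamilyStore()
    store.insert("user:42", "name", b"Ada", timestamp=1)
    store.insert("user:42", "email", b"ada@example.com", timestamp=1)
    store.insert("user:42", "name", b"Ada L.", timestamp=3)
    store.insert("user:42", "name", b"stale", timestamp=2)  # older, so ignored
    print(store.get("user:42", "name"))   # b'Ada L.'
    print(store.columns("user:42"))       # ['email', 'name']
```

The point of the sketch is that layout is decided per write rather than by a fixed relational schema: the row `user:42` gains an `email` column just by writing one. The timestamp rule and name ordering are illustrative choices only; the abstract does not specify them.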



Published in

ACM SIGOPS Operating Systems Review, Volume 44, Issue 2
April 2010
92 pages
ISSN: 0163-5980
DOI: 10.1145/1773912

Copyright © 2010 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 April 2010
Article Metrics
- Total Citations: 1,884
- Total Downloads: 13,053
- Downloads (last 12 months): 561
- Downloads (last 6 weeks): 91