skip to main content
research-article

Cassandra: a decentralized structured storage system

Published:14 April 2010Publication History
Skip Abstract Section

Abstract

Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.

References

  1. MySQL AB. Mysql.Google ScholarGoogle Scholar
  2. Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken, John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, and Roger P. Wattenhofer. Farsite: Federated, available, and reliable storage for an incompletely trusted environment. In In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI, pages 1--14, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Mike Burrows. The chubby lock service for loosely-coupled distributed systems. In OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation, pages 335--350, Berkeley, CA, USA, 2006. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. In In Proceedings of the 7th Conference on USENIX Symposium on Operating Systems Design and Implementation - Volume 7, pages 205--218, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Abhinandan Das, Indranil Gupta, and Ashish Motivala. Swim: Scalable weakly-consistent infection-style process group membership protocol. In DSN '02: Proceedings of the 2002 International Conference on Dependable Systems and Networks, pages 303--312, Washington, DC, USA, 2002. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Giuseppe de Candia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: amazon***Os highly available key-value store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pages 205--220. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Xavier Défago, Péter Urbán, Naohiro Hayashibara, and Takuya Katayama. The φ accrual failure detector. In RR IS-RR-2004-010, Japan Advanced Institute of Science and Technology, pages 66--78, 2004.Google ScholarGoogle Scholar
  9. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The google file system. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29--43, New York, NY, USA, 2003. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Jim Gray and Pat Helland. The dangers of replication and a solution. In In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 173--182, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, and Rina Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In In ACM Symposium on Theory of Computing, pages 654--663, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Matthew L. Massie, Brent N. Chun, and David E. Culler. The ganglia distributed monitoring system: Design, implementation, and experience. Parallel Computing, 30:2004, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  13. Benjamin Reed and Flavio Junquieira. Zookeeper.Google ScholarGoogle Scholar
  14. Peter Reiher, John Heidemann, David Ratner, Greg Skinner, and Gerald Popek. Resolving file conflicts in the ficus file system. In USTC'94: Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference, pages 12--12, Berkeley, CA, USA, 1994. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Robbert Van Renesse, Yaron Minsky, and Mark Hayden. A gossip-style failure detection service. In Service, ?T Proc. Conf. Middleware, pages 55--70, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Mahadev Satyanarayanan, James J. Kistler, Puneet Kumar, Maria E. Okasaki, Ellen H. Siegel, and David C. Steere. Coda: A highly available file system for a distributed workstation environment. IEEE Trans. Comput., 39(4):447--459, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ion Stoica, Robert Morris, David Liben-nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 11:17--32, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. Managing update conflicts in bayou, a weakly connected replicated storage system. In SOSP '95: Proceedings of the fifteenth ACM symposium on Operating systems principles, pages 172--182, New York, NY, USA, 1995. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Robbert van Renesse, Dan Mihai Dumitriu, Valient Gough, and Chris Thomas. Efficient reconciliation and flow control for anti-entropy protocols. In Proceedings of the 2nd Large Scale Distributed Systems and Middleware Workshop (LADIS '08), New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Matt Welsh, David Culler, and Eric Brewer. Seda: an architecture for well-conditioned, scalable internet services. In SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles, pages 230--243, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cassandra: a decentralized structured storage system

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGOPS Operating Systems Review
            ACM SIGOPS Operating Systems Review  Volume 44, Issue 2
            April 2010
            92 pages
            ISSN:0163-5980
            DOI:10.1145/1773912
            Issue’s Table of Contents

            Copyright © 2010 Authors

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 14 April 2010

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader