Abstract
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
- Abadi, D. J., Madden, S. R., and Ferreira, M. C. 2006. Integrating compression and execution in column-oriented database systems. Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York. Google ScholarDigital Library
- Ailamaki, A., DeWitt, D. J., Hill, M. D., and Skounakis, M. 2001. Weaving relations for cache performance. The VLDB J. 169--180. Google ScholarDigital Library
- Banga, G., Druschel, P., and Mogul, J. C. 1999. Resource containers: A new facility for resource management in server systems. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation. 45--58. Google ScholarDigital Library
- Baru, C. K., Fecteau, G., Goyal, A., Hsiao, H., Jhingran, A., Padmanabhan, S., Copeland, G. P., and Wilson, W. G. 1995. DB2 parallel edition. IBM Syst. J. 34, 2, 292--322. Google ScholarDigital Library
- Bavier, A., Bowman, M., Chun, B., Culler, D., Karlin, S., Peterson, L., Roscoe, T., Spalink, T., and Wawrzoniak, M. 2004. Operating system support for planetary-scale network services. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation. 253--266. Google ScholarDigital Library
- Bentley, J. L. and McIlroy, M. D. 1999. Data compression using long common strings. In Data Compression Conference. 287--295. Google ScholarDigital Library
- Bloom, B. H. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7, 422--426. Google ScholarDigital Library
- Burrows, M. 2006. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. 335--350. Google ScholarDigital Library
- Chandra, T., Griesemer, R., and Redstone, J. 2007. Paxos made live --- An engineering perspective. In Proceedings of PODC. Google ScholarDigital Library
- Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. 2006. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. 205--218. Google ScholarDigital Library
- Comer, D. 1979. Ubiquitous B-tree. Computing Surveys 11, 2 (June), 121--137. Google ScholarDigital Library
- Copeland, G. P., Alexander, W., Boughter, E. E., and Keller, T. W. 1988. Data placement in Bubba. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 99--108. Google ScholarDigital Library
- Dean, J. and Ghemawat, S. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. 137--150. Google ScholarDigital Library
- DeWitt, D., Katz, R., Olken, F., Shapiro, L., Stonebraker, M., and Wood, D. 1984. Implementation techniques for main memory database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 1--8. Google ScholarDigital Library
- DeWitt, D. J. and Gray, J. 1992. Parallel database systems: The future of high performance database systems. Commun. ACM 35, 6 (June), 85--98. Google ScholarDigital Library
- French, C. D. 1995. One size fits all database architectures do not work for DSS. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 449--450. Google ScholarDigital Library
- Gawlick, D. and Kinkade, D. 1985. Varieties of concurrency control in IMS/VS fast path. Datab. Eng. Bull. 8, 2, 3--10.Google Scholar
- Ghemawat, S., Gobioff, H., and Leung, S.-T. 2003. The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles. ACM, New York, 29--43. Google ScholarDigital Library
- Gray, J. 1978. Notes on database operating systems. In Operating Systems --- An Advanced Course. Lecture Notes in Computer Science, vol. 60. Springer-Verlag, ACM, New York. Google ScholarDigital Library
- Greer, R. 1999. Daytona and the fourth-generation language Cymbal. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 525--526. Google ScholarDigital Library
- Hagmann, R. 1987. Reimplementing the Cedar file system using logging and group commit. In Proceedings of the 11th Symposium on Operating Systems Principles. 155--162. Google ScholarDigital Library
- Hartman, J. H. and Ousterhout, J. K. 1993. The Zebra striped network file system. In Proceedings of the 14th Symposium on Operating Systems Principles. ACM, New York, 29--43. Google ScholarDigital Library
- kx.com. kx.com/products/database.php. Product page.Google Scholar
- Lamport, L. 1998. The part-time parliament. ACM Trans. Comput. Syst. 16, 2, 133--169. Google ScholarDigital Library
- MacCormick, J., Murphy, N., Najork, M., Thekkath, C. A., and Zhou, L. 2004. Boxwood: Abstractions as the foundation for storage infrastructure. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. 105--120. Google ScholarDigital Library
- McCarthy, J. 1960. Recursive functions of symbolic expressions and their computation by machine. Commun. ACM 3, 4 (Apr.), 184--195. Google ScholarDigital Library
- O'Neil, P., Cheng, E., Gawlick, D., and O'Neil, E. 1996. The log-structured merge-tree (LSM-tree). Acta Inf. 33, 4, 351--385. Google ScholarDigital Library
- oracle.com. www.oracle.com/technology/products/database/clustering/index.html. Product page.Google Scholar
- Pike, R., Dorward, S., Griesemer, R., and Quinlan, S. 2005. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming Journal 13, 4, 227--298. Google ScholarDigital Library
- Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S. 2001. A scalable content-addressable network. In Proceedings of SIGCOMM. ACM, New York, 161--172. Google ScholarDigital Library
- Rowstron, A. and Druschel, P. 2001. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of Middleware 2001. 329--350. Google ScholarDigital Library
- sensage.com. sensage.com/products-sensage.htm. Product page.Google Scholar
- Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. 2001. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of SIGCOMM. ACM, New York, 149--160. Google ScholarDigital Library
- Stonebraker, M. 1986. The case for shared nothing. Datab. Eng. Bull. 9, 1 (Mar.), 4--9.Google Scholar
- Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O'Neil, E., O'Neil, P., Rasin, A., Tran, N., and Zdonik, S. 2005. C-Store: A column-oriented DBMS. In Proceedings of the 10th International Conference on Very Large Data Bases. ACM, New York, 553--564. Google ScholarDigital Library
- Stonebraker, M., Aoki, P. M., Devine, R., Litwin, W., and Olson, M. A. 1994. Mariposa: A new architecture for distributed data. In Proceedings of the 10th International Conference on Data Engineering. IEEE Computer Society Press, Los Alamitos, CA, 54--65. Google ScholarDigital Library
- sybase.com. www.sybase.com/products/databaseservers/sybaseiq. Product page.Google Scholar
- Zhao, B. Y., Kubiatowicz, J., and Joseph, A. D. 2001. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, CS Division, University of California, Berkeley. Apr. Google ScholarDigital Library
- Zukowski, M., Boncz, P. A., Nes, N., and Heman, S. 2005. MonetDB/X100 --- A DBMS in the CPU cache. IEEE Data Eng. Bull. 28, 2, 17--22.Google Scholar
Index Terms
- Bigtable: A Distributed Storage System for Structured Data
Recommendations
Bigtable: a distributed storage system for structured data
OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementationBigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google ...
Bigtable: a distributed storage system for structured data
OSDI '06: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google ...
On Managing Very Large Sensor-Network Data Using Bigtable
CCGRID '12: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)Recent advances and innovations in smart sensor technologies, energy storage, data communications, and distributed computing paradigms are enabling technological breakthroughs in very large sensor networks. There is an emerging surge of next-generation ...
Comments