
Cluster Computing

Published: 05-04-2017

A practical cross-datacenter fault-tolerance algorithm in the cloud storage system

Authors: Yuxia Cheng, Xinjie Yu, Wenzhi Chen, Rui Chang, Yang Xiang

Published in: Cluster Computing | Issue 2/2017


Abstract

The fault tolerance of most cloud storage systems is designed within the scope of a single datacenter. However, a datacenter as a whole may become unreachable or crash because of severe problems such as broken network links, power supply interruptions, and natural disasters. Designing an effective cross-datacenter fault-tolerant storage system is therefore important for protecting the security of data in the cloud. Building such a system faces major challenges, including high latency, low throughput, and the high cost of bandwidth between datacenters. In this paper, we propose a practical cross-datacenter fault-tolerant (CDFT) algorithm for cloud storage systems. Our design considers the difficult tradeoffs among fault tolerance, latency, throughput, and network and storage costs. We propose Domain Fault Codes (DFC) and topology-aware scheduling techniques, which can tolerate the failure of an entire datacenter. We implemented the DFC-CDFT algorithm in a prototype cloud storage system. The experimental results show that DFC-CDFT can effectively recover data blocks after a single-datacenter failure while achieving low storage and bandwidth costs.
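The abstract does not describe how DFC is constructed. To illustrate only the general idea it builds on (surviving the loss of a whole datacenter with erasure-coded fragments instead of a full replica per site), the sketch below uses a single XOR parity fragment, the simplest MDS erasure code, together with a round-robin placement. It is not the authors' DFC or their scheduler; the functions `encode`, `place`, `survives_any_single_dc_loss`, and `recover`, as well as the datacenter names, are hypothetical illustrations.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal (zero-padded) fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def decode(frags: list[bytes], k: int, length: int) -> bytes:
    """Reassemble the original block from the k data fragments."""
    return b"".join(frags[:k])[:length]


def place(frags: list[bytes], datacenters: list[str]) -> dict[int, str]:
    """Hypothetical placement: fragment i goes to datacenter i mod len(datacenters)."""
    return {i: datacenters[i % len(datacenters)] for i in range(len(frags))}


def survives_any_single_dc_loss(placement: dict[int, str], k: int) -> bool:
    """An MDS code with n fragments survives losing a datacenter iff >= k fragments remain."""
    n = len(placement)
    for dc in set(placement.values()):
        lost = sum(1 for d in placement.values() if d == dc)
        if n - lost < k:
            return False
    return True


def recover(frags: list[bytes], lost_index: int) -> bytes:
    """Rebuild one missing fragment: the XOR of all n fragments is zero,
    so the missing one equals the XOR of the survivors."""
    survivors = [f for i, f in enumerate(frags) if i != lost_index]
    return reduce(xor_bytes, survivors)


if __name__ == "__main__":
    data = b"cross-datacenter block"
    k = 2
    frags = encode(data, k)  # 2 data fragments + 1 parity fragment
    placement = place(frags, ["dc-a", "dc-b", "dc-c"])
    print(survives_any_single_dc_loss(placement, k))  # True
    assert recover(frags, 1) == frags[1]  # e.g. dc-b is lost
    print((k + 1) / k)  # storage overhead: 1.5 (vs 3.0 for one replica per site)
```

With two data fragments and one parity fragment spread over three datacenters, losing any one datacenter still leaves two fragments, which is enough to rebuild the block, at 1.5× storage instead of 3× for full replication. Placing two fragments in the same datacenter (for example, only two sites) breaks this guarantee, which is why fragment placement has to respect datacenter boundaries.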




Publisher: Springer US
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-0840-5
