Skip to main content
Top
Published in: Cluster Computing 2/2017

05-04-2017

A practical cross-datacenter fault-tolerance algorithm in the cloud storage system

Authors: Yuxia Cheng, Xinjie Yu, Wenzhi Chen, Rui Chang, Yang Xiang

Published in: Cluster Computing | Issue 2/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The fault-tolerance property in most cloud storage systems are designed within the scale of a single datacenter. The single datacenter as a whole may be unreachable or crashed due to severe problems, such as broken network links, power supply interruptions, and natural disasters, etc. Therefore, the design of an effective cross-datacenter fault-tolerant storage system is important to protect data security in the cloud. However, building a cross-datacenter fault-tolerant system faces great challenges, such as high latency, low throughput, high costs of bandwidth resources between datacenters. In this paper, we propose a practical cross-datacenter fault-tolerant (CDFT) algorithm in the cloud storage system. Our fault-tolerant algorithm design considers the difficult tradeoffs among fault tolerance, latency, throughput, network and storage costs. We propose the Domain Fault Codes (DFC) and the topology-aware scheduling techniques, which can tolerate the whole datacenter breakdown. We implemented the DFC-CDFT algorithm in a prototype cloud storage system. The experimental results showed that the proposed DFC-CDFT algorithm can effectively recover data blocks from the single datacenter failure while achieves low storage and bandwidth costs.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
2.
go back to reference Bailis, P., Davidson, A., Fekete, A., Ghodsi, A., Hellerstein, J.M., Stoica, I.: Highly available transactions: virtues and limitations. Proc. VLDB Endow. 7(3), 181–192 (2013)CrossRef Bailis, P., Davidson, A., Fekete, A., Ghodsi, A., Hellerstein, J.M., Stoica, I.: Highly available transactions: virtues and limitations. Proc. VLDB Endow. 7(3), 181–192 (2013)CrossRef
3.
go back to reference Greenberg, A., Hamilton, J., Maltz, D.A., Patel, P.: The cost of a cloud: research problems in data center networks. ACM SIGCOMM Comput. Commun. Rev. 39(1), 68–73 (2008)CrossRef Greenberg, A., Hamilton, J., Maltz, D.A., Patel, P.: The cost of a cloud: research problems in data center networks. ACM SIGCOMM Comput. Commun. Rev. 39(1), 68–73 (2008)CrossRef
4.
go back to reference Shah, N.B., Lee, K., Ramchandran, K.: The MDS queue: analysing the latency performance of erasure codes. In: Proceedings of International Symposium on Information Theory (2014) Shah, N.B., Lee, K., Ramchandran, K.: The MDS queue: analysing the latency performance of erasure codes. In: Proceedings of International Symposium on Information Theory (2014)
6.
go back to reference Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google file system. ACM SIGOPS Oper. Syst. Rev. ACM 37(5), 29–43 (2003)CrossRef Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google file system. ACM SIGOPS Oper. Syst. Rev. ACM 37(5), 29–43 (2003)CrossRef
7.
go back to reference Sivasubramanian, S.: Amazon dynamoDB: a seamlessly scalable non-relational database service. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 729–730. ACM (2012) Sivasubramanian, S.: Amazon dynamoDB: a seamlessly scalable non-relational database service. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 729–730. ACM (2012)
8.
go back to reference Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., Yekhanin, S.: Erasure coding in Windows Azure storage. In: Proceedings of USENIX Annual Technical Conference (2012) Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., Yekhanin, S.: Erasure coding in Windows Azure storage. In: Proceedings of USENIX Annual Technical Conference (2012)
9.
go back to reference Fikes, A.: Storage architecture and challenges. Talk at the Google Faculty Summit (2010) Fikes, A.: Storage architecture and challenges. Talk at the Google Faculty Summit (2010)
10.
go back to reference Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system[C]. In: IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010) Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system[C]. In: IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)
14.
go back to reference Fan, B., Tantisiriroj, W., Xiao, L., Gibson, G.: DiskReduce: RAID for data-intensive scalable computing. In: Proceedings of the 4th Annual Workshop on Petascale Data Storage, pp. 6–10. ACM (2009) Fan, B., Tantisiriroj, W., Xiao, L., Gibson, G.: DiskReduce: RAID for data-intensive scalable computing. In: Proceedings of the 4th Annual Workshop on Petascale Data Storage, pp. 6–10. ACM (2009)
15.
go back to reference Sathiamoorthy, M., Asteris, M., Papailiopoulos, D., Dimakis, A.G., Vadali, R., Chen, S., Borthakur, D.: Xoring elephants: novel erasure codes for big data. Proc. VLDB Endow. VLDB Endow. 6(5), 325–336 (2013)CrossRef Sathiamoorthy, M., Asteris, M., Papailiopoulos, D., Dimakis, A.G., Vadali, R., Chen, S., Borthakur, D.: Xoring elephants: novel erasure codes for big data. Proc. VLDB Endow. VLDB Endow. 6(5), 325–336 (2013)CrossRef
18.
go back to reference Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst. (TOCS) 26(2), 4 (2008)CrossRef Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst. (TOCS) 26(2), 4 (2008)CrossRef
19.
go back to reference Baker, J., Bond, C., Corbett, J.C., Furman, J.J., Khorlin, A., Larson, J., Leon, J.-M., Li, Y., Lloyd, A., Yushprakh, V.: Megastore: providing scalable, highly available storage for interactive services. CIDR 11, 223–234 (2011) Baker, J., Bond, C., Corbett, J.C., Furman, J.J., Khorlin, A., Larson, J., Leon, J.-M., Li, Y., Lloyd, A., Yushprakh, V.: Megastore: providing scalable, highly available storage for interactive services. CIDR 11, 223–234 (2011)
20.
go back to reference Corbett, J.C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J.J., Ghemawat, S., et al.: Spanner: Google’s globally distributed database. ACM Trans. Comput. Syst. (TOCS) 31(3), 8 (2013)CrossRef Corbett, J.C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J.J., Ghemawat, S., et al.: Spanner: Google’s globally distributed database. ACM Trans. Comput. Syst. (TOCS) 31(3), 8 (2013)CrossRef
21.
go back to reference Silberstein, M., Ganesh, L., Wang, Y., Alvisi, L., Dahlin, M.: Lazy means smart: reducing repair bandwidth costs in erasure-coded distributed storage. In: Proceedings of International Conference on Systems and Storage (2014) Silberstein, M., Ganesh, L., Wang, Y., Alvisi, L., Dahlin, M.: Lazy means smart: reducing repair bandwidth costs in erasure-coded distributed storage. In: Proceedings of International Conference on Systems and Storage (2014)
22.
go back to reference Huang, J., Liang, X., Qin, X., Xie, P., Xie, C.: Scale-RS: an efficient scaling scheme for RS-coded storage clusters. IEEE Trans. Parallel Distrib. Syst. 26(6), 1704–1717 (2015)CrossRef Huang, J., Liang, X., Qin, X., Xie, P., Xie, C.: Scale-RS: an efficient scaling scheme for RS-coded storage clusters. IEEE Trans. Parallel Distrib. Syst. 26(6), 1704–1717 (2015)CrossRef
25.
go back to reference Rashmi, K.V., Nakkiran, P., Wang, J., Shah, N.B., Ramchandran, K.: Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth. In: USENIX Conference on File and Storage Technologies (2015) Rashmi, K.V., Nakkiran, P., Wang, J., Shah, N.B., Ramchandran, K.: Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth. In: USENIX Conference on File and Storage Technologies (2015)
26.
go back to reference Singh, A., Ong, J., Agarwal, A., Anderson, G.: Jupiter rising: a decade of clos topologies and centralized control in Google’s datacenter network. Commun. ACM 59(9), 88–97 (2016)CrossRef Singh, A., Ong, J., Agarwal, A., Anderson, G.: Jupiter rising: a decade of clos topologies and centralized control in Google’s datacenter network. Commun. ACM 59(9), 88–97 (2016)CrossRef
27.
go back to reference Xiao, L., Ren, K., Zheng, Q., Gibson, G.A.: ShardFS vs. IndexFS: replication vs. caching strategies for distributed metadata management in cloud storage systems. In: Proceedings of the Sixth ACM Symposium on Cloud Computing (2015) Xiao, L., Ren, K., Zheng, Q., Gibson, G.A.: ShardFS vs. IndexFS: replication vs. caching strategies for distributed metadata management in cloud storage systems. In: Proceedings of the Sixth ACM Symposium on Cloud Computing (2015)
28.
go back to reference Thomson, A., Abadi, D.J.: CalvinFS: consistent WAN replication and scalable metadata management for distributed file systems. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies (2015) Thomson, A., Abadi, D.J.: CalvinFS: consistent WAN replication and scalable metadata management for distributed file systems. In: Proceedings of the 13th USENIX Conference on File and Storage Technologies (2015)
30.
go back to reference Ford, D., Labelle, F., Popovici, F.I., Stokely, M., Truong, V.-A., Barroso, L., Grimes, C., Quinlan, S.: Availability in Globally Distributed Storage Systems. In: Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, pp. 61–74 (2010) Ford, D., Labelle, F., Popovici, F.I., Stokely, M., Truong, V.-A., Barroso, L., Grimes, C., Quinlan, S.: Availability in Globally Distributed Storage Systems. In: Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, pp. 61–74 (2010)
31.
go back to reference Standard, NIST-FIPS.: Announcing the advanced encryption standard (AES). Fed. Inf. Process. Stand. Publ. 197, 1–51 (2001) Standard, NIST-FIPS.: Announcing the advanced encryption standard (AES). Fed. Inf. Process. Stand. Publ. 197, 1–51 (2001)
Metadata
Title
A practical cross-datacenter fault-tolerance algorithm in the cloud storage system
Authors
Yuxia Cheng
Xinjie Yu
Wenzhi Chen
Rui Chang
Yang Xiang
Publication date
05-04-2017
Publisher
Springer US
Published in
Cluster Computing / Issue 2/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0840-5

Other articles of this Issue 2/2017

Cluster Computing 2/2017 Go to the issue

Premium Partner