Skip to main content

2016 | OriginalPaper | Buchkapitel

Differential Erasure Codes for Efficient Archival of Versioned Data in Cloud Storage Systems

verfasst von : J. Harshan, Anwitaman Datta, Frédérique Oggier

Erschienen in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXX

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner. The proposed technique is relevant in cloud settings, where, because of the huge volume of data to be stored, distributed (scale-out) storage systems deploying erasure codes for fault tolerance is typical. However existing erasure coding techniques do not leverage redundancy of information across multiple versions of a file. We propose a new technique called differential erasure coding (DEC) where the differences (deltas) between subsequent versions are stored rather than the whole objects, akin to a typical delta encoding technique. However, unlike delta encoding techniques, DEC opportunistically exploits the sparsity (i.e., when the differences between two successive versions have few non-zero entries) in the updates to store the deltas using sparse sampling techniques applied with erasure coding. We first show that DEC provides significant savings in the storage size for versioned data whenever the update patterns are characterized by in-place alterations. Subsequently, we propose a practical DEC framework so as to reap storage size benefits against not just in-place alterations but also real-world update patterns such as insertions and deletions that alter the overall data sizes. We conduct experiments with several synthetic and practical workloads to demonstrate that the practical variant of DEC provides significant reductions in storage-overhead.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The proof for the proposition follows from the property that any \(2\gamma \) columns of a parity check matrix of a linear code with minimum distance \(2\gamma +1\) are linearly independent.
 
Literatur
1.
Zurück zum Zitat Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., Yekhanin, S.: Erasure coding in windows azure storage. In: The Proceedings of the USENIX Annual Technical Conference (ATC) (2012) Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., Yekhanin, S.: Erasure coding in windows azure storage. In: The Proceedings of the USENIX Annual Technical Conference (ATC) (2012)
2.
Zurück zum Zitat Thusoo, A., Shao, Z., Anthony, S., Borthakur, D., Jain, N., Sen Sarma, J., Murthy, R., Liu, H.: Data warehousing and analytics infrastructure at Facebook. In: The Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (2010) Thusoo, A., Shao, Z., Anthony, S., Borthakur, D., Jain, N., Sen Sarma, J., Murthy, R., Liu, H.: Data warehousing and analytics infrastructure at Facebook. In: The Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (2010)
3.
Zurück zum Zitat Ford, D., Labelle, F., Popovici, F.I., Stokely, M., Truong, V.A., Barroso, L., Grimes, C., Quinlan, S.: Availability in globally distributed storage systems. In: The 9th USENIX Conference on Operating Systems Designand Implementation (OSDI) (2010) Ford, D., Labelle, F., Popovici, F.I., Stokely, M., Truong, V.A., Barroso, L., Grimes, C., Quinlan, S.: Availability in globally distributed storage systems. In: The 9th USENIX Conference on Operating Systems Designand Implementation (OSDI) (2010)
4.
Zurück zum Zitat Dimakis, A.G., Ramchandran, K., Wu, Y., Suh, C.: A survey on network codes for distributed storage. Proc. IEEE 99, 476–489 (2011)CrossRef Dimakis, A.G., Ramchandran, K., Wu, Y., Suh, C.: A survey on network codes for distributed storage. Proc. IEEE 99, 476–489 (2011)CrossRef
5.
Zurück zum Zitat Oggier, F., Datta, A.: Coding techniques for repairability in networked distributed storage systems. In: Foundations and Trends in Communications and Information Theory, vol. 9, no. 4, pp. 383–466. Now Publishers, June 2013 Oggier, F., Datta, A.: Coding techniques for repairability in networked distributed storage systems. In: Foundations and Trends in Communications and Information Theory, vol. 9, no. 4, pp. 383–466. Now Publishers, June 2013
10.
Zurück zum Zitat Wang, Z., Cadambe, V.: Multi-version Coding for Distributed Storage. In: Proceedings of IEEE ISIT 2014, Honalulu, USA (2014) Wang, Z., Cadambe, V.: Multi-version Coding for Distributed Storage. In: Proceedings of IEEE ISIT 2014, Honalulu, USA (2014)
11.
Zurück zum Zitat Rouayheb, S., Goparaju, S., Kiah, H., Milenkovic, O.: Synchronising edits in distributed storage networks. In: The Proceedings of the IEEE International Symposium on Information Theory, Hong Kong (2015) Rouayheb, S., Goparaju, S., Kiah, H., Milenkovic, O.: Synchronising edits in distributed storage networks. In: The Proceedings of the IEEE International Symposium on Information Theory, Hong Kong (2015)
12.
Zurück zum Zitat Rawat, A., Vishwanath, S., Bhowmick, A., Soljanin, E.: Update efficient codes for distributed storage. In: IEEE International Symposium on Information Theory (2011) Rawat, A., Vishwanath, S., Bhowmick, A., Soljanin, E.: Update efficient codes for distributed storage. In: IEEE International Symposium on Information Theory (2011)
13.
Zurück zum Zitat Han, Y., Pai, H.-T., Zheng, R., Varshney, P.K.: Update-efficient regenerating codes with minimum per-node storage. In: Proceedings of IEEE International Symposium on Information Theory (ISIT 2013), Istanbul (2013) Han, Y., Pai, H.-T., Zheng, R., Varshney, P.K.: Update-efficient regenerating codes with minimum per-node storage. In: Proceedings of IEEE International Symposium on Information Theory (ISIT 2013), Istanbul (2013)
14.
Zurück zum Zitat Mazumdar, A., Wornell, G.W., Chandar, V.: Update efficient codes for error correction. In: The Proceedings of IEEE IEEE International Symposium on Information Theory, Cambridge, MA, pp. 1558–1562, July 2012 Mazumdar, A., Wornell, G.W., Chandar, V.: Update efficient codes for error correction. In: The Proceedings of IEEE IEEE International Symposium on Information Theory, Cambridge, MA, pp. 1558–1562, July 2012
15.
Zurück zum Zitat Esmaili, K.S., Chiniah, A., Datta, A.: Efficient updates in cross-object erasure-coded storage systems. In: IEEE International Conference on Big Data, Silicon Valley, CA, October 2013 Esmaili, K.S., Chiniah, A., Datta, A.: Efficient updates in cross-object erasure-coded storage systems. In: IEEE International Conference on Big Data, Silicon Valley, CA, October 2013
16.
Zurück zum Zitat Tarasov, V., Mudrankit, A., Buik, W., Shilane, P., Kuenning, G., Zadok, E.: Generating realistic datasets for deduplication analysis. In: The Proceedings of the 2012 USENIX Conference on Annual Technical Conference (2012) Tarasov, V., Mudrankit, A., Buik, W., Shilane, P., Kuenning, G., Zadok, E.: Generating realistic datasets for deduplication analysis. In: The Proceedings of the 2012 USENIX Conference on Annual Technical Conference (2012)
17.
Zurück zum Zitat Harshan, J., Oggier, F., Datta, A.: Sparsity exploiting erasure coding for resilient storage and efficient I, O access in delta based versioning systems. In: The Proceedings of IEEE ICDCS, Columbus, Ohio, USA (2015) Harshan, J., Oggier, F., Datta, A.: Sparsity exploiting erasure coding for resilient storage and efficient I, O access in delta based versioning systems. In: The Proceedings of IEEE ICDCS, Columbus, Ohio, USA (2015)
18.
Zurück zum Zitat Harshan, J., Oggier, F., Datta, A.: Sparsity exploiting erasure coding for distributed storage of versioned data. Computing 98, 1305–1329 (2016). SpringerMathSciNetCrossRef Harshan, J., Oggier, F., Datta, A.: Sparsity exploiting erasure coding for distributed storage of versioned data. Computing 98, 1305–1329 (2016). SpringerMathSciNetCrossRef
21.
Zurück zum Zitat Zhang, F., Pfister, H.D.: Compressed sensing and linear codes over real numbers. In: Information Theory and Applications Workshop (ITA) (2008) Zhang, F., Pfister, H.D.: Compressed sensing and linear codes over real numbers. In: Information Theory and Applications Workshop (ITA) (2008)
22.
Zurück zum Zitat McWilliams, F.J., Sloane, N.J.A.: The Theory of Error Correcting Codes. North Holland, Amsterdam (1977) McWilliams, F.J., Sloane, N.J.A.: The Theory of Error Correcting Codes. North Holland, Amsterdam (1977)
23.
Zurück zum Zitat Lacan, J., Fimes, J.: A construction of matrices with no singular square submatrices. In: The Proceedings of International Conference on Finite Fields and Applications, pp. 145–147 (2003) Lacan, J., Fimes, J.: A construction of matrices with no singular square submatrices. In: The Proceedings of International Conference on Finite Fields and Applications, pp. 145–147 (2003)
24.
Zurück zum Zitat Harshan, J., Datta, A., Oggier, F.: DiVers: an erasure code based storage architecture for versioning exploiting sparsity. Future Gener. Comput. Syst. 59, 47–62 (2016). ElsevierCrossRef Harshan, J., Datta, A., Oggier, F.: DiVers: an erasure code based storage architecture for versioning exploiting sparsity. Future Gener. Comput. Syst. 59, 47–62 (2016). ElsevierCrossRef
Metadaten
Titel
Differential Erasure Codes for Efficient Archival of Versioned Data in Cloud Storage Systems
verfasst von
J. Harshan
Anwitaman Datta
Frédérique Oggier
Copyright-Jahr
2016
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-54054-1_2