
2020 | OriginalPaper | Book chapter

HSPP: Load-Balanced and Low-Latency File Partition and Placement Strategy on Distributed Heterogeneous Storage with Erasure Coding

Authors: Jiazhao Sun, Yunchun Li, Hailong Yang

Published in: Algorithms and Architectures for Parallel Processing

Publisher: Springer International Publishing


Abstract

To speed up access to massive amounts of data, heterogeneous architectures have been widely adopted in mainstream storage systems. In such systems, load imbalance and scheduling overhead are the primary factors that slow down I/O performance. In this paper, we propose HSPP, an effective file scheduling strategy that combines statistics-based file classification, partitioning with erasure coding, and adaptive data placement to optimize load balance and read latency on distributed heterogeneous storage systems. The experimental results show that HSPP outperforms existing strategies in terms of load balance, read latency, and scheduling overhead.
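The abstract does not specify HSPP's classification rule, erasure-code parameters, or placement algorithm. As a rough illustration of how the three ingredients it names can fit together, here is a minimal sketch under stated assumptions, not the authors' method. Files are labeled hot or cold by a hypothetical access-count threshold. A hot file is split into k data chunks plus one XOR parity chunk, a single-parity erasure code standing in for whatever code HSPP actually uses, so any k of the k+1 chunks rebuild the file. Chunks are then placed greedily on distinct nodes with the lowest load normalized by a hypothetical per-node bandwidth, which models heterogeneous devices. The names `Node`, `classify`, `encode`, `decode`, and `place` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A storage node; `bandwidth` is a hypothetical relative read speed (SSD > HDD)."""
    name: str
    bandwidth: float
    load: float = 0.0                      # expected read traffic assigned so far
    chunks: list = field(default_factory=list)


def classify(access_counts, hot_threshold):
    """Label each file 'hot' or 'cold' by access count (hypothetical rule)."""
    return {f: ("hot" if n >= hot_threshold else "cold")
            for f, n in access_counts.items()}


def xor_bytes(blocks):
    """Bytewise XOR of equally sized byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def encode(data, k):
    """Split data into k zero-padded data chunks plus one XOR parity chunk."""
    size = -(-len(data) // k)              # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [xor_bytes(chunks)]


def decode(chunks, length):
    """Rebuild the original bytes when at most one chunk is missing (None)."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single-parity code tolerates only one lost chunk")
    chunks = list(chunks)
    if missing:
        # The XOR of the k surviving chunks equals the missing one.
        chunks[missing[0]] = xor_bytes([c for c in chunks if c is not None])
    return b"".join(chunks[:-1])[:length]


def place(chunks, nodes, popularity):
    """Put each chunk on a distinct node, choosing the lowest load/bandwidth."""
    if len(chunks) > len(nodes):
        raise ValueError("need at least as many nodes as chunks")
    per_chunk = popularity / len(chunks)   # assume reads spread evenly over chunks
    placement = []
    for chunk in chunks:
        candidates = [n for n in nodes if n.name not in placement]
        best = min(candidates, key=lambda n: (n.load + per_chunk) / n.bandwidth)
        best.load += per_chunk
        best.chunks.append(chunk)
        placement.append(best.name)
    return placement


if __name__ == "__main__":
    counts = {"a.log": 120, "b.log": 3}
    labels = classify(counts, hot_threshold=50)
    nodes = [Node("ssd0", 4.0), Node("ssd1", 4.0), Node("hdd0", 1.0), Node("hdd1", 1.0)]
    data = b"hello heterogeneous storage!"
    chunks = encode(data, k=3)             # "a.log" is hot, so partition it
    print(labels, place(chunks, nodes, popularity=counts["a.log"]))
    chunks[1] = None                       # simulate one unreachable node
    assert decode(chunks, len(data)) == data
```

Spreading a popular file's chunks over several nodes splits its read traffic, and normalizing by bandwidth steers that traffic toward faster devices first. Single parity is used here only to keep the sketch short; practical systems typically use codes such as Reed-Solomon that tolerate more than one lost chunk.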


Metadata
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-38961-1_18