nach oben

International Journal of Parallel Programming

Erschienen in:

01.06.2015

CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud

verfasst von: Hai Jin, Hanfeng Qin, Song Wu, Xuerong Guo

Erschienen in: International Journal of Parallel Programming | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Applications in High Performance Computing (HPC) cloud are characterized by large cache resource consumption due to large-scale inputs and intensive communications, which creates serious Shared Last Level cache (SLLC) performance bottleneck. Current system software stacks are not efficient in addressing this issue among virtual machines at the hypervisor level or the threads at the operating system level. In this paper, we investigate performance interference due to contention for SLLC in the HPC cloud. We employ an enhanced reuse distance analysis technique with an accelerated cyclic compression algorithm to identify application’s cache interference intensity. Based on reuse distance analysis, we propose a practical Cache Contention-Aware virtual machine Placement approach (CCAP). CCAP dispatches virtual machines according to their cache interference intensities to avoid cache pollution and interference, thus alleviating negative effects of cache contention. We implement CCAP in the Xen hypervisor. Evaluation of NPB workload reveals that CCAP can improve performance of cache sensitive applications when they are co-scheduled with cache pollution programs. For a 2-workload system, it reduces execution time by 12 %, as well as cache miss rate by 13 %, while increasing throughput by 13 %, on average. Moreover, CCAP also improves the average performance of the cache pollution programs by 5 %. For a 4-workload system, CCAP brings more significant performance improvement to cache sensitive applications, an average increase of 20 %.

Vorheriger Artikel Power Efficiency for Hardware/Software Partitioning with Time and Area Constraints on MPSoC

Nächster Artikel Efficiently Restoring Virtual Machines

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Alarm, S., Barrett, R.F., Kuehn, J.A., Roth, P.C., Vetter, J.S.: Characterization of scientific workloads on systems with multi-core processors. In: Proceedings of IEEE International Symposium on Workload Characterization (IISWC’06), pp. 225–236. IEEE (2006)

Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., et al.: The nas parallel benchmarks—summary and preliminary results. In: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (SC’91), pp. 158–165. ACM (1991)

Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. In: Proceedings of the 9th ACM Symposium on Operating Systems Principles (SOSP’03), pp. 164–177. ACM (2003)

Barker, D.P.: Realities of multi-core CPU chips and memory contention. In: Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP’2009), pp. 446–453. IEEE (2009)

Borkar, S.: Thousand core chips: a technology perspective. In: Proceedings of the 44th Annual Design Automation Conference, pp. 746–749. ACM (2007)

Chandra, D., Guo, F., Kim, S., Solihin, Y.: Predicting inter-thread cache contention on a chip multi-processor architecture. In: Proceedings of the 11th International Symposium on High-Performance Computer Architecture (HPCA’05), pp. 340–351. IEEE (2005)

Chang, J., Sohi, G.S.: Cooperative cache partitioning for chip multiprocessors. In: Proceedings of the 21st Annual International Conference on Supercomputing (SC’07), pp. 242–252. ACM (2007)

Cohen, W.E.: Tuning programs with oprofile. Wide Open Mag. 1, 53–62 (2004)

Ding, C., Zhong, Y.: Predicting whole-program locality through reuse distance analysis. In: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI’03), pp. 245–257. ACM (2003)

10.

Duong, N., Zhao, D., Kim, T., Cammarota, R., Valero, M., Veidenbaum, A.V.: Improving cache management policies using dynamic reuse distances. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’12), pp. 389–400. IEEE (2012)

11.

Fedorova, A., Seltzer, M., Smith, M.D.: Improving performance isolation on chip multiprocessors via an operating system scheduler. In: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT’07), pp. 25–38. IEEE (2007)

12.

Goldberg, R.P.: Survey of virtual machine research. Computer 7(6), 34–45 (1974)CrossRef

13.

Guo, F., Kannan, H., Zhao, L., Illikkal, R., Iyer, R., Newell, D., Solihin, Y., Kozyrakis, C.: From chaos to QoS: case studies in CMP resource management. ACM SIGARCH Comput. Archit. News 35(1), 21–30 (2007)CrossRef

14.

Hao, S., Du, Z., Bader, D.A., Ye, Y.: A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache. In: Proceedings of the 38th International Conference on Parallel Processing (ICPP’09), pp. 396–403. IEEE (2009)

15.

Hsu, L.R., Reinhardt, S.K., Iyer, R., Makineni, S.: Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource. In: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT’06), pp. 13–22. ACM (2006)

16.

Hsu, W.C., Chen, H., Yew, P.C., Chen, D.Y.: On the predictability of program behavior using different input data sets. In: Proceedings of 6th Annual Workshop on Interaction Between Compilers and Computer Architectures, pp. 45–53. IEEE (2002)

17.

Iyer, R.: CQoS: a framework for enabling Qos in shared caches of CMP platforms. In: Proceedings of the 18th Annual International Conference on Supercomputing (SC’04), pp. 257–266. ACM (2004)

18.

Jahre, M., Natvig, L.: A light-weight fairness mechanism for chip multiprocessor memory systems. In: Proceedings of the 6th ACM Conference on Computing Frontiers (CF’09), pp. 1–10. ACM (2009)

19.

Jaleel, A., Theobald, K.B., Steely, S.C. Jr., Emer, J.: High performance cache replacement using re-reference interval prediction (rrip). In: Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA’10), pp. 60–71. ACM (2010)

20.

Kim, S., Chandra, D., Solihin, Y.: Fair cache sharing and partitioning in a chip multiprocessor architecture. In: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT’04), pp. 111–122. IEEE (2004)

21.

Lu, Q., Lin, J., Ding, X., Zhang, Z., Zhang, X., Sadayappan, P.: Soft-olp: improving hardware cache performance through software-controlled object-level partitioning. In: Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT’09), pp. 246–257. IEEE (2009)

22.

Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’05), pp. 190–200. ACM (2005)

23.

Mattson, R.L., Gecsei, J., Slutz, D.R., Traiger, I.L.: Evaluation techniques for storage hierarchies. IBM Syst. J. 9(2), 78–117 (1970)CrossRef

24.

Nesbit, K.J., Laudon, J., Smith, J.E.: Virtual private caches. In: Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA’07), pp. 57–68. ACM (2007)

25.

Nesbit, K.J., Moreto, M., Cazorla, F.J., Ramirez, A., Valero, M., Smith, J.E.: Multicore resource management. IEEE Micro 28(3), 6–16 (2008)CrossRef

26.

Qureshi, M.K., Patt, Y.N.: Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches. In: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’06), pp. 423–432. IEEE (2006)

27.

Rosenblum, M., Garfinkel, T.: Virtual machine monitors: current technology and future trends. Computer 38(5), 39–47 (2005)CrossRef

28.

Schuff, D.L., Parsons, B.S., Pai, V.S.: Multicore-aware reuse distance analysis. In: Proceedings of 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Ph.d. Forum (IPDPSW’10), pp. 1–8. IEEE (2010)

29.

Smith, J.E., Nair, R.: The architecture of virtual machines. Computer 38(5), 32–38 (2005)CrossRef

30.

Soares, L., Tam, D., Stumm, M.: Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer. In: Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’08), pp. 258–269. IEEE (2008)

31.

Suo, G., Yang, X., Liu, G., Wu, J., Zeng, K., Zhang, B., Lin, Y.: IPC-based cache partitioning: an IPC-oriented dynamic shared cache partitioning mechanism. In: Proceedings of the 2008 3rd International Conference on Convergence and Hybrid Information Technology (ICHIT’08), pp. 399–406. IEEE (2008)

32.

Zhong, Y., Dropsho, S.G., Shen, X., Studer, A., Ding, C.: Miss rate prediction across program inputs and cache configurations. IEEE Trans. Comput. 56(3), 328–343 (2007)CrossRefMathSciNet

33.

Zhuravlev, S., Blagodurov, S., Fedorova, A.: Addressing shared resource contention in multicore processors via scheduling. In: Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’10), pp. 129–142. ACM (2010)

Titel: CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud
verfasst von: Hai Jin
Hanfeng Qin
Song Wu
Xuerong Guo
Publikationsdatum: 01.06.2015
Verlag: Springer US
Erschienen in: International Journal of Parallel Programming / Ausgabe 3/2015
Print ISSN: 0885-7458
Elektronische ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-013-0286-1

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 3/2015

Inaccuracy in Private BitTorrent Measurements

YuruBackup: A Space-Efficient and Highly Scalable Incremental Backup System in the Cloud

OFScheduler: A Dynamic Network Optimizer for MapReduce in Heterogeneous Cluster

A Parallel Job Execution Time Estimation Approach Based on User Submission Patterns within Computational Grids

A Virtualization Based Monitoring System for Mini-intrusive Live Forensics

Data Reduction Analysis for Climate Data Sets