Skip to main content
Erschienen in: The Journal of Supercomputing 4/2015

01.04.2015

Addressing characterization methods for memory contention aware co-scheduling

verfasst von: Andreas de Blanche, Thomas Lundqvist

Erschienen in: The Journal of Supercomputing | Ausgabe 4/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The ability to precisely predict how memory contention degrades performance when co-scheduling programs is critical for reaching high performance levels in cluster, grid and cloud environments. In this paper we present an overview and compare the performance of state-of-the-art characterization methods for memory aware (co-)scheduling. We evaluate the prediction accuracy and co-scheduling performance of four methods: one slowdown-based, two cache-contention based and one based on memory bandwidth usage. Both our regression analysis and scheduling simulations find that the slowdown based method, represented by Memgen, performs better than the other methods. The linear correlation coefficient \(R^2\) of Memgen’s prediction is 0.890. Memgen’s preferred schedules reached 99.53 % of the obtainable performance on average. Also, the memory bandwidth usage method performed almost as well as the slowdown based method. Furthermore, while most prior work promote characterization based on cache miss rate we found it to be on par with random scheduling of programs and highly unreliable.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Antonopoulos CD, Nikolopoulos DS, Papatheodorou TS (2004) Realistic workload scheduling policies for taming the memory bandwidth bottleneck of smps., International conference on high performance computing, Springer, Berlin Antonopoulos CD, Nikolopoulos DS, Papatheodorou TS (2004) Realistic workload scheduling policies for taming the memory bandwidth bottleneck of smps., International conference on high performance computing, Springer, Berlin
3.
Zurück zum Zitat Araiza R, Aguilera MG, Pham T, Teller PJ (2005) Towards a cross-platform microbenchmark suite for evaluating hardware performance counter data. In: Proceedings of the 2005 conference on diversity in computing, ACM, New York, NY, USA, TAPIA ’05, pp 36–39. doi:10.1145/1095242.1095259 Araiza R, Aguilera MG, Pham T, Teller PJ (2005) Towards a cross-platform microbenchmark suite for evaluating hardware performance counter data. In: Proceedings of the 2005 conference on diversity in computing, ACM, New York, NY, USA, TAPIA ’05, pp 36–39. doi:10.​1145/​1095242.​1095259
5.
Zurück zum Zitat de Blanche A, Lundqvist T (2014) A methodology for estimating co-scheduling slowdowns due to memory bus contention on multicore nodes. In: International conference on parallel and distributed computing and networks de Blanche A, Lundqvist T (2014) A methodology for estimating co-scheduling slowdowns due to memory bus contention on multicore nodes. In: International conference on parallel and distributed computing and networks
6.
Zurück zum Zitat de Blanche A, Mankefors-Christiernin S (2010) Method for experimental measurement of an applications memory bus usage. In: International conference on parallel and distributed processing techniques and applications, CRSEA de Blanche A, Mankefors-Christiernin S (2010) Method for experimental measurement of an applications memory bus usage. In: International conference on parallel and distributed processing techniques and applications, CRSEA
7.
Zurück zum Zitat Boklund A, Jiresjo C, Mankefors-Christiernin S, Namaki N, Gustavsson-Christiernin L, Ebbmar M (2005) Performance of network subsystems for technical simulation on linux clusters. In: Conference on parallel and distributed computing and systems, pp 503–509 Boklund A, Jiresjo C, Mankefors-Christiernin S, Namaki N, Gustavsson-Christiernin L, Ebbmar M (2005) Performance of network subsystems for technical simulation on linux clusters. In: Conference on parallel and distributed computing and systems, pp 503–509
8.
Zurück zum Zitat Boklund A, Namaki N, Mankefors-Christiernin S, Gustafsson J, Lingbrand M (2008) Dual core efficiency for engineering simulation applications. In: International conference on parallel and distributed processing techniques and applications, pp 962–968 Boklund A, Namaki N, Mankefors-Christiernin S, Gustafsson J, Lingbrand M (2008) Dual core efficiency for engineering simulation applications. In: International conference on parallel and distributed processing techniques and applications, pp 962–968
9.
Zurück zum Zitat Browne S, Dongarra J, Garner N, London K, Mucci P (2000) A portable programming interface for performance evaluation on modern processors. Int J High Perform Comput Appl 14:189–204CrossRef Browne S, Dongarra J, Garner N, London K, Mucci P (2000) A portable programming interface for performance evaluation on modern processors. Int J High Perform Comput Appl 14:189–204CrossRef
11.
Zurück zum Zitat Chandra D, Guo F, Kim S, Solihin Y (2005) Predicting inter-thread cache contention on a chip multi-processor architecture., International symposium on high-performance computer architectureIEEE Computer Society, Washington, DC, USACrossRef Chandra D, Guo F, Kim S, Solihin Y (2005) Predicting inter-thread cache contention on a chip multi-processor architecture., International symposium on high-performance computer architectureIEEE Computer Society, Washington, DC, USACrossRef
12.
Zurück zum Zitat Daci G, Tartari M (2013) A comparative review of contention-aware scheduling algorithms to avoid contention in multicore systems. In: Das VV (ed) Proceedings of the third international conference on trends in information, telecommunication and computing, vol 150, lecture notes in electrical engineering, Springer, New York, pp 99–106 Daci G, Tartari M (2013) A comparative review of contention-aware scheduling algorithms to avoid contention in multicore systems. In: Das VV (ed) Proceedings of the third international conference on trends in information, telecommunication and computing, vol 150, lecture notes in electrical engineering, Springer, New York, pp 99–106
13.
Zurück zum Zitat Eklov D, Nikoleris N, Black-Schaffer D, Hagersten E (2011) Cache pirating: measuring the curse of the shared cache. In: Parallel processing (ICPP), 2011 International conference on, pp 165–175. doi:10.1109/ICPP.2011.15 Eklov D, Nikoleris N, Black-Schaffer D, Hagersten E (2011) Cache pirating: measuring the curse of the shared cache. In: Parallel processing (ICPP), 2011 International conference on, pp 165–175. doi:10.​1109/​ICPP.​2011.​15
14.
Zurück zum Zitat Eklov D, Nikoleris N, Black-Schaffer D, Hagersten E (2012) Bandwidth bandit: quantitative characterization of memory contention. In: Proceedings of the 21st international conference on parallel architectures and compilation techniques, ACM, New York, PACT ’12, pp 457–458. doi:10.1145/2370816.2370894 Eklov D, Nikoleris N, Black-Schaffer D, Hagersten E (2012) Bandwidth bandit: quantitative characterization of memory contention. In: Proceedings of the 21st international conference on parallel architectures and compilation techniques, ACM, New York, PACT ’12, pp 457–458. doi:10.​1145/​2370816.​2370894
15.
Zurück zum Zitat Eranian S (2008) What can performance counters do for memory subsystem analysis? ACM SIGPLAN workshop on Memory systems performance and correctness: in conjunction with the thirteenth international conference on architectural support for programming languages and operating systems. ACM, New York, pp 26–30 Eranian S (2008) What can performance counters do for memory subsystem analysis? ACM SIGPLAN workshop on Memory systems performance and correctness: in conjunction with the thirteenth international conference on architectural support for programming languages and operating systems. ACM, New York, pp 26–30
17.
Zurück zum Zitat Field D, Johnson D, Mize D, Stober R (2007) Scheduling to overcome the multi-core memory bandwidth bottleneck. Hewlett Packard and Platform Computing White Paper Field D, Johnson D, Mize D, Stober R (2007) Scheduling to overcome the multi-core memory bandwidth bottleneck. Hewlett Packard and Platform Computing White Paper
18.
Zurück zum Zitat Guo F (2008) Analyzing and managing shared cache in chip multi-processors. PhD thesis, North Carolina State University Guo F (2008) Analyzing and managing shared cache in chip multi-processors. PhD thesis, North Carolina State University
20.
Zurück zum Zitat Iyer R, Zhao L, Guo F, Illikkal R, Makineni S, Newell D, Solihin Y, Hsu L, Reinhardt S (2007) Qos policies and architecture for cache/memory in cmp platforms. SIGMETRICS Perform Eval Rev 35(1):25–36. doi:10.1145/1269899.1254886 CrossRef Iyer R, Zhao L, Guo F, Illikkal R, Makineni S, Newell D, Solihin Y, Hsu L, Reinhardt S (2007) Qos policies and architecture for cache/memory in cmp platforms. SIGMETRICS Perform Eval Rev 35(1):25–36. doi:10.​1145/​1269899.​1254886 CrossRef
21.
Zurück zum Zitat Jia G, Sheng W, Dai W, Li X (2011) Using fom predicting method for scheduling on chip multi-processor. In: Communication software and networks (ICCSN), 2011 IEEE 3rd international conference on, pp 579–584. doi:10.1109/ICCSN.2011.6013973 Jia G, Sheng W, Dai W, Li X (2011) Using fom predicting method for scheduling on chip multi-processor. In: Communication software and networks (ICCSN), 2011 IEEE 3rd international conference on, pp 579–584. doi:10.​1109/​ICCSN.​2011.​6013973
22.
Zurück zum Zitat Jiang Y, Shen X, Chen J, Tripathi R (2008) Analysis and approximation of optimal co-scheduling on chip multiprocessors. International conference on parallel architectures and compilation techniques. NY, USA, New York, pp 220–229 Jiang Y, Shen X, Chen J, Tripathi R (2008) Analysis and approximation of optimal co-scheduling on chip multiprocessors. International conference on parallel architectures and compilation techniques. NY, USA, New York, pp 220–229
23.
Zurück zum Zitat Koller R, Verma A, Rangaswami R (2011) Estimating application cache requirement for provisioning caches in virtualized systems. In: Modeling, analysis simulation of computer and telecommunication systems (MASCOTS), 2011 IEEE 19th international symposium on, pp 55–62. doi:10.1109/MASCOTS.2011.67 Koller R, Verma A, Rangaswami R (2011) Estimating application cache requirement for provisioning caches in virtualized systems. In: Modeling, analysis simulation of computer and telecommunication systems (MASCOTS), 2011 IEEE 19th international symposium on, pp 55–62. doi:10.​1109/​MASCOTS.​2011.​67
24.
Zurück zum Zitat Koukis E, Koziris N (2006) Memory and network bandwidth aware scheduling of multiprogrammed workloads on clusters of smps. International conference on parallel and distributed systems, vol 1. IEEE Computer Society, Washington, DC, pp 345–354 Koukis E, Koziris N (2006) Memory and network bandwidth aware scheduling of multiprogrammed workloads on clusters of smps. International conference on parallel and distributed systems, vol 1. IEEE Computer Society, Washington, DC, pp 345–354
27.
Zurück zum Zitat Liu X, Tong W, Zhi X, ZhiRen F, WenZhao L (2014) Performance analysis of cloud computing services considering resources sharing among virtual machines. J Supercomput 69(1):357–374. doi:10.1007/s11227-014-1156-3 CrossRef Liu X, Tong W, Zhi X, ZhiRen F, WenZhao L (2014) Performance analysis of cloud computing services considering resources sharing among virtual machines. J Supercomput 69(1):357–374. doi:10.​1007/​s11227-014-1156-3 CrossRef
28.
Zurück zum Zitat Mars J, Vachharajani N, Hundt R, Soffa ML (2010) Contention aware execution: online contention detection and response. In: CGO ’10: proceedings of the 2010 international symposium on code generation and optimization, ACM, New York, pp 257–265. doi:10.1145/1772954.1772991 Mars J, Vachharajani N, Hundt R, Soffa ML (2010) Contention aware execution: online contention detection and response. In: CGO ’10: proceedings of the 2010 international symposium on code generation and optimization, ACM, New York, pp 257–265. doi:10.​1145/​1772954.​1772991
29.
Zurück zum Zitat Mars J, Tang L, Hundt R, Skadron K, Soffa ML (2011) Bubble-up: increasing utilization in modern warehouse scale computers via sensible co-locations. In: MICRO ’11: proceedings of the 44th annual IEEE/ACM international symposium on microarchitecture, ACM, New York Mars J, Tang L, Hundt R, Skadron K, Soffa ML (2011) Bubble-up: increasing utilization in modern warehouse scale computers via sensible co-locations. In: MICRO ’11: proceedings of the 44th annual IEEE/ACM international symposium on microarchitecture, ACM, New York
30.
Zurück zum Zitat Mars J, Tang L, Hundt R, Skadron K, Soffa ML (2012) Increasing utilization in warehouse scale computers using bubbleup. IEEE Micro Mars J, Tang L, Hundt R, Skadron K, Soffa ML (2012) Increasing utilization in warehouse scale computers using bubbleup. IEEE Micro
31.
Zurück zum Zitat McCalpin JD (1995) Memory bandwidth and machine balance in current high performance computers. In: IEEE computer society technical committee on computer architecture newsletter pp 19–25 McCalpin JD (1995) Memory bandwidth and machine balance in current high performance computers. In: IEEE computer society technical committee on computer architecture newsletter pp 19–25
32.
Zurück zum Zitat Namaki N, de Blanche A, Mankefors-Christiernin S (2009a) Exhaustion dominated performance: a first attempt. In: Proceedings of the 2009 ACM symposium on applied computing, ACM, New York, SAC ’09, pp 1011–1012. doi:10.1145/1529282.1529504 Namaki N, de Blanche A, Mankefors-Christiernin S (2009a) Exhaustion dominated performance: a first attempt. In: Proceedings of the 2009 ACM symposium on applied computing, ACM, New York, SAC ’09, pp 1011–1012. doi:10.​1145/​1529282.​1529504
33.
Zurück zum Zitat Namaki N, de Blanche A, Mankefors-Christiernin S (2009b) A tool for processor dependency characterization of hpc applications. In: International Conference HPC Asia 2009 Namaki N, de Blanche A, Mankefors-Christiernin S (2009b) A tool for processor dependency characterization of hpc applications. In: International Conference HPC Asia 2009
34.
Zurück zum Zitat Namaki N, de Blanche A, Mankefors-Christiernin S (2010) Black-box characterization of processor workloads for engineering applications. In: IEEE international symposium on workload characterization, IEEE Namaki N, de Blanche A, Mankefors-Christiernin S (2010) Black-box characterization of processor workloads for engineering applications. In: IEEE international symposium on workload characterization, IEEE
37.
Zurück zum Zitat Singer N (2009) More chip cores can mean slower supercomputing, sandia simulation shows. Sandia National Laboratories News Release Singer N (2009) More chip cores can mean slower supercomputing, sandia simulation shows. Sandia National Laboratories News Release
38.
Zurück zum Zitat Tam DK, Azimi R, Soares LB, Stumm M (2009) Rapidmrc: approximating l2 miss rate curves on commodity systems for online optimizations. In: Proceedings of the 14th international conference on architectural support for programming languages and operating systems, ACM, New York, ASPLOS XIV, pp 121–132. doi:10.1145/1508244.1508259 Tam DK, Azimi R, Soares LB, Stumm M (2009) Rapidmrc: approximating l2 miss rate curves on commodity systems for online optimizations. In: Proceedings of the 14th international conference on architectural support for programming languages and operating systems, ACM, New York, ASPLOS XIV, pp 121–132. doi:10.​1145/​1508244.​1508259
39.
Zurück zum Zitat Tang L, Mars J, Vachharajani N, Hundt R, Soffa ML (2011) The impact of memory subsystem resource sharing on datacenter applications. In: ISCA ’11: Proceeding of the 38th annual international symposium on computer architecture, ACM, New York, ISCA ’11, pp 283–294. doi:10.1145/2000064.2000099 Tang L, Mars J, Vachharajani N, Hundt R, Soffa ML (2011) The impact of memory subsystem resource sharing on datacenter applications. In: ISCA ’11: Proceeding of the 38th annual international symposium on computer architecture, ACM, New York, ISCA ’11, pp 283–294. doi:10.​1145/​2000064.​2000099
41.
Zurück zum Zitat Xu D, Wu C, Yew PC (2010) On mitigating memory bandwidth contention through bandwidth-aware scheduling. International conference on parallel architectures and compilation techniques. New York, USA, pp 237–248 Xu D, Wu C, Yew PC (2010) On mitigating memory bandwidth contention through bandwidth-aware scheduling. International conference on parallel architectures and compilation techniques. New York, USA, pp 237–248
43.
Zurück zum Zitat Yang LT, Ma X, Mueller F (2005) Cross-platform performance prediction of parallel applications using partial execution. In: Proceedings of the 2005 ACM/IEEE conference on supercomputing, IEEE Computer Society, Washington, DC, USA, SC ’05. doi:10.1109/SC.2005.20 Yang LT, Ma X, Mueller F (2005) Cross-platform performance prediction of parallel applications using partial execution. In: Proceedings of the 2005 ACM/IEEE conference on supercomputing, IEEE Computer Society, Washington, DC, USA, SC ’05. doi:10.​1109/​SC.​2005.​20
44.
Zurück zum Zitat Zhuravlev S, Blagodurov S, Fedorova A (2010) Addressing shared resource contention in multicore processors via scheduling., ASPLOS on Architectural support for programming languages and operating systems.ACM, New YorkCrossRef Zhuravlev S, Blagodurov S, Fedorova A (2010) Addressing shared resource contention in multicore processors via scheduling., ASPLOS on Architectural support for programming languages and operating systems.ACM, New YorkCrossRef
45.
Zurück zum Zitat Zhuravlev S, Saez JC, Blagodurov S, Fedorova A, Prieto M (2012) Survey of scheduling techniques for addressing shared resources in multicore processors. ACM Comput Surv 45(1):4:1–4:28. doi:10.1145/2379776.2379780 CrossRef Zhuravlev S, Saez JC, Blagodurov S, Fedorova A, Prieto M (2012) Survey of scheduling techniques for addressing shared resources in multicore processors. ACM Comput Surv 45(1):4:1–4:28. doi:10.​1145/​2379776.​2379780 CrossRef
Metadaten
Titel
Addressing characterization methods for memory contention aware co-scheduling
verfasst von
Andreas de Blanche
Thomas Lundqvist
Publikationsdatum
01.04.2015
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 4/2015
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1374-8

Weitere Artikel der Ausgabe 4/2015

The Journal of Supercomputing 4/2015 Zur Ausgabe

Premium Partner