
The Journal of Supercomputing

Published: 01.04.2015

Addressing characterization methods for memory contention aware co-scheduling

Authors: Andreas de Blanche, Thomas Lundqvist

Published in: The Journal of Supercomputing | Issue 4/2015


Abstract

The ability to precisely predict how memory contention degrades performance when co-scheduling programs is critical for reaching high performance levels in cluster, grid and cloud environments. In this paper, we present an overview of state-of-the-art characterization methods for memory-aware (co-)scheduling and compare their performance. We evaluate the prediction accuracy and co-scheduling performance of four methods: one slowdown-based, two cache-contention-based and one based on memory bandwidth usage. Both our regression analysis and our scheduling simulations find that the slowdown-based method, represented by Memgen, performs better than the other methods. The linear correlation coefficient \(R^2\) of Memgen’s prediction is 0.890. Memgen’s preferred schedules reached 99.53 % of the obtainable performance on average. The memory bandwidth usage method performed almost as well as the slowdown-based method. Furthermore, while most prior work promotes characterization based on cache miss rate, we found it to be highly unreliable and on par with random scheduling of programs.
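As a rough illustration of the regression measure quoted in the abstract, the sketch below computes \(R^2\) for a least-squares line fitted between a characterization score and a measured slowdown. It is not the authors' evaluation code: the function name `r_squared` and the sample values are hypothetical and chosen only to show the arithmetic.

```python
def r_squared(xs, ys):
    """Return R^2 of the least-squares line y = a*x + b.

    For a simple (one-variable) linear fit, R^2 equals the squared
    Pearson correlation between xs and ys.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical data, NOT taken from the paper: a per-program
    # characterization score and the slowdown observed when co-scheduled.
    predicted_score = [0.10, 0.25, 0.40, 0.55, 0.80]
    measured_slowdown = [1.02, 1.08, 1.15, 1.19, 1.35]
    print("R^2 =", round(r_squared(predicted_score, measured_slowdown), 3))
```

A value close to 1 means that a characterization score explains most of the variation in the measured slowdown; a value near 0 means the score is of little use for predicting it.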



Title: Addressing characterization methods for memory contention aware co-scheduling
Authors: Andreas de Blanche, Thomas Lundqvist
Publication date: 01.04.2015
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 4/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-014-1374-8
