Hierarchical multicore thread mapping via estimation of remote communication

Authors: Hamidreza Khaleghzadeh, Hossein Deldari, Ravi Reddy, Alexey Lastovetsky

Published in: The Journal of Supercomputing | Issue 3/2018

Published: 31 October 2017

Abstract

Affinity-aware thread mapping is a method for effectively exploiting cache resources in multicore processors. We propose an affinity- and architecture-aware thread mapping technique that maximizes data reuse and minimizes the remote communication and cache coherency costs of multi-threaded applications. It consists of three main components: the Data Sharing Estimator, the Affine Mapping Finder, and the Maximum Speedup Predictor. The Data Sharing Estimator creates application-specific data dependency signatures, which the Affine Mapping Finder uses to determine an appropriate thread mapping of the application for a given architecture. To prevent excessive thread migration, the Maximum Speedup Predictor estimates the speedup of the obtained mapping and discards the mapping if it yields no significant performance improvement. The proposed framework is evaluated using the Phoenix benchmark suite on two different multicore architectures. The proposed thread mapping approach gives a 25% improvement in performance compared to the default Linux scheduler. We also show that affinity-based thread mapping approaches that consider only the number of shared blocks are not sufficient to accurately estimate data dependency between threads and determine the proper thread mapping.
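
The three-stage flow described in the abstract (estimate sharing, find a mapping, accept it only if the predicted gain is significant) can be illustrated with a small sketch. This is not the authors' algorithm: the function names, the sharing weight (sum of the smaller access count over commonly touched blocks), the greedy pairing onto two-thread cache domains, the remote-cost proxy, and the 10% acceptance threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def estimate_sharing(accesses):
    """Hypothetical stand-in for a data sharing estimator.

    accesses maps thread id -> {block address: access count}.
    The weight of a thread pair is the sum, over blocks both threads
    touch, of the smaller of the two access counts (an assumed metric).
    """
    weights = {}
    for a, b in combinations(sorted(accesses), 2):
        common = accesses[a].keys() & accesses[b].keys()
        weights[(a, b)] = sum(
            min(accesses[a][blk], accesses[b][blk]) for blk in common
        )
    return weights


def greedy_pairs(weights):
    """Hypothetical mapping finder: pair threads that share the most data.

    Each pair is assumed to land on two cores that share a cache.
    """
    placed, pairs = set(), []
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _ in ranked:
        if a not in placed and b not in placed:
            pairs.append((a, b))
            placed.update((a, b))
    return pairs


def remote_cost(pairs, weights):
    """Assumed cost proxy: total sharing weight between threads that are
    NOT placed in the same pair, i.e. sharing that must cross caches."""
    together = {tuple(sorted(p)) for p in pairs}
    return sum(w for pair, w in weights.items() if pair not in together)


def choose_mapping(accesses, default_pairs, min_gain=0.10):
    """Hypothetical speedup gate: keep the default mapping unless the
    candidate cuts the remote-cost proxy by at least min_gain."""
    weights = estimate_sharing(accesses)
    candidate = greedy_pairs(weights)
    base = remote_cost(default_pairs, weights)
    new = remote_cost(candidate, weights)
    gain = 0.0 if base == 0 else (base - new) / base
    return (candidate, gain) if gain >= min_gain else (default_pairs, gain)


# Illustrative data only: four threads, block address -> access count.
demo_accesses = {
    0: {0x10: 5, 0x20: 1},
    1: {0x30: 4},
    2: {0x10: 3, 0x20: 2},
    3: {0x30: 6, 0x40: 1},
}
demo_mapping, demo_gain = choose_mapping(demo_accesses, [(0, 1), (2, 3)])
print(demo_mapping, demo_gain)
```

The gate in `choose_mapping` mirrors the stated motivation of avoiding thread migration when the expected benefit is small; a real predictor would estimate speedup rather than a reduction in a cost proxy.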

Title: Hierarchical multicore thread mapping via estimation of remote communication
Authors: Hamidreza Khaleghzadeh, Hossein Deldari, Ravi Reddy, Alexey Lastovetsky
Publication date: 31 October 2017
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 3/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-017-2176-6
