Published in: The Journal of Supercomputing 3/2018

31.10.2017

Hierarchical multicore thread mapping via estimation of remote communication

Authors: Hamidreza Khaleghzadeh, Hossein Deldari, Ravi Reddy, Alexey Lastovetsky


Abstract

Affinity-aware thread mapping is a method for exploiting cache resources effectively on multicore processors. We propose an affinity- and architecture-aware thread mapping technique that maximizes data reuse and minimizes the remote communication and cache coherency costs of multi-threaded applications. It consists of three main components: the Data Sharing Estimator, the Affine Mapping Finder, and the Maximum Speedup Predictor. The Data Sharing Estimator creates application-specific data dependency signatures, which the Affine Mapping Finder uses to determine an appropriate thread mapping of the application for a given architecture. To prevent excessive thread migration, the Maximum Speedup Predictor estimates the speedup of the obtained mapping and discards the mapping if it would yield no significant performance improvement. The proposed framework is evaluated using the Phoenix benchmark suite on two different multicore architectures. The proposed thread mapping approach gives a 25% improvement in performance compared to the default Linux scheduler. We also show that affinity-based thread mapping approaches that consider only the number of shared blocks are not sufficient to accurately estimate data dependency between threads and determine the proper thread mapping.
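To make the general idea of affinity-based mapping concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical symmetric matrix `sharing`, where `sharing[i][j]` is some estimate of the communication between threads i and j, and greedily pairs the threads that share the most so that each pair could be placed on cores with a common cache. The function names `greedy_pairs` and `worth_migrating`, the matrix values, and the 1.05 threshold are all illustrative assumptions; the abstract specifies neither the signature format nor the predictor's criterion.

```python
from itertools import combinations


def greedy_pairs(sharing):
    """Pair threads greedily, heaviest estimated sharing first.

    `sharing` is a hypothetical symmetric matrix of pairwise
    communication estimates (an assumption, not the paper's format).
    """
    n = len(sharing)
    # All candidate pairs, sorted from heaviest to lightest sharing.
    candidates = sorted(
        combinations(range(n), 2),
        key=lambda p: sharing[p[0]][p[1]],
        reverse=True,
    )
    paired, groups = set(), []
    for i, j in candidates:
        # Take a pair only if both threads are still unassigned.
        if i not in paired and j not in paired:
            groups.append((i, j))
            paired.update((i, j))
    # A thread left over (odd thread count) forms its own group.
    groups.extend((t,) for t in range(n) if t not in paired)
    return groups


def worth_migrating(predicted_speedup, threshold=1.05):
    """Accept a new mapping only if the predicted speedup is significant.

    The threshold value is an illustrative assumption.
    """
    return predicted_speedup >= threshold


# Illustrative data: threads 0/1 and 2/3 communicate heavily.
sharing = [
    [0, 9, 1, 2],
    [9, 0, 3, 1],
    [1, 3, 0, 8],
    [2, 1, 8, 0],
]
groups = greedy_pairs(sharing)
```

With this input the heavily communicating threads end up in the same group. On Linux, each group could then be pinned to cores that share a cache, for example with `os.sched_setaffinity`, but only when a check such as `worth_migrating` indicates that the migration is likely to pay off.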


Metadata
Title
Hierarchical multicore thread mapping via estimation of remote communication
Authors
Hamidreza Khaleghzadeh
Hossein Deldari
Ravi Reddy
Alexey Lastovetsky
Publication date
31.10.2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 3/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2176-6
