nach oben

International Journal of Parallel Programming

Erschienen in:

01.04.2014

Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications

verfasst von: Luís Fabrício Wanderley Góes, Christiane Pousa Ribeiro, Márcio Castro, Jean-François Méhaut, Murray Cole, Marcelo Cintra

Erschienen in: International Journal of Parallel Programming | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Memory affinity has become a key element to achieve scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and cache prefetching are commonly employed to enhance memory affinity which keeps data close to the cores that access it. In particular, software transactional memory (STM) applications exhibit irregular memory access behavior that makes harder to determine which and when data will be needed by each core. Additionally, existing STM runtime systems are decoupled from issues such as thread and memory management. In this paper, we thus propose a skeleton-driven mechanism to improve memory affinity on STM applications that fit the worklist pattern employing a two-level approach. First, it addresses memory affinity in the DRAM level by automatic selecting page allocation policies. Then it employs data prefetching helper threads to improve affinity in the cache level. It relies on a skeleton framework to exploit the application pattern in order to provide automatic memory page allocation and cache prefetching. Our experimental results on the STAMP benchmark suite show that our proposed mechanism can achieve performance improvements of up to 46 %, with an average of 11 %, over a baseline version on two NUMA multi-core machines.

Vorheriger Artikel Erratum to: Accelerating Single Iteration Performance of CUDA-Based 3D Reaction–Diffusion Simulations

Nächster Artikel Editor’s Note

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Remote read latency divided by local read latency (obtained from BenchIT).

Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., Yelick, K.A.: A view of the parallel computing landscape. Commun. ACM 52(10), 56–67 (2009)CrossRef

Awasthi, M., Nellans, D.W., Sudan, K., Balasubramonian, R., Davis, A.: Handling the problems and opportunities posed by multiple on-chip memory controllers. In: PACT, pp. 319–330. ACM (2010). doi:10.1145/1854273.1854314

Baek, W., Minh, C.C., Trautmann, M., Kozyrakis, C., Olukotun, K.: The openTM transactional application programming interface. In: PACT 2007, pp. 376–387. IEEE Computer Society (2007)

Broquedis, F., Aumage, O., Goglin, B., Thibault, S., Wacrenier, P.A., Namyst, R.: Structuring the execution of openMP applications for multicore architectures. In: IPDPS, pp. 1–10. IEEE Computer Society (2010)

Broquedis, F., Clet Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: A generic framework for managing hardware affinities in HPC applications. In: PDP, pp. 180–186. IEEE Computer Society (2010)

Castro, M., Góes, L.F.W., Fernandes, L.G., Méhaut, J.F.: Dynamic thread mapping based on machine learning for transactional memory applications. In: Euro-Par, pp. 465–476 (2012)

Castro, M., Góes, L.F.W., Ribeiro, C.P., Cole, M., Cintra, M., Méhaut, J.F.: A machine learning-based approach for thread mapping on transactional memory applications. In: HiPC, pp. 1–10 (2011)

Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press & Pitman, London (1989)MATH

Collins, J.D., Wang, H., Tullsen, D.M., Hughes, C., Lee, Y.F., Lavery, D., Shen, J.P.: Speculative Precomputation: Long-Range Prefetching of Delinquent Loads. In: ISCA, pp. 14–25. ACM (2001)

10.

Dalessandro, L., Dice, D., Scott, M., Shavit, N., Spear, M.: Transactional mutex locks. In: Euro-Par, pp. 2–13. Springer (2010)

11.

Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI, pp. 137–150. USENIX Association (2004)

12.

Diener, M., Madruga, F., Rodrigues, E., Alves, M., Schneider, J., Navaux, P., Heiss, H.U.: Evaluating thread placement based on memory access patterns for multi-core processors. In: HPCC, pp. 491–496. IEEE Computer Society (2010)

13.

Felber, P., Fetzer, C., Riegel, T.: Dynamic Performance tuning of word-based software transactional memory. In: PPoPP, pp. 237–246. ACM (2008). doi:10.1145/1345206.1345241

14.

Felber, P., Fetzer, C., Riegel, T., Sturzrehm, H.: Transactifying applications using an open compiler framework. In: TRANSACT. ACM (2007)

15.

Garner, B.D., Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14, 189–204 (2000)CrossRef

16.

Góes, L.F.W.: Automatic skeleton-driven performance optimizations for transactional memory. Ph.D. thesis, School of Informatics, University of Edinburgh, UK (2012)

17.

Goes, L.F.W., Ioannou, N., Xekalakis, P., Cole, M., Cintra, M.: Autotuning skeleton-driven optimizations for transactional worklist applications. IEEE Trans. Parallel Distrib. Syst. 23(12), 2205–2218 (2012)CrossRef

18.

Hong, S., Narayanan, S.H.K., Kandemir, M., Özturk, O.: Process variation aware thread mapping for chip multiprocessors. In: DATE, pp. 821–826. European Design and Automation Association (2009)

19.

Kleen, A.: A NUMA API for Linux. Tech. Rep. Novell-4621437 (2005)

20.

Larus, J., Rajwar, R.: Transactional Memory. Morgan & Claypool Publishers (2006)

21.

McCool, M.: Structured parallel programming with deterministic patterns. In: HotPar, pp. 25–30. USENIX Association (2010)

22.

Minh, C.C., Chung, J., Kozyrakis, C., Olukotun, K.: STAMP: Stanford transactional applications for multi-processing. In: IISWC, pp. 35–46. IEEE Computer Society (2008)

23.

Nikas, K., Anastopoulos, N., Goumas, G., Koziris, N.: Employing transactional memory and helper threads to speedup Dijkstra’s algorithm. In: ICPP, pp. 388–395. IEEE Computer Society (2009)

24.

Pousa Ribeiro, C., Castro, M., Carissimi, A., Méhaut, J.F.: Improving memory affinity of geophysics applications on NUMA platforms using Minas. In: VECPAR. Springer (2010)

25.

Song, Y., Kalogeropulos, S., Tirumalai, P.: Design and implementation of a compiler framework for helper threading on multicore processors. In: PACT, pp. 99–109. IEEE Computer Society (2005)

Titel: Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications
verfasst von: Luís Fabrício Wanderley Góes
Christiane Pousa Ribeiro
Márcio Castro
Jean-François Méhaut
Murray Cole
Marcelo Cintra
Publikationsdatum: 01.04.2014
Verlag: Springer US
Erschienen in: International Journal of Parallel Programming / Ausgabe 2/2014
Print ISSN: 0885-7458
Elektronische ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-013-0253-x

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 2/2014

The Experience in Designing and Evaluating the High Performance Cluster Netuno

A Case Study of Implementing Supernode Transformations

Erratum to: Accelerating Single Iteration Performance of CUDA-Based 3D Reaction–Diffusion Simulations

Editor’s Note

A Survey of Parallel and Distributed Algorithms for the Steiner Tree Problem

Boosting CUDA Applications with CPU–GPU Hybrid Computing