Published in: International Journal of Parallel Programming, Issue 2/2014

1 April 2014

Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications

Authors: Luís Fabrício Wanderley Góes, Christiane Pousa Ribeiro, Márcio Castro, Jean-François Méhaut, Murray Cole, Marcelo Cintra

Abstract

Memory affinity has become a key element in achieving scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and cache prefetching are commonly employed to enhance memory affinity, that is, to keep data close to the cores that access it. Software transactional memory (STM) applications, in particular, exhibit irregular memory access behavior that makes it harder to determine which data each core will need, and when. Additionally, existing STM runtime systems are decoupled from issues such as thread and memory management. In this paper, we therefore propose a skeleton-driven mechanism that improves memory affinity in STM applications that fit the worklist pattern, using a two-level approach. First, it addresses memory affinity at the DRAM level by automatically selecting page allocation policies. Then, it employs data-prefetching helper threads to improve affinity at the cache level. The mechanism relies on a skeleton framework that exploits the application pattern to provide automatic memory page allocation and cache prefetching. Our experimental results on the STAMP benchmark suite show that the proposed mechanism achieves performance improvements of up to 46%, with an average of 11%, over a baseline version on two NUMA multi-core machines.
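The abstract combines two ideas: workers that repeatedly take items from a shared worklist and process them as transactions, and a helper thread that prefetches data ahead of those workers. The Python sketch below illustrates that control structure under stated assumptions. It is not the authors' skeleton framework or its API. `WorklistSkeleton`, `load`, `process` and `lookahead` are hypothetical names, a plain lock stands in for an STM transaction, a dictionary stands in for the cache the helper warms, and the worklist is fixed up front (real worklist applications may add work while running). Under CPython's global interpreter lock this shows the structure only; it does not reproduce hardware prefetching or NUMA effects.

```python
import itertools
import threading
from collections import deque

_MISS = object()  # sentinel: the item's data was not prefetched


class WorklistSkeleton:
    """Hypothetical worklist skeleton (an illustration, not the paper's framework).

    Worker threads repeatedly remove an item from a shared worklist and
    process it inside an atomic block. A helper thread runs ahead of the
    workers and loads the data of upcoming items into a shared dictionary,
    which stands in for a warmed cache.
    """

    def __init__(self, items, load, process, n_workers=4, lookahead=8):
        self.worklist = deque(items)
        self.load = load                  # fetches the data an item needs
        self.process = process            # computes a result from (item, data)
        self.n_workers = n_workers
        self.lookahead = lookahead        # how far ahead the helper prefetches
        self.wl_lock = threading.Lock()   # protects the worklist
        self.tx_lock = threading.Lock()   # stand-in for an STM transaction
        self.prefetched = {}              # item -> data loaded by the helper
        self.results = []
        self.hits = 0                     # items whose data was already prefetched
        self.done = threading.Event()

    def _helper(self):
        # Repeatedly peek at the next `lookahead` items and load their data.
        # Entries for items a worker already took are simply left behind.
        while not self.done.is_set():
            with self.wl_lock:
                upcoming = list(itertools.islice(self.worklist, self.lookahead))
            for item in upcoming:
                if item not in self.prefetched:
                    self.prefetched[item] = self.load(item)
            self.done.wait(0.0005)        # back off briefly instead of spinning

    def _worker(self):
        while True:
            with self.wl_lock:
                if not self.worklist:
                    return
                item = self.worklist.popleft()
            data = self.prefetched.pop(item, _MISS)
            hit = data is not _MISS
            if not hit:                   # helper has not reached this item yet
                data = self.load(item)
            with self.tx_lock:            # the "transaction": one atomic update
                self.results.append(self.process(item, data))
                self.hits += hit

    def run(self):
        helper = threading.Thread(target=self._helper, daemon=True)
        workers = [threading.Thread(target=self._worker) for _ in range(self.n_workers)]
        helper.start()
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        self.done.set()                   # stop the helper once the worklist is drained
        helper.join()
        return self.results


if __name__ == "__main__":
    skel = WorklistSkeleton(range(100), load=lambda i: i * i,
                            process=lambda i, d: d + 1, n_workers=4)
    results = skel.run()
    print(len(results), "items processed,", skel.hits, "served from prefetched data")
```

In this sketch the skeleton owns both the worklist and the helper thread, so it knows which items come next. That is the kind of pattern knowledge a generic STM runtime, which sees only individual transactional reads and writes, does not have. The number of prefetch hits varies from run to run because it depends on thread scheduling.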

Footnotes

1. Remote read latency divided by local read latency (obtained from BenchIT).
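To show how this ratio enters a cost estimate, consider a simple two-level model (an assumption made for this illustration, not taken from the paper): a fraction $f$ of a thread's reads is served by its local memory node and the rest by remote nodes, all with the same remote latency. Writing $r$ for the ratio defined above, the average read latency is

$$
\bar{t} = f\,t_{\text{local}} + (1-f)\,t_{\text{remote}} = t_{\text{local}}\bigl[f + (1-f)\,r\bigr], \qquad r = \frac{t_{\text{remote}}}{t_{\text{local}}}.
$$

If pages are interleaved evenly over $N$ nodes and accesses are uniform, then $f = 1/N$ and

$$
\bar{t}_{\text{interleave}} = t_{\text{local}}\left[1 + (r-1)\,\frac{N-1}{N}\right],
$$

whereas a placement that keeps every page on the node of the thread that reads it gives $f = 1$ and $\bar{t} = t_{\text{local}}$. Under this model, the larger $r$ is, the more a locality-preserving page allocation policy can save.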
 
Metadata
Title
Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications
Authors
Luís Fabrício Wanderley Góes
Christiane Pousa Ribeiro
Márcio Castro
Jean-François Méhaut
Murray Cole
Marcelo Cintra
Publication date
1 April 2014
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 2/2014
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-013-0253-x
