Skip to main content
Erschienen in: The Journal of Supercomputing 3/2014

01.03.2014

A thread partitioning approach for speculative multithreading

verfasst von: Bin Liu, Yinliang Zhao, Yuxiang Li, Yanjun Sun, Boqin Feng

Erschienen in: The Journal of Supercomputing | Ausgabe 3/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Speculative multithreading (SpMT) is a thread-level automatic parallelization technique, which partitions sequential programs into multithreads to be executed in parallel. This paper presents different thread partitioning strategies for nonloops and loops. For nonloops, we propose a cost estimation based on combined run-time effects of various speculation factors to predict the resulting performance of candidate threads to guide the thread partitioning. For loops, we parallelize all the profitable loops that can potentially offer additional performance benefits by multilevel spawning in loop bodies, loop iterations, and inner loops. Then we select a proper thread boundary located in the front of loop branch instruction to reduce invalid spawning threads that waste core resources. Experimental results show that the proposed approach can obtain a significant increase in speedup and Olden benchmarks reach a performance improvement of 6.62 % on average.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bhowmik A, Franklin M (2002) A general compiler framework for speculative multithreading. In: Proceedings of the fourteenth annual ACM symposium on parallel algorithms and architectures, SPAA’02. ACM, New York, pp 99–108 CrossRef Bhowmik A, Franklin M (2002) A general compiler framework for speculative multithreading. In: Proceedings of the fourteenth annual ACM symposium on parallel algorithms and architectures, SPAA’02. ACM, New York, pp 99–108 CrossRef
2.
Zurück zum Zitat Carlisle MC, Rogers A (1995) Software caching and computation migration in olden. SIGPLAN Not 30(8):29–38 CrossRef Carlisle MC, Rogers A (1995) Software caching and computation migration in olden. SIGPLAN Not 30(8):29–38 CrossRef
3.
Zurück zum Zitat Chen M, Olukotun K (2003) Test: a tracer for extracting speculative threads. In: Proceedings of the international symposium on code generation and optimization: feedback-directed and runtime optimization, CGO’03. IEEE Computer Society, Washington, pp 301–312 Chen M, Olukotun K (2003) Test: a tracer for extracting speculative threads. In: Proceedings of the international symposium on code generation and optimization: feedback-directed and runtime optimization, CGO’03. IEEE Computer Society, Washington, pp 301–312
4.
Zurück zum Zitat Chen Z, Zhao YL, Pan XY, Dong ZY, Gao B, Zhong ZW (2009) An overview of prophet. In: Proceedings of the 9th international conference on algorithms and architectures for parallel processing, ICA3PP’09. Springer, Berlin, pp 396–407 CrossRef Chen Z, Zhao YL, Pan XY, Dong ZY, Gao B, Zhong ZW (2009) An overview of prophet. In: Proceedings of the 9th international conference on algorithms and architectures for parallel processing, ICA3PP’09. Springer, Berlin, pp 396–407 CrossRef
5.
Zurück zum Zitat Dong Z, Zhao Y, Wei Y, Wang X, Song S (2009) Prophet: a speculative multi-threading execution model with architectural support based on cmp. In: Proceedings of the 2009 international conference on scalable computing and communications; eighth international conference on embedded computing, SCALCOM-EMBEDDEDCOM’09. IEEE Computer Society, Washington, pp 103–108 CrossRef Dong Z, Zhao Y, Wei Y, Wang X, Song S (2009) Prophet: a speculative multi-threading execution model with architectural support based on cmp. In: Proceedings of the 2009 international conference on scalable computing and communications; eighth international conference on embedded computing, SCALCOM-EMBEDDEDCOM’09. IEEE Computer Society, Washington, pp 103–108 CrossRef
6.
Zurück zum Zitat Dou J, Cintra M (2007) A compiler cost model for speculative parallelization. ACM Trans Archit Code Optim 4(2) Dou J, Cintra M (2007) A compiler cost model for speculative parallelization. ACM Trans Archit Code Optim 4(2)
7.
Zurück zum Zitat Du ZH, Lim CC, Li XF, Yang C, Zhao Q, Ngai TF (2004) A cost-driven compilation framework for speculative parallelization of sequential programs. SIGPLAN Not 39(6):71–81 CrossRef Du ZH, Lim CC, Li XF, Yang C, Zhao Q, Ngai TF (2004) A cost-driven compilation framework for speculative parallelization of sequential programs. SIGPLAN Not 39(6):71–81 CrossRef
8.
Zurück zum Zitat Gao L, Xue J, Ngai TF (2010) Loop recreation for thread-level speculation on multicore processors. Softw Pract Exp 40(1):45–72 Gao L, Xue J, Ngai TF (2010) Loop recreation for thread-level speculation on multicore processors. Softw Pract Exp 40(1):45–72
9.
Zurück zum Zitat Gao L, Li L, Xue J, Yew PC (2013) Seed: a statically greedy and dynamically adaptive approach for speculative loop execution. IEEE Trans Comput 62(5):1004–1016 CrossRefMathSciNet Gao L, Li L, Xue J, Yew PC (2013) Seed: a statically greedy and dynamically adaptive approach for speculative loop execution. IEEE Trans Comput 62(5):1004–1016 CrossRefMathSciNet
10.
Zurück zum Zitat Hammond L, Hubbert BA, Siu M, Prabhu MK, Chen M, Olukotun K (2000) The stanford hydra cmp. IEEE MICRO 20(2):71–84 CrossRef Hammond L, Hubbert BA, Siu M, Prabhu MK, Chen M, Olukotun K (2000) The stanford hydra cmp. IEEE MICRO 20(2):71–84 CrossRef
11.
Zurück zum Zitat Huang J, Jablin TB, Beard SR, Johnson NP, August DI (2013) Automatically exploiting cross-invocation parallelism using runtime information. pp. ACM SIGMICRO; ACM SIGPLAN; IEEE Computer Society TC–uARCH; IEEE Computer Society; Association for Computing Machinery (ACM), Shenzhen, China Huang J, Jablin TB, Beard SR, Johnson NP, August DI (2013) Automatically exploiting cross-invocation parallelism using runtime information. pp. ACM SIGMICRO; ACM SIGPLAN; IEEE Computer Society TC–uARCH; IEEE Computer Society; Association for Computing Machinery (ACM), Shenzhen, China
12.
Zurück zum Zitat Johnson TA, Eigenmann R, Vijaykumar TN (2004) Min-cut program decomposition for thread-level speculation. SIGPLAN Not 39(6):59–70 CrossRef Johnson TA, Eigenmann R, Vijaykumar TN (2004) Min-cut program decomposition for thread-level speculation. SIGPLAN Not 39(6):59–70 CrossRef
13.
Zurück zum Zitat Johnson TA, Eigenmann R, Vijaykumar TN (2007) Speculative thread decomposition through empirical optimization. In: Proceedings of the 12th ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP’07. ACM, New York, pp 205–214 CrossRef Johnson TA, Eigenmann R, Vijaykumar TN (2007) Speculative thread decomposition through empirical optimization. In: Proceedings of the 12th ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP’07. ACM, New York, pp 205–214 CrossRef
14.
Zurück zum Zitat Li Y, Zhao Y, Li M, Zhao Y (2010) A cost estimation based speculative path determination method for speculative thread partitioning. In: 2010 international conference on computer application and system modeling (ICCASM), vol 3, pp 663–668 Li Y, Zhao Y, Li M, Zhao Y (2010) A cost estimation based speculative path determination method for speculative thread partitioning. In: 2010 international conference on computer application and system modeling (ICCASM), vol 3, pp 663–668
15.
Zurück zum Zitat Liu W, Tuck J, Ceze L, Ahn W, Strauss K, Renau J, Torrellas J (2006) Posh: a tls compiler that exploits program structure. In: Proceedings of the eleventh ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP’06. ACM, New York, pp 158–167 CrossRef Liu W, Tuck J, Ceze L, Ahn W, Strauss K, Renau J, Torrellas J (2006) Posh: a tls compiler that exploits program structure. In: Proceedings of the eleventh ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP’06. ACM, New York, pp 158–167 CrossRef
16.
Zurück zum Zitat Long S, Fursin G, Franke B (2007) A cost-aware parallel workload allocation approach based on machine learning techniques. In: Proceedings of the 2007 IFIP international conference on network and parallel computing, NPC’07. Springer, Berlin, pp 506–515 Long S, Fursin G, Franke B (2007) A cost-aware parallel workload allocation approach based on machine learning techniques. In: Proceedings of the 2007 IFIP international conference on network and parallel computing, NPC’07. Springer, Berlin, pp 506–515
17.
Zurück zum Zitat Luo Y, Zhai A (2012) Dynamically dispatching speculative threads to improve sequential execution. ACM Trans Archit Code Optim 9(3):13 CrossRef Luo Y, Zhai A (2012) Dynamically dispatching speculative threads to improve sequential execution. ACM Trans Archit Code Optim 9(3):13 CrossRef
18.
Zurück zum Zitat Madriles C, García-Quiñones C, Sánchez J, Marcuello P, González A, Tullsen DM, Wang H, Shen JP (2008) Mitosis: a speculative multithreaded processor based on precomputation slices. IEEE Trans Parallel Distrib Syst 19(7):914–925 CrossRef Madriles C, García-Quiñones C, Sánchez J, Marcuello P, González A, Tullsen DM, Wang H, Shen JP (2008) Mitosis: a speculative multithreaded processor based on precomputation slices. IEEE Trans Parallel Distrib Syst 19(7):914–925 CrossRef
19.
Zurück zum Zitat Ohsawa T, Takagi M, Kawahara S, Matsushita S (2005) Pinot: speculative multi-threading processor architecture exploiting parallelism over a wide range of granularities. In: Proceedings of the 38th annual IEEE/ACM international symposium on microarchitecture, MICRO 38. IEEE Computer Society, Washington, pp 81–92 Ohsawa T, Takagi M, Kawahara S, Matsushita S (2005) Pinot: speculative multi-threading processor architecture exploiting parallelism over a wide range of granularities. In: Proceedings of the 38th annual IEEE/ACM international symposium on microarchitecture, MICRO 38. IEEE Computer Society, Washington, pp 81–92
20.
Zurück zum Zitat Olukotun K, Hammond L, Willey M (1999) Improving the performance of speculatively parallel applications on the hydra cmp. In: Proceedings of the 13th international conference on supercomputing, ICS’99. ACM, New York, pp 21–30 CrossRef Olukotun K, Hammond L, Willey M (1999) Improving the performance of speculatively parallel applications on the hydra cmp. In: Proceedings of the 13th international conference on supercomputing, ICS’99. ACM, New York, pp 21–30 CrossRef
21.
Zurück zum Zitat Ootsu K, Abe T, Yokota T, Baba T (2010) Loop performance improvement for min-cut program decomposition method. In: ICNC, pp 78–87 Ootsu K, Abe T, Yokota T, Baba T (2010) Loop performance improvement for min-cut program decomposition method. In: ICNC, pp 78–87
22.
Zurück zum Zitat Pan XY, Zhao Y, Chen Z, Wang X, Wei Y, Du Y (2009) A thread partitioning method for speculative multithreading. In: ScalCom-EmbeddedCom, pp 285–290 Pan XY, Zhao Y, Chen Z, Wang X, Wei Y, Du Y (2009) A thread partitioning method for speculative multithreading. In: ScalCom-EmbeddedCom, pp 285–290
23.
Zurück zum Zitat Prabhu MK, Olukotun K (2003) Using thread-level speculation to simplify manual parallelization. SIGPLAN Not 38(10):1–12 CrossRef Prabhu MK, Olukotun K (2003) Using thread-level speculation to simplify manual parallelization. SIGPLAN Not 38(10):1–12 CrossRef
24.
Zurück zum Zitat Quiñones CG, Madriles C, Sánchez J, Marcuello P, González A, Tullsen DM (2005) Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. SIGPLAN Not 40(6):269–279 CrossRef Quiñones CG, Madriles C, Sánchez J, Marcuello P, González A, Tullsen DM (2005) Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. SIGPLAN Not 40(6):269–279 CrossRef
25.
Zurück zum Zitat Sarkar V, Hennessy J (1986) Partitioning parallel programs for macro-dataflow. In: Proceedings of the 1986 ACM conference on LISP and functional programming, LFP’86. ACM, New York, pp 202–211 CrossRef Sarkar V, Hennessy J (1986) Partitioning parallel programs for macro-dataflow. In: Proceedings of the 1986 ACM conference on LISP and functional programming, LFP’86. ACM, New York, pp 202–211 CrossRef
26.
Zurück zum Zitat Sharafeddine M, Jothi K, Akkary H (2012) Disjoint out-of-order execution processor. ACM Trans Archit Code Optim 9(3):19:1–19:32 CrossRef Sharafeddine M, Jothi K, Akkary H (2012) Disjoint out-of-order execution processor. ACM Trans Archit Code Optim 9(3):19:1–19:32 CrossRef
27.
Zurück zum Zitat Smith JE, Vajapeyam S (1997) Trace processors: moving to fourth-generation microarchitectures. Computer 30(9):68–74 CrossRef Smith JE, Vajapeyam S (1997) Trace processors: moving to fourth-generation microarchitectures. Computer 30(9):68–74 CrossRef
28.
Zurück zum Zitat Sohi GS, Breach SE, Vijaykumar TN (1995) Multiscalar processors. SIGARCH Comput Archit News 23(2):414–425 CrossRef Sohi GS, Breach SE, Vijaykumar TN (1995) Multiscalar processors. SIGARCH Comput Archit News 23(2):414–425 CrossRef
29.
Zurück zum Zitat Sun XH, Chen Y (2010) Reevaluating Amdahl’s law in the multicore era. J Parallel Distrib Comput 70(2):183–188 CrossRefMATH Sun XH, Chen Y (2010) Reevaluating Amdahl’s law in the multicore era. J Parallel Distrib Comput 70(2):183–188 CrossRefMATH
30.
Zurück zum Zitat Tang X, Wang J, Theobald KB, Gao GR (1997) Thread partitioning and scheduling based on cost model. In: Proceedings of the ninth annual ACM symposium on parallel algorithms and architectures, SPAA’97. ACM, New York, pp 272–281 CrossRef Tang X, Wang J, Theobald KB, Gao GR (1997) Thread partitioning and scheduling based on cost model. In: Proceedings of the ninth annual ACM symposium on parallel algorithms and architectures, SPAA’97. ACM, New York, pp 272–281 CrossRef
31.
Zurück zum Zitat Vijaykumar TN, Sohi GS (1998) Task selection for a multiscalar processor. In: Proceedings of the 31st annual ACM/IEEE international symposium on microarchitecture, MICRO 31. IEEE Computer Society, Los Alamitos, pp 81–92 CrossRef Vijaykumar TN, Sohi GS (1998) Task selection for a multiscalar processor. In: Proceedings of the 31st annual ACM/IEEE international symposium on microarchitecture, MICRO 31. IEEE Computer Society, Los Alamitos, pp 81–92 CrossRef
32.
Zurück zum Zitat Wang S, Dai X, Yellajyosula KS, Zhai A, Yew PC (2006) Loop selection for thread-level speculation. In: Proceedings of the 18th international conference on languages and compilers for parallel computing, LCPC’05. Springer, Berlin, pp 289–303 CrossRef Wang S, Dai X, Yellajyosula KS, Zhai A, Yew PC (2006) Loop selection for thread-level speculation. In: Proceedings of the 18th international conference on languages and compilers for parallel computing, LCPC’05. Springer, Berlin, pp 289–303 CrossRef
33.
Zurück zum Zitat Wang S, Yew PC, Zhai A (2012) Code transformations for enhancing the performance of speculatively parallel threads. 5 Toh Tuck Link, Singapore, 596224 Wang S, Yew PC, Zhai A (2012) Code transformations for enhancing the performance of speculatively parallel threads. 5 Toh Tuck Link, Singapore, 596224
34.
Zurück zum Zitat Wilson R, French R, Wilson C, Amarasinghe S, Anderson J, Tjiang S, Liao S, Tseng C, Hall M, Lam M, Hennessy J (1994) The suif compiler system: a parallelizing and optimizing research compiler. Tech rep, Stanford, CA, USA Wilson R, French R, Wilson C, Amarasinghe S, Anderson J, Tjiang S, Liao S, Tseng C, Hall M, Lam M, Hennessy J (1994) The suif compiler system: a parallelizing and optimizing research compiler. Tech rep, Stanford, CA, USA
35.
Zurück zum Zitat Zhai A, Colohan CB, Steffan JG, Mowry TC (2004) Compiler optimization of memory-resident value communication between speculative threads. In: Proceedings of the international symposium on code generation and optimization: feedback-directed and runtime optimization, CGO’04. IEEE Computer Society, Washington, pp 39–51 Zhai A, Colohan CB, Steffan JG, Mowry TC (2004) Compiler optimization of memory-resident value communication between speculative threads. In: Proceedings of the international symposium on code generation and optimization: feedback-directed and runtime optimization, CGO’04. IEEE Computer Society, Washington, pp 39–51
36.
Zurück zum Zitat Zheng B, Tsai JY, Zhang BY, Chen T, Huang B, Li JH, Ding YH, Liang J, Zhen Y, Yew PC, Zhu CQ (2000) Designing the Agassiz compiler for concurrent multithreaded architectures. In: Proceedings of the 12th international workshop on languages and compilers for parallel computing, LCPC’99. Springer, London, pp 380–398 CrossRef Zheng B, Tsai JY, Zhang BY, Chen T, Huang B, Li JH, Ding YH, Liang J, Zhen Y, Yew PC, Zhu CQ (2000) Designing the Agassiz compiler for concurrent multithreaded architectures. In: Proceedings of the 12th international workshop on languages and compilers for parallel computing, LCPC’99. Springer, London, pp 380–398 CrossRef
37.
Zurück zum Zitat Zhou H (2005) Dual-core execution: building a highly scalable single-thread instruction window. In: Proceedings of the 14th international conference on parallel architectures and compilation techniques, PACT’05. IEEE Computer Society, Washington, pp 231–242 CrossRef Zhou H (2005) Dual-core execution: building a highly scalable single-thread instruction window. In: Proceedings of the 14th international conference on parallel architectures and compilation techniques, PACT’05. IEEE Computer Society, Washington, pp 231–242 CrossRef
Metadaten
Titel
A thread partitioning approach for speculative multithreading
verfasst von
Bin Liu
Yinliang Zhao
Yuxiang Li
Yanjun Sun
Boqin Feng
Publikationsdatum
01.03.2014
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 3/2014
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-013-1000-1

Weitere Artikel der Ausgabe 3/2014

The Journal of Supercomputing 3/2014 Zur Ausgabe

Premium Partner