Skip to main content

2018 | OriginalPaper | Buchkapitel

3. Symbolic Parallelization

verfasst von : Alexandru-Petru Tanase, Frank Hannig, Jürgen Teich

Erschienen in: Symbolic Parallelization of Nested Loop Programs

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This chapter presents solutions in the form of compile-time transformations in the polyhedron model for adaptive parallel execution of loop programs on processor arrays. By analytical means, we show for the first time that it is possible to jointly tile and schedule a given loop nest with uniform data dependencies symbolically using: (1) outer loop parallelization for scenarios that are I/O-bounded, (2) inner loop parallelization for scenarios that are memory-bounded. These transformations are essential for self-organizing computing paradigms such as invasive computing, because the claimed region of processors, whose shape and size determine the forms of tiles during parallelization, is not known at compile time. In this realm, it is shown that it is indeed possible to derive parameterized latency-optimal schedules statically by proposing a two-step approach: First, the iteration space of a loop program is tiled symbolically into orthotopes of parametrized extensions. Subsequently, the resulting tiled program is also scheduled symbolically, resulting in a set of latency-optimal parameterized schedule candidates. At runtime, once the size of the processor array becomes known, simple comparisons of latency-determining expressions finally steer which of these schedules will be dynamically selected and the corresponding program configuration executed on the resulting processor array so to avoid any further runtime optimization or expensive recompilation. Having symbolic schedules allows the number of processors to execute on to remain undetermined until runtime. This proposed compile/runtime hybrid approach significantly reduces the runtime overhead compared to dynamic or just-in-time compilation.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
As introduced in Sect. 2.​3.​3, a Uniform Dependence Algorithm (UDA) consists of a set of G quantified equations of the form \(S_i:\quad \forall I\in \mathcal {I}_i : x_i[I] =\mathcal {F}_i(\ldots ,x_j[I-d_{ji}],\ldots ).\)
 
2
In the following, we do not necessarily assume perfect tilings, see Figs. 3.1 and 3.2 as examples.
 
3
For more details on how each dependency d ∈ D is described in an UDA, we refer to Sect. 2.​3.​3 and Eq. (2.​1), respectively.
 
4
For simplicity, we assume that a single iteration of a given loop nest may be executed in a unit of time (clock cycle). The generalization for multi-cycle instructions and non-atomic iterations under resource constraints is presented in [WTHT16].
 
5
We assume for regularity of a schedule that each tile is scheduled equally in exactly det(P) time steps, even if the covering of the union of the iteration spaces of all G equations of a given UDA might lead to some non-perfectly filled tiles.
 
6
Again, we assume for simplicity that a single iteration of a given loop may be executed in a unit of time. We envision to generalize this, as part of future work (see Sect. 6.​2).
 
7
The length of the feedback data register is configurable at runtime, for more details we refer to Sect. 2.​2.
 
8
Loops with affine data dependencies [TTZ97a, TT02] or certain classes of dynamic data dependencies [HRDT08] may be converted first in this form, e. g., by localization of data dependencies [TR91] or hiding data-value dependent computations by data-dependent functions [HRDT08, Han09].
 
Literatur
[AK08]
Zurück zum Zitat Aditya, S., & Kathail, V. (2008). Algorithmic synthesis using PICO: An integrated framework for application engine synthesis and verification from high level C algorithms. In P. Coussy & A. Morawiec (Eds.), High-level synthesis: From algorithm to digital circuit (1st ed., Chap. 4, pp. 53–74). Berlin: Springer. Aditya, S., & Kathail, V. (2008). Algorithmic synthesis using PICO: An integrated framework for application engine synthesis and verification from high level C algorithms. In P. Coussy & A. Morawiec (Eds.), High-level synthesis: From algorithm to digital circuit (1st ed., Chap. 4, pp. 53–74). Berlin: Springer.
[AKN95]
Zurück zum Zitat Agarwal, A., Kranz, D. A., & Natarajan, V. (1995). Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 6(9), 943–962.CrossRef Agarwal, A., Kranz, D. A., & Natarajan, V. (1995). Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 6(9), 943–962.CrossRef
[BDRR96]
Zurück zum Zitat Boulet, P., Darte, A., Risset, T., & Robert, Y. (1996). (pen)-ultimate tiling? Integration, the VLSI Journal, 17, 33–51.CrossRef Boulet, P., Darte, A., Risset, T., & Robert, Y. (1996). (pen)-ultimate tiling? Integration, the VLSI Journal, 17, 33–51.CrossRef
[BHRS08]
Zurück zum Zitat Bondhugula, U., Hartono, A., Ramanujam, J., & Sadayappan, P. (2008). A practical automatic polyhedral parallelizer and locality optimizer. ACM SIGPLAN Notices, 43(6), 101–113.CrossRef Bondhugula, U., Hartono, A., Ramanujam, J., & Sadayappan, P. (2008). A practical automatic polyhedral parallelizer and locality optimizer. ACM SIGPLAN Notices, 43(6), 101–113.CrossRef
[BHTPA11]
Zurück zum Zitat Boppu, S., Hannig, F., Teich, J., & Perez-Andrade, R. (2011). Towards symbolic run-time reconfiguration in tightly-coupled processor arrays. In 2011 International Conference on Reconfigurable Computing and FPGAs (ReConFig) (pp. 392–397). New York: IEEE.CrossRef Boppu, S., Hannig, F., Teich, J., & Perez-Andrade, R. (2011). Towards symbolic run-time reconfiguration in tightly-coupled processor arrays. In 2011 International Conference on Reconfigurable Computing and FPGAs (ReConFig) (pp. 392–397). New York: IEEE.CrossRef
[Bop15]
Zurück zum Zitat Boppu, S. (2015). Code Generation for Tightly Coupled Processor Arrays. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany. Boppu, S. (2015). Code Generation for Tightly Coupled Processor Arrays. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.
[BRS07]
Zurück zum Zitat Bondhugula, U., Ramanujam, J., & Sadayappan, P. (2007). Automatic mapping of nested loops to FPGAs. In PPoPP’07 Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 101–111). New York, NY, USA: ACM. Bondhugula, U., Ramanujam, J., & Sadayappan, P. (2007). Automatic mapping of nested loops to FPGAs. In PPoPP’07 Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 101–111). New York, NY, USA: ACM.
[BRS10]
Zurück zum Zitat Baskaran, M. M., Ramanujam, J., & Sadayappan, P. (2010). Automatic C-to-CUDA code generation for affine programs. In Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction (pp. 244–263). Berlin, Heidelberg: Springer. Baskaran, M. M., Ramanujam, J., & Sadayappan, P. (2010). Automatic C-to-CUDA code generation for affine programs. In Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction (pp. 244–263). Berlin, Heidelberg: Springer.
[DHT06]
Zurück zum Zitat Dutta, H., Hannig, F., & Teich, J. (2006). Hierarchical partitioning for piecewise linear algorithms. In Proceedings of the 5th International Conference on Parallel Computing in Electrical Engineering (PARELEC), September 2006 (pp. 153–160). Washington, DC, USA: IEEE Computer Society. Dutta, H., Hannig, F., & Teich, J. (2006). Hierarchical partitioning for piecewise linear algorithms. In Proceedings of the 5th International Conference on Parallel Computing in Electrical Engineering (PARELEC), September 2006 (pp. 153–160). Washington, DC, USA: IEEE Computer Society.
[DKR92]
Zurück zum Zitat Darte, A., Khachiyan, L., & Robert, Y. (1992). Linear scheduling is close to optimality. In Proceedings of the International Conference on Application Specific Array Processors, August 1992 (pp. 37–46). Darte, A., Khachiyan, L., & Robert, Y. (1992). Linear scheduling is close to optimality. In Proceedings of the International Conference on Application Specific Array Processors, August 1992 (pp. 37–46).
[DR95]
Zurück zum Zitat Darte, A., & Robert, Y. (1995). Affine-by-statement scheduling of uniform and affine loop nests over parametric domains. Journal of Parallel and Distributed Computing, 29(1), 43–59.CrossRef Darte, A., & Robert, Y. (1995). Affine-by-statement scheduling of uniform and affine loop nests over parametric domains. Journal of Parallel and Distributed Computing, 29(1), 43–59.CrossRef
[DSR.
Zurück zum Zitat Darte, A., Schreiber, R., Rau, B. R., & Vivien, F. (2000) A constructive solution to the juggling problem in systolic array synthesis. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS) (pp. 815–821). Darte, A., Schreiber, R., Rau, B. R., & Vivien, F. (2000) A constructive solution to the juggling problem in systolic array synthesis. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS) (pp. 815–821).
[DX11]
Zurück zum Zitat Di, P., & Xue, J. (2011) Model-driven tile size selection for doacross loops on GPUs. In Proceedings of the 17th International Conference on Parallel Processing - Volume Part II, Euro-Par, Berlin, Heidelberg, 2011 (pp. 401–412). Di, P., & Xue, J. (2011) Model-driven tile size selection for doacross loops on GPUs. In Proceedings of the 17th International Conference on Parallel Processing - Volume Part II, Euro-Par, Berlin, Heidelberg, 2011 (pp. 401–412).
[DYS.
Zurück zum Zitat Di, P., Ye, D., Su, Y., Sui, Y., & Xue, J. (2012). Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs. In 2012 41st International Conference on Parallel Processing (pp. 350–359). Di, P., Ye, D., Su, Y., Sui, Y., & Xue, J. (2012). Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs. In 2012 41st International Conference on Parallel Processing (pp. 350–359).
[EM99]
Zurück zum Zitat Eckhardt, U., & Merker, R. (1999). Hierarchical algorithm partitioning at system level for an improved utilization of memory structures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(1), 14–24.CrossRef Eckhardt, U., & Merker, R. (1999). Hierarchical algorithm partitioning at system level for an improved utilization of memory structures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(1), 14–24.CrossRef
[Han09]
Zurück zum Zitat Hannig, F. (2009). Scheduling Techniques for High-throughput Loop Accelerators. Dissertation, University of Erlangen-Nuremberg, Germany, Verlag Dr. Hut, Munich, Germany. ISBN: 978-3-86853-220-3. Hannig, F. (2009). Scheduling Techniques for High-throughput Loop Accelerators. Dissertation, University of Erlangen-Nuremberg, Germany, Verlag Dr. Hut, Munich, Germany. ISBN: 978-3-86853-220-3.
[HBB.
Zurück zum Zitat Hartono, A., Baskaran, M. M., Bastoul, C., Cohen, A., Krishnamoorthy, S., Norris, B., et al. (2009). Parametric multi-level tiling of imperfectly nested loops. In Proceedings of the 23rd International Conference on Supercomputing (ICS), New York, NY, USA, 2009 (pp. 147–157). Hartono, A., Baskaran, M. M., Bastoul, C., Cohen, A., Krishnamoorthy, S., Norris, B., et al. (2009). Parametric multi-level tiling of imperfectly nested loops. In Proceedings of the 23rd International Conference on Supercomputing (ICS), New York, NY, USA, 2009 (pp. 147–157).
[HBRS10]
Zurück zum Zitat Hartono, A., Baskaran, M. M., Ramanujam, J., & Sadayappan, P. (2010). DynTile: Parametric tiled loop generation for parallel execution on multicore processors. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1–12). Hartono, A., Baskaran, M. M., Ramanujam, J., & Sadayappan, P. (2010). DynTile: Parametric tiled loop generation for parallel execution on multicore processors. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS) (pp. 1–12).
[HCF97]
Zurück zum Zitat Högstedt, K., Carter, L., & Ferrante, J. (1997). Determining the idle time of a tiling. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (pp. 160–173). New York: ACM. Högstedt, K., Carter, L., & Ferrante, J. (1997). Determining the idle time of a tiling. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (pp. 160–173). New York: ACM.
[HCF99]
Zurück zum Zitat Högstedt, K., Carter, L., & Ferrante, J. (1999). Selecting tile shape for minimal execution time. In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, New York, NY, USA, 1999 (pp. 201–211). Högstedt, K., Carter, L., & Ferrante, J. (1999). Selecting tile shape for minimal execution time. In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, New York, NY, USA, 1999 (pp. 201–211).
[HDT06]
Zurück zum Zitat Hannig, F., Dutta, H., & Teich, J. (2006). Mapping a class of dependence algorithms to coarse-grained reconfigurable arrays: Architectural parameters and methodology. International Journal of Embedded Systems, 2(1/2), 114–127.CrossRef Hannig, F., Dutta, H., & Teich, J. (2006). Mapping a class of dependence algorithms to coarse-grained reconfigurable arrays: Architectural parameters and methodology. International Journal of Embedded Systems, 2(1/2), 114–127.CrossRef
[HHB.
Zurück zum Zitat Henkel, J., Herkersdorf, A., Bauer, L., Wild, T., Hübner, M., Pujari, R. K., et al. (2012). Invasive manycore architectures. In 17th Asia and South Pacific Design Automation Conference (ASP-DAC) (pp. 193–200). New York: IEEE.CrossRef Henkel, J., Herkersdorf, A., Bauer, L., Wild, T., Hübner, M., Pujari, R. K., et al. (2012). Invasive manycore architectures. In 17th Asia and South Pacific Design Automation Conference (ASP-DAC) (pp. 193–200). New York: IEEE.CrossRef
[HRDT08]
Zurück zum Zitat Hannig, F., Ruckdeschel, H., Dutta, H., & Teich, J. (2008). PARO: Synthesis of hardware accelerators for multi-dimensional dataflow-intensive applications. In Proceedings of the Fourth International Workshop on Applied Reconfigurable Computing (ARC). Lecture notes in computer science, March 2008 (Vol. 4943, pp. 287–293). London, UK: Springer. Hannig, F., Ruckdeschel, H., Dutta, H., & Teich, J. (2008). PARO: Synthesis of hardware accelerators for multi-dimensional dataflow-intensive applications. In Proceedings of the Fourth International Workshop on Applied Reconfigurable Computing (ARC). Lecture notes in computer science, March 2008 (Vol. 4943, pp. 287–293). London, UK: Springer.
[IR95]
Zurück zum Zitat Brewer, F., & Radivojevic, I. (1995) Symbolic scheduling techniques. In IEICE Transactions on Information and Systems, Japan, March 1995 (pp. 224–230). Brewer, F., & Radivojevic, I. (1995) Symbolic scheduling techniques. In IEICE Transactions on Information and Systems, Japan, March 1995 (pp. 224–230).
[IT88]
Zurück zum Zitat Irigoin, F., & Triolet, R. (1988). Supernode partitioning. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), San Diego, CA, USA, January 1988 (pp. 319–329). Irigoin, F., & Triolet, R. (1988). Supernode partitioning. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), San Diego, CA, USA, January 1988 (pp. 319–329).
[Jai86]
Zurück zum Zitat Jainandunsing, K. (1986). Optimal partitioning scheme for wavefront/systolic array processors. In Proceedings of IEEE Symposium on Circuits and Systems (pp. 940–943). Jainandunsing, K. (1986). Optimal partitioning scheme for wavefront/systolic array processors. In Proceedings of IEEE Symposium on Circuits and Systems (pp. 940–943).
[LYLW13]
Zurück zum Zitat Liu, D., Yin, S., Liu, L., & Wei, S. (2013). Polyhedral model based mapping optimization of loop nests for CGRAs. In Proceedings of the Design Automation Conference (DAC) (pp. 1–8). New York: IEEE. Liu, D., Yin, S., Liu, L., & Wei, S. (2013). Polyhedral model based mapping optimization of loop nests for CGRAs. In Proceedings of the Design Automation Conference (DAC) (pp. 1–8). New York: IEEE.
[MF86]
Zurück zum Zitat Moldovan, D. I., & Fortes, J. A. B. (1986). Partitioning and mapping algorithms into fixed size systolic arrays. IEEE Transactions on Computers, C-35(1), 1–12.CrossRefMATH Moldovan, D. I., & Fortes, J. A. B. (1986). Partitioning and mapping algorithms into fixed size systolic arrays. IEEE Transactions on Computers, C-35(1), 1–12.CrossRefMATH
[RKRS07]
Zurück zum Zitat Renganarayanan, L., Kim, D., Rajopadhye, S., & Strout, M. M. (2007). Parameterized tiled loops for free. In Proceeding of the Conference on Programming Language Design and Implementation, San Diego, CA, USA, 2007 (pp. 405–414). Renganarayanan, L., Kim, D., Rajopadhye, S., & Strout, M. M. (2007). Parameterized tiled loops for free. In Proceeding of the Conference on Programming Language Design and Implementation, San Diego, CA, USA, 2007 (pp. 405–414).
[RKSR10]
Zurück zum Zitat Renganarayanan, L., Kim, D., Strout, M. M., Rajopadhye, S. (2010). Parameterized loop tiling. ACM Transactions on Programming Languages and Systems (pp. 3:1–3:41). Renganarayanan, L., Kim, D., Strout, M. M., Rajopadhye, S. (2010). Parameterized loop tiling. ACM Transactions on Programming Languages and Systems (pp. 3:1–3:41).
[RTG.
Zurück zum Zitat Rong, H., Tang, Z., Govindarajan, R., Douillet, A., & Gao, G. R. (2007). Single-dimension software pipelining for multidimensional loops. ACM Transactions on Architecture and Code Optimization (TACO), 4(1), 7:1–7:44. Rong, H., Tang, Z., Govindarajan, R., Douillet, A., & Gao, G. R. (2007). Single-dimension software pipelining for multidimensional loops. ACM Transactions on Architecture and Code Optimization (TACO), 4(1), 7:1–7:44.
[SF91]
Zurück zum Zitat Shang, W., & Fortes, J. A. B. (1991). Time optimal linear schedules for algorithms with uniform dependencies. IEEE Transactions on Computers, 40(6), 723–742.MathSciNetCrossRef Shang, W., & Fortes, J. A. B. (1991). Time optimal linear schedules for algorithms with uniform dependencies. IEEE Transactions on Computers, 40(6), 723–742.MathSciNetCrossRef
[Tei93]
Zurück zum Zitat Teich, J. (1993). A compiler for application specific processor arrays. Reihe Elektrotechnik. Freiburg, Germany: Shaker. ISBN: 9783861117018. Teich, J. (1993). A compiler for application specific processor arrays. Reihe Elektrotechnik. Freiburg, Germany: Shaker. ISBN: 9783861117018.
[THB.
Zurück zum Zitat Tavarageri, S., Hartono, A., Baskaran, M., Pouchet, L.-N., Ramanujam, J., & Sadayappan, P. (2010). Parametric tiling of affine loop nests. In 15th Workshop on Compilers for Parallel Computing (CPC), Vienna, Austria, July 2010. Tavarageri, S., Hartono, A., Baskaran, M., Pouchet, L.-N., Ramanujam, J., & Sadayappan, P. (2010). Parametric tiling of affine loop nests. In 15th Workshop on Compilers for Parallel Computing (CPC), Vienna, Austria, July 2010.
[TR91]
Zurück zum Zitat Thiele, L., & Roychowdhury, V. P. (1991). Systematic design of local processor arrays for numerical algorithms. In Proceedings of the International Workshop on Algorithms and Parallel VLSI Architectures, Amsterdam, The Netherlands, 1991 (Vol. A: Tutorials, pp. 329–339). Thiele, L., & Roychowdhury, V. P. (1991). Systematic design of local processor arrays for numerical algorithms. In Proceedings of the International Workshop on Algorithms and Parallel VLSI Architectures, Amsterdam, The Netherlands, 1991 (Vol. A: Tutorials, pp. 329–339).
[TT93]
Zurück zum Zitat Teich, J., & Thiele, L. (1993). Partitioning of processor arrays: A piecewise regular approach. Integration-The Vlsi Journal,14(3), 297–332. Teich, J., & Thiele, L. (1993). Partitioning of processor arrays: A piecewise regular approach. Integration-The Vlsi Journal,14(3), 297–332.
[TT02]
Zurück zum Zitat Teich, J., & Thiele, L. (2002). Exact partitioning of affine dependence algorithms. In Embedded Processor Design Challenges. Lecture notes in computer science (Vol. 2268, pp. 135–151). Berlin, Germany: Springer. Teich, J., & Thiele, L. (2002). Exact partitioning of affine dependence algorithms. In Embedded Processor Design Challenges. Lecture notes in computer science (Vol. 2268, pp. 135–151). Berlin, Germany: Springer.
[TTH13]
Zurück zum Zitat Teich, J., Tanase, A., & Hannig, F. (2013). Symbolic parallelization of loop programs for massively parallel processor arrays. In Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 1–9). New York: IEEE. Best Paper Award. Teich, J., Tanase, A., & Hannig, F. (2013). Symbolic parallelization of loop programs for massively parallel processor arrays. In Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 1–9). New York: IEEE. Best Paper Award.
[TTH14]
Zurück zum Zitat Teich, J., Tanase, A., & Hannig, F. (2014). Symbolic mapping of loop programs onto processor arrays. Journal of Signal Processing Systems, 77(1–2), 31–59.CrossRef Teich, J., Tanase, A., & Hannig, F. (2014). Symbolic mapping of loop programs onto processor arrays. Journal of Signal Processing Systems, 77(1–2), 31–59.CrossRef
[TTZ96]
Zurück zum Zitat Teich, J., Thiele, L., & Zhang, L. (1996). Scheduling of partitioned regular algorithms on processor arrays with constrained resources. In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, ASAP’96 (p. 131). Washington, DC, USA: IEEE Computer Society. Teich, J., Thiele, L., & Zhang, L. (1996). Scheduling of partitioned regular algorithms on processor arrays with constrained resources. In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, ASAP’96 (p. 131). Washington, DC, USA: IEEE Computer Society.
[TTZ97a]
Zurück zum Zitat Teich, J., Thiele, L., & Zhang, L. (1997). Scheduling of partitioned regular algorithms on processor arrays with constrained resources. Journal of VLSI Signal Processing, 17(1), 5–20.CrossRef Teich, J., Thiele, L., & Zhang, L. (1997). Scheduling of partitioned regular algorithms on processor arrays with constrained resources. Journal of VLSI Signal Processing, 17(1), 5–20.CrossRef
[TWTH14]
Zurück zum Zitat Tanase, A., Witterauf, M., Teich, J., & Hannig, F. (2014). Symbolic inner loop parallelisation for massively parallel processor arrays. In Proceedings of the 12th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE) (pp. 219–228). Tanase, A., Witterauf, M., Teich, J., & Hannig, F. (2014). Symbolic inner loop parallelisation for massively parallel processor arrays. In Proceedings of the 12th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE) (pp. 219–228).
[WTHT16]
Zurück zum Zitat Witterauf, M., Tanase, A., Hannig, F., & Teich, J. (2016). Modulo scheduling of symbolically tiled loops for tightly coupled processor arrays. In Proceedings of the 27th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 58–66). New York: IEEE. Witterauf, M., Tanase, A., Hannig, F., & Teich, J. (2016). Modulo scheduling of symbolically tiled loops for tightly coupled processor arrays. In Proceedings of the 27th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 58–66). New York: IEEE.
[Xue00]
Zurück zum Zitat Xue, J. (2000). Loop tiling for parallelism. Norwell, MA, USA: Kluwer Academic Publishers.CrossRefMATH Xue, J. (2000). Loop tiling for parallelism. Norwell, MA, USA: Kluwer Academic Publishers.CrossRefMATH
[YI95]
Zurück zum Zitat Yang, T., & Ibarra, O. H. (1995). On symbolic scheduling and parallel complexity of loops. In Proceedings IEEE Symposium Parallel and Distributed Processing (pp. 360–367). Yang, T., & Ibarra, O. H. (1995). On symbolic scheduling and parallel complexity of loops. In Proceedings IEEE Symposium Parallel and Distributed Processing (pp. 360–367).
[ZA01]
Zurück zum Zitat Zimmermann, K.-H., & Achtziger, W. (2001). Optimal piecewise linear schedules for LSGP- and LPGS-decomposed array processors via quadratic programming. Computer Physics Communications, 139(1), 64–89.CrossRefMATH Zimmermann, K.-H., & Achtziger, W. (2001). Optimal piecewise linear schedules for LSGP- and LPGS-decomposed array processors via quadratic programming. Computer Physics Communications, 139(1), 64–89.CrossRefMATH
[Zim97]
Zurück zum Zitat Zimmermann, K.-H. (1997). A unifying lattice-based approach for the partitioning of systolic arrays via LPGS and LSGP. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 17(1), 21–41.CrossRefMATH Zimmermann, K.-H. (1997). A unifying lattice-based approach for the partitioning of systolic arrays via LPGS and LSGP. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 17(1), 21–41.CrossRefMATH
Metadaten
Titel
Symbolic Parallelization
verfasst von
Alexandru-Petru Tanase
Frank Hannig
Jürgen Teich
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-73909-0_3

Neuer Inhalt